Author Topic: Server crashed -Hardware Failure.... need advice  (Read 5165 times)

half_life

  • Bug Hunter
  • Zen Hero
  • *****
  • Posts: 867
  • Karma: +59/-0
Re: Server crashed -Hardware Failure.... need advice
« Reply #15 on: February 05, 2012, 06:09:22 pm »
KVM allows direct assignment of disks to virtual machines.  While I haven't tried this,  it should allow you to assign /dev/md0 to a virtual machine drive (probably type virtio).  Proxmox doesn't allow for the possibility via the gui but allows these things via direct edit of the vm config file.
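A rough, untested sketch of what that could look like. The VM ID 101, the domain name zentyal-files and the target vdb are made-up placeholders; qm set (Proxmox) and virsh attach-disk (plain libvirt/KVM) are the stock commands for this. The helper only prints the commands so they can be reviewed before anything touches the array:

```shell
#!/bin/sh
# Dry run: prints the passthrough commands, executes nothing.
# All IDs, names and devices passed in are placeholders chosen by the caller.
passthrough_cmds() {
  vmid="$1"      # Proxmox VM ID (hypothetical, e.g. 101)
  domain="$2"    # libvirt domain name (hypothetical)
  dev="$3"       # host block device, e.g. /dev/md0
  # Proxmox: writes a "virtio1: <dev>" line into the VM config file
  echo "qm set $vmid -virtio1 $dev"
  # Plain libvirt/KVM: the guest would see the array as /dev/vdb
  echo "virsh attach-disk $domain $dev vdb --persistent"
}
```

Calling passthrough_cmds 101 zentyal-files /dev/md0 prints one line per hypervisor; run only the one that matches your setup, and only while the array is not mounted on the host.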

stuartiannaylor

  • Guest
Re: Server crashed -Hardware Failure.... need advice
« Reply #16 on: February 05, 2012, 07:16:27 pm »
http://mcelog.org/faq.html

Q: I get "kernel hardware error no human readable mce decoding support on this cpu type".
A: This is pretty much a bug in newer Linux kernels. They print this message on every corrected error, even though it's useless, and the decoding into the kernel log is not very useful either because mcelog can aggregate the information much better. This is fixed with a patch (linked from the FAQ page above). To apply it to a kernel: download the raw patch, cd into the kernel source, run patch -p1 < patchfile, then recompile.





vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
Re: Server crashed -Hardware Failure.... need advice
« Reply #17 on: February 05, 2012, 09:08:42 pm »
Half_life what do you think about the following scenario.  Since with the new hardware and once my motherboard gets warrantied out from the old hardware.... I will have the potential to make two servers.  Here are some details below: (raid 6 disks are 5900 RPM and all other mechanical disks 7200 RPM if it matters)

New machine:  Dual Xeon 5606 on an Asus board with 24 gigs of ram.  Has 4 LAN ports and a SAS/SATA card (two mini-SAS ports, good for 8 drives).  I was thinking of loading it with two 500 gig drives (make it hardware raid1 this time).  Also keep the 7 x 2TB drives in it (try to get my software raid 6 back up and running).  I also have a 32 gig SSD I was going to use for swap.

The old machine: AMD 1090T hexacore on an MSI motherboard with 16 gigs of ram.  I also have a 32 gig SSD I was going to use for swap.  I was thinking of loading it with two 750 gig drives in hardware raid 1.

For me it is important to have good Samba performance between local and VPN clients.  So I was thinking of keeping the zentyal file sharing module (make it a slave) on the direct hardware of the new machine.... this way I have the best performance possible for samba.  On this same system I was going to install virtualbox (Maybe KVM is better, but I don't know how to use it).  Inside virtualbox I will bring up my old system so that I have access to any data I might need.

Now once the second server is up and running I was going to use either proxmox or xen .... which ever one would give me the best performance results. 

Now since I have two servers which are able to run virtual guests.... I think I could create zentyal in the following way:
1) Master LDAP or go with Windows AD ( not sure which is better)
2) Run an instance of Zentyal as just a gateway.... perhaps with a few other modules running as well.
3) Have an instance of zentyal running in which I install zarafa ( this way I can use it as a mail server) + maybe recreate my main webserver on this instance. 

I figure I can make all the virtual machines in Virtualbox and then maybe convert them into a format that either xenserver or proxmox can accept (this is my big question.... since these are completely different hypervisors, would this work?).
Maybe there is a better way to do this..... I am open to all sorts of scenarios....   

half_life

  • Bug Hunter
  • Zen Hero
  • *****
  • Posts: 867
  • Karma: +59/-0
Re: Server crashed -Hardware Failure.... need advice
« Reply #18 on: February 05, 2012, 11:30:14 pm »
Half_life what do you think about the following scenario.  Since with the new hardware and once my motherboard gets warrantied out from the old hardware.... I will have the potential to make two servers.  Here are some details below: (raid 6 disks are 5900 RPM and all other mechanical disks 7200 RPM if it matters)

It is a best practice to keep hard drives like to like in an array.

New machine:  Dual Xeon 5606 on an Asus board with 24 gigs of ram.  Has 4 LAN ports and a SAS/SATA card (two mini-SAS ports, good for 8 drives).  I was thinking of loading it with two 500 gig drives (make it hardware raid1 this time).  Also keep the 7 x 2TB drives in it (try to get my software raid 6 back up and running).  I also have a 32 gig SSD I was going to use for swap.

LSI or Highpoint? Keep in mind that you will not be able to add the 2TB drives to the hardware raid until you have retrieved your data.  Wouldn't the SSD drive serve better as the host OS drive?

The old machine: AMD 1090T hexacore on an MSI motherboard with 16 gigs of ram.  I also have a 32 gig SSD I was going to use for swap.  I was thinking of loading it with two 750 gig drives in hardware raid 1.

Here again I think that the SSD would better suit the OS since you probably won't impact your swap given your available ram.

For me it is important to have good Samba performance between local and VPN clients.  So I was thinking of keeping the zentyal file sharing module (make it a slave) on the direct hardware of the new machine.... this way I have the best performance possible for samba.  On this same system I was going to install virtualbox (Maybe KVM is better, but I don't know how to use it).  Inside virtualbox I will bring up my old system so that I have access to any data I might need.

Samba via VPN?  I think I missed that bit of information in the past when you were talking about throughput issues.  A VPN queues everything up to be transmitted serially over a single-port connection, and it adds overhead along the way.  That would be the source of your bottleneck.  Otherwise performance can be improved with faster disk access times and throughput, as you were thinking.  Virtualbox is easier to set up but, as you have discovered, not as fast.

Now once the second server is up and running I was going to use either proxmox or xen .... which ever one would give me the best performance results. 


Given your use case,  I would put my efforts towards ease of use.  Xen and KVM give comparable performance numbers in a fully virtualized environment.  The difference, in my opinion, is in the tools used to maintain the environment.

Now since I have two servers which are able to run virtual guests.... I think I could create zentyal in the following way:
1) Master LDAP or go with Windows AD ( not sure which is better)
2) Run an instance of Zentyal as just a gateway.... perhaps with a few other modules running as well.
3) Have an instance of zentyal running in which I install zarafa ( this way I can use it as a mail server) + maybe recreate my main webserver on this instance. 

I figure I can make all the virtual machines in Virtualbox and then maybe convert them into a format that either xenserver or proxmox can accept (this is my big question.... since these are completely different hypervisors, would this work?).
Maybe there is a better way to do this..... I am open to all sorts of scenarios....

It isn't trivial, but by the same token it is not hard to move a virtualbox machine to one of the others.  VMs are easy to move back and forth between Xen and KVM.  Maybe it would be better to start with what work you are trying to achieve.  For instance, there might be a better way to approach your Samba over VPN issue if we knew more about what business problem you were trying to solve.
« Last Edit: February 05, 2012, 11:31:57 pm by half_life »

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
Re: Server crashed -Hardware Failure.... need advice
« Reply #19 on: February 06, 2012, 12:10:09 am »
Sorry Half_life .... I combined a couple of things and now I see how my statements don't make sense.

1) My harddrives  (raid6 is 7 disks all 5900RPM software raid)....  my os currently sits on 2 500 gig disks 7200RPM in software raid 1.   I have two more 750 gig drives which are 7200RPM I was going to use for server two.  So each raid set is sitting on exact same harddrives.

2) I was thinking of using the 2 SSD, but they are very small  (32gig) and also from what I understand the linux kernel does not support trim.  So that is why I was just thinking of using them for swap. ( I am definitely open to other possibilities)

3)the raid card is an intel one.. (it is cheap so I am sure it just dumps the work over to the CPU)  I was just going to use its raid features in order to raid the two 500 gig drives together (make a hardware raid instead of software raid).  Once the OS is installed I was just going to plug in all the 2TB drives and see if I can recompile the raid6 with mdadm.

4) Samba..... sorry I meant to say all LAN clients. (virtualbox was giving me much slower performance for virtual samba shares) If KVM can fix that problem and I can mount my physical software raid6 into a virtual machine that would be great.



From what you are saying if I have to redo my server I should just use KVM this way it would be easy to transfer virtual guests between servers.  If one goes down I can use the next.

My biggest thing is that I want to bring up my email server and gateway as quickly as possible if another hardware failure happens.  It does not have to be automatic, but it would be nice if it was just a simple migration of a VM from one server to the next ... (plus just changing some physical network cables)

half_life

  • Bug Hunter
  • Zen Hero
  • *****
  • Posts: 867
  • Karma: +59/-0
Re: Server crashed -Hardware Failure.... need advice
« Reply #20 on: February 06, 2012, 04:49:27 am »
Sorry Half_life .... I combined a couple of things and now I see how my statements don't make sense.

1) My harddrives  (raid6 is 7 disks all 5900RPM software raid)....  my os currently sits on 2 500 gig disks 7200RPM in software raid 1.   I have two more 750 gig drives which are 7200RPM I was going to use for server two.  So each raid set is sitting on exact same harddrives.

Sorry,  I misunderstood.

2) I was thinking of using the 2 SSD, but they are very small  (32gig) and also from what I understand the linux kernel does not support trim.  So that is why I was just thinking of using them for swap. ( I am definitely open to other possibilities)

Linux does support trim (discard)

From Wikipedia in pertinent part:

----------------------------------------------------------------------------------
Discard parameter in Linux
Although TRIM is supported in the Linux kernel since version 2.6.33, the operating system does not automatically enable TRIM operation. The user must modify the appropriate /etc/fstab file to add the word discard in the appropriate SSD entries. Without this user modification the Linux operating system will not pass the TRIM command to the SSD.[42]
Enabling unsupported operating systems
Where TRIM is not automatically supported by the operating system, there are utilities which can send TRIM commands manually. Usually they list all free blocks as specified by the operating system and then pass this list as a series of TRIM commands to the drive. These utilities are available from various manufacturers (Intel,[14] G.Skill[43]) or as general utilities (hdparm since v9.17[44][45]).


-------------------------------------------------------------------
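For what it's worth, the fstab change described in that excerpt is just one extra word in the options column. A small sketch that adds it for a given mount point; the function name and the sample line in the usage note are invented for illustration, and it prints the result rather than editing /etc/fstab in place:

```shell
#!/bin/sh
# Reads fstab-style lines on stdin and appends "discard" to the options
# (4th) field of the entry whose mount point equals $1.
# Comment lines and entries that already carry "discard" pass through unchanged.
add_discard() {
  awk -v mp="$1" '
    $0 !~ /^[ \t]*#/ && $2 == mp && $4 !~ /(^|,)discard(,|$)/ { $4 = $4 ",discard" }
    { print }
  '
}
```

For example, feeding it the line "UUID=1234-abcd / ext4 errors=remount-ro 0 1" with add_discard / turns the options into errors=remount-ro,discard. Preview with add_discard / < /etc/fstab before changing the real file. Running fstrim periodically is a commonly used alternative to the mount option.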


3)the raid card is an intel one.. (it is cheap so I am sure it just dumps the work over to the CPU)  I was just going to use its raid features in order to raid the two 500 gig drives together (make a hardware raid instead of software raid).  Once the OS is installed I was just going to plug in all the 2TB drives and see if I can recompile the raid6 with mdadm.

That will work.
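Once mdadm is installed on the fresh system, mdadm --assemble --scan should find the array from the superblocks on the member disks, and /proc/mdstat shows whether it came up complete. A small sketch for spotting a degraded array in that file; the function name is made up, and it takes an optional file argument so it can be tried against sample text first:

```shell
#!/bin/sh
# Lists md arrays whose status map (e.g. [UU_UUUU]) shows a missing member.
# Reads /proc/mdstat by default; pass another file to test against sample text.
degraded_arrays() {
  awk '/^md/ { name = $1 }
       /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}
```

No output means every array listed has all of its members. A raid6 set that prints here is still readable with one or two disks missing, but that is the moment to copy the data off before doing anything else.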

4) Samba..... sorry I meant to say all LAN clients. (virtualbox was giving me much slower performance for virtual samba shares) If KVM can fix that problem and I can mount my physical software raid6 into a virtual machine that would be great.

Xen and KVM will both improve that particular problem.  Tweaks to get the most performance out of disk I/O would include using the virtio device type (disk and network card) and using LVM to store the virtual machine rather than a container file.  LVM makes it a little less simple to relocate VMs but still not too bad (DRBD makes this a snap in an HA setup).  Of course, starting with a solid disk I/O system on the real hardware really helps too (hardware raid with cache).


From what you are saying if I have to redo my server I should just use KVM this way it would be easy to transfer virtual guests between servers.  If one goes down I can use the next.

Either would do.  There are strengths in either path.  I would lean towards KVM over Xen because that is the path the distributions are going.  Proxmox gives a neat and tidy interface and has a straightforward path to HA without too much fuss (the 2.0 beta has this and there will be an upgrade-in-place option when it goes gold).  To clear up what I said earlier: it is easy to move VMs back and forth between Xen and KVM (this would include Proxmox).

My biggest thing is that I want to bring up my email server and gateway as quickly as possible if another hardware failure happens.  It does not have to be automatic, but it would be nice if it was just a simple migration of a VM from one server to the next ... (plus just changing some physical network cables)

No problem to set things up for a fail over between machines.  No need to rearrange cables.  If you play your cards right,  the virtual machine is kept parallel between the servers.

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
Re: Server crashed -Hardware Failure.... need advice
« Reply #21 on: February 06, 2012, 05:33:21 am »
Thank you for help !!!!

I finally have made it home .... gone for the weekend and had a chance to take out the motherboard.  Found a burnt out piece right away.  Now have to have MSI replace it !!!!

 - If the system does not come up by simply replacing hardware.....  I will try the proxmox direction.

- going to use mechanical disks.... I think it would cost me too much to get some more SSD drives (especially ones big enough for my needs).  Plus I already have the mechanical disks...

- I am not very familiar with LVM so this will be a new thing + I have tried to google how to enable my software raid6 and give direct access to a VM (so far not entirely sure)

- Finally your statement about synchronizing the servers.... I thought this would only be possible if you had a NAS or SAN .... this way you could do live migration (I definitely do not know much about this)

I am super looking forward to trying this project out.... see if I can make it work like I want.   

THANK YOU !!!!

half_life

  • Bug Hunter
  • Zen Hero
  • *****
  • Posts: 867
  • Karma: +59/-0
Re: Server crashed -Hardware Failure.... need advice
« Reply #22 on: February 06, 2012, 07:51:09 am »
You are welcome.  I need to caution you on one thing.  RAID does not replace a backup strategy.  It only gives you defense in depth.  If you lose a disk,  the RAID system should survive.  If you lose more than that, or software decides to play scrabble with your data,  the chances of getting your data back go down.

I do a 100% backup of all VMs every night as insurance.  In other words: dd if=/dev/VM/my_vm_lvm_snapshot  of=/media/backup_drive bs=1M.  If my two big servers both happen to die at the same time, no problem.  Pull the drives and round up a bunch of smaller machines.  Install Ubuntu server with KVM and grab the xml files conveniently already placed on the backup drives.  2 hours or so and I can be back up.

The SSD drive would only be for the OS.  I don't know about you, but mine typically only takes up about 4-6 gig.

Software RAID has a /dev/md?  device that you would just pass on to the VM.  I will let you know if that definitely works.  I am getting ready to re-work the home server and it would be safe for me to play around with this.  I just did a little test on my desktop system: I passed my software raid array through to an Ubuntu LTS VM running under KVM.  It showed me the LVM partitions on it, so that is an encouraging sign.  I am going to test it more formally before deciding if it works or not.  DRBD is an interesting technology to say the least.
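A sketch of how that nightly job might be laid out, assuming each VM lives on its own logical volume in a volume group called VM (as in the dd line). The LV name, the 5G snapshot size and the image path are placeholders, and the helper only prints the commands rather than running them:

```shell
#!/bin/sh
# Dry run: prints a snapshot-and-copy backup for one VM volume.
# Arguments: volume group, logical volume, destination directory.
# The 5G snapshot size is an assumption; it must be large enough to hold
# the writes the VM makes while the copy is running.
backup_plan() {
  vg="$1"; lv="$2"; dest="$3"
  snap="${lv}_snap"
  # Freeze a point-in-time view of the running VM's disk
  echo "lvcreate --snapshot --size 5G --name $snap /dev/$vg/$lv"
  # Copy the frozen view to an image file on the backup drive
  echo "dd if=/dev/$vg/$snap of=$dest/$lv.img bs=1M"
  # Drop the snapshot so it stops consuming space
  echo "lvremove -f /dev/$vg/$snap"
}
```

backup_plan VM mail /media/backup_drive prints the three steps for a volume named mail; looping over the volumes from cron would give the every-night behaviour. Copying the VM's xml definition alongside the image is what makes the restore on a different machine quick.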

christian

  • Guest
Re: Server crashed -Hardware Failure.... need advice
« Reply #23 on: February 06, 2012, 08:11:43 am »
Very little (meaning here "zero"  ;D ) chance that MSI replaces components on your mobo. Best case they will replace the mobo itself.
I may have missed some parts of this thread but I'm getting confused feeling that you try to achieve 2 different goals at same time:
- repair your server without the burden of handling migration, copy or whatever else of TB of data
- defining better fault tolerant implementation

My advice would be that you split it into two different threads:
- one to relaunch services asap (this should be easily done by replacing your mobo with no change at disk level)
- one to improve design

I've a couple of comments/questions:
- did you look at swapping figures? I would be surprised if such a huge server with 16GB of memory showed any swap activity, especially if used to run Zentyal only. This could be different if you were running an application server with lots of Java-based sessions but infrastructure services + Samba...  ??? How is your swappiness parameter tuned?
- I would rather dedicate the SSD to the system than to swap, moving /log elsewhere
- I hope you realize, thanks (kind of) to this hardware issue, that:
    - LVM "alone" is useless in case of hardware failure (except that impact is wider in case you have multiple machines running on same hardware)
    - RTO is sometimes different for internet & mail vs. Samba: running everything on same server is not always a good idea
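On the swapping question in the list above, a quick way to get the figure is to read it from /proc/meminfo. A minimal sketch; the function name is made up and the optional file argument is only there so it can be tried against sample text:

```shell
#!/bin/sh
# Prints swap currently in use, in kB (SwapTotal minus SwapFree).
# Reads /proc/meminfo by default; pass another file for testing.
swap_used_kb() {
  awk '/^SwapTotal:/ { t = $2 }
       /^SwapFree:/  { f = $2 }
       END { print t - f }' "${1:-/proc/meminfo}"
}
```

If this stays at or near 0 under normal load, a dedicated swap SSD buys nothing. The current tuning is visible with cat /proc/sys/vm/swappiness (60 is the usual kernel default), and sysctl -w vm.swappiness=10 lowers it until the next reboot.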

Moving to HA aspects (sorry for this long post), I would like to clarify some concepts or at least to explain how I perceive it:
- LVM is an efficient way of providing fault tolerance if VM files are not stored on one server only.
- SAN or NAS can be used to achieve this. DRBD  is another approach. There is however significant difference between these designs:
   - SAN (and DRBD) works at block level while NAS works at file level. This means that data on NAS can be accessed from different servers at the same time while SAN allocates data to one server only. This impacts the way you swing from one server to another in case of failure.
   - DRBD can have a noticeable impact in terms of performance depending on the amount of data to be synchronized.
   - If you decide to go for DRBD without taking a performance hit on the Raid6, you will have to build another dedicated file system, meaning more disks.

Because of the above, I would investigate something based on:
    - NAS for your data (fault tolerant disk, in case of mobo failure, replace it if you can afford this RTO)
    - LVM based on either SAN or better DRBD for infrastructure services so that you keep a live copy of your VM and can restart quickly on available hardware in case of failure.

Unfortunately this has a cost  ::)

christian

  • Guest
Re: Server crashed -Hardware Failure.... need advice
« Reply #24 on: February 06, 2012, 08:21:40 am »
DRBD is an interesting technology to say the least.

Indeed  ;)
This is, now at OS level, what you have had for years with the remote synchro feature on NetApp or (kind of) EMC SRDF.
Very useful if used at the right place for the right purpose but nothing magic  :-[  Well, I'm writing this based on what I understand of DRBD. I have never used it so far  :-[
I'm very curious about performance impact on file servers, e.g.

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
Re: Server crashed -Hardware Failure.... need advice
« Reply #25 on: February 06, 2012, 01:10:44 pm »
Thank you.....   

Yes Christian... I have kind of gone in a different direction in this thread.  Mostly I am not sure if my system will just start back up if I replace the hardware with either the same mobo or just migrate the disks to a new server....   It will probably be a little while before MSI replaces my mobo (if they do).  I don't think I can wait that long and will probably try the new hardware. So if the system does not boot back up I will have to quickly try to rebuild it (either going back with the same setup as before or a different one).... I use the zentyal backup tool to make a nightly backup, but I have no idea how to bring the system back up using that tool (I only know how to use it to retrieve files I or a client might have erased accidentally)

Half_life.... I completely understand that there is no way to 100% guarantee that a system won't crash or both systems won't crash.  This is why I was doing FTP backup using filesync from a client windows7 PC.  Just to give me a little more peace of mind.  It paid off because all the data is still accessible and I am able to continue doing work without issues.  I also understand the importance of taking snapshots if you run a VM (this is why the VM approach is very interesting to me).

- Now about the SSD ..... the only reason I was thinking of adding them is that I have two lying around.  I am not sure if I will do this.. mostly because I only have two and I like doing system disks as raid in case of failure.  Maybe on one server I will add both disks.  (proper partitioning is a weak point of mine.... I am never sure of how to partition the system or what file system to use.)

Having an SSD + 2 X 500 gig drives ...... how would you partition and what file system would you use.


christian

  • Guest
Re: Server crashed -Hardware Failure.... need advice
« Reply #26 on: February 06, 2012, 01:48:32 pm »
Having an SSD + 2 X 500 gig drives ...... how would you partition and what file system would you use.

Ext4 + /var and swap on HDD instead of SSD  ;)

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
Re: Server crashed -Hardware Failure.... need advice
« Reply #27 on: February 06, 2012, 02:20:02 pm »
What about home???  Also if you go the VM route.... I really don't know where to place them.  My SSD is only 32 gigs.... that is really not a whole lot of space.... once you start creating VMs, especially one that holds the users' home accounts and data, it will exceed the SSD.

This brings up an interesting question.  On one hand you have a certain partition scheme for the base system..... however on the other hand how would you partition a VM (would it be just one container file..... or would it be done how Half_life said using LVM(totally unfamiliar with LVM)).

christian

  • Guest
Re: Server crashed -Hardware Failure.... need advice
« Reply #28 on: February 06, 2012, 02:52:48 pm »
The basic idea is to move off the SSD everything that is not strictly "system" or that may generate a large amount of write operations. Thus if you have significant use of /home, then move it to your 500GB disks.
With 32GB for system, this is more than enough.

This must obviously be mitigated if disk performance is your main focus.

Regarding LVM: for everything other than system partitions, LVM is a must: it will allow smooth growing and ease storage management a lot.

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
Re: Server crashed -Hardware Failure.... need advice
« Reply #29 on: February 06, 2012, 03:50:28 pm »
Thank you !!!   I really got a lot of help on this topic from Half_life, you and stuartiannaylor.   

I have a much more solid understanding of what I will need to do in order to get a more robust system.

(If I will have to reinstall.... I will just do a fresh install of proxmox and go with a virtual environment)
I think I will use both SSD (create a hardware raid 1 for the system) + create another hardware raid 1 with the 2 x 500 gig drives. (this is for server 1).....   Server 2 will be just hardware raid 2 x 750 gig drives.

Once I create the base system on server 1 I will try to install the mdadm tools and hook my raid6 back up (still not sure how possible this is since proxmox does not support software raid off the bat..... however I think it might work since I am not planning on using the software raid for the OS.  All I need it for is to mount it directly into a VM under the /mnt directory)

I believe using proxmox will allow me to shut down VM when needed and move them from one server to the next.  Also if one server fails completely I should be able to move the vital VM (LDAP, Gateway, Zarafa) over to the working server.