Zentyal Forum, Linux Small Business Server

Zentyal Server => Installation and Upgrades => Topic started by: vshaulsk on February 02, 2012, 02:51:25 pm

Title: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 02, 2012, 02:51:25 pm
Last night  I was saving changes in the zentyal gui (I was actually at the server not remote).

At the same time one of my users was logging in to their windows 7 machine ..... roaming profiles (not sure if this had anything to do with what happened next)

All of a sudden I saw the CPU utilization go up to 100% on all 6 cores.  About 10 to 20 seconds later something in the machine flashed and the whole thing restarted.

Now I can smell a burning or burnt smell......  So I know that something hardware related failed....

The system does turn back on .... gets past the BIOS screen ..... goes into the screen where I can select the kernel (2.6.32-38 or whatever) and then all of a sudden spits out the attached screenshot.  I am trying to figure out if the problem is the motherboard, CPU, RAM or something else so I can replace it.

Also depending on what it is.... I need to know once it is replaced will I be able to boot the system normally or will I have to disaster recover.

- if it is just the CPU .... will simply putting in a new one allow the system to boot ???

- How would you recover if you install a new motherboard ???

I can overnight parts, but I need to figure out what to get and how to recover once I change out the part.

Thank you !!!

Also I guess I am not sure why the system would fail like this.  I have loaded up plenty of other systems... I can see them hitting their thermal limit and shutting down... but failing like that ..... not expected.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 02, 2012, 02:53:18 pm
Here is the full print screen
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 02, 2012, 09:22:04 pm
What do I have to do in order to get my system working on another server??

Since the second machine has different hardware (motherboard, ram and cpu).... simply transferring the hard drives would probably not be enough to get the system up.....

What would I have to do in order to restore my system???

Disaster recover???
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: hyerk on February 02, 2012, 11:21:27 pm
The first thing I would do is look at the RAM.  I was trying to install Zentyal on an HP DL380 and it would not complete the install due to bad RAM, but it would still POST and start the install.  Try downloading the memtest86+ ISO, burn it to a CD and see if it will boot properly.  Also, if you have other modules you can put in place of the current ones, I'd try that.  If not, I'd start removing the modules one by one, or pair by pair depending on your server, and try booting with the minimal amount of RAM to see if it boots properly.  Also, since you say you saw a flash, have you looked inside to see if there is any physical damage (melted or black marks)?

Sorry I'm not that much help for deciphering the error messages, but that's what I'd do to test the hardware.

Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 03, 2012, 02:04:20 am
Well I tried some other RAM modules one by one and still have the same problem.  So it's not the RAM.

Turned off all the extra fans .... took all the side covers off.  Took out all extra PCI cards.  Basically ran it bare minimum.

Definitely smell something burning or melting.   However I cannot see any physical damage ... melted wires or anything.   Also can't see any smoke... so totally not sure.

Basically once I get past the POST screen... I see the kernel option screen....  after that I either get the error message or the screen goes black and loses connection (nothing after that).


So I have decided to get a new motherboard and CPU.   

Thinking of getting dual Xeon 5606 quad core processors with either an Asus or SuperMicro motherboard.

I can then have more time to figure out which part has failed on the old one and make it into a backup server or NAS or something else.


How to recover from this is going to be the tough question.... since it will be a new motherboard and CPU.... what are the steps to get the system up and running???

Do I really have to reinstall everything or can I just recover by using my backups????
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 03, 2012, 02:16:13 am
Given that a failure in the CPU would probably only allow it to function for a short time before complete failure, the fact that you are smelling it suggests to me that it is a component on the motherboard.  My wild guess would be a voltage regulator, and therefore I would suggest not turning it on anymore unless you are prepared to walk away from both suspects (motherboard and CPU).
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 03, 2012, 02:39:32 am
Yes I agree with you.  It is under warranty so I will go and have it taken care of.


Any advice on rebuilding the system....

I use the backup function and back everything up to a local drive.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 03, 2012, 02:51:47 am
Aren't you running virtualized?  If so, it would be fairly straight forward to retrieve the machines.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 03, 2012, 03:06:30 am
On my current system I run an instance of zentyal on the metal hardware.  I use virtualbox to run some webservers, zentyal test machines and anything else I need.

With my hardware I think I might use a hypervisor on the bare metal and install from there.  However that leads me to a question.... in order for me to do that would I need to have a SAN or could I still do it with a local disk setup???

I guess I am not sure if a bare metal hypervisor supports software raid through mdadm.... I don't think it does...... which means I will have to go back to just installing zentyal on the bare metal with VirtualBox inside it running the VMs

Since my zentyal install is not a VM I am not exactly sure on how to recover it......
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 03, 2012, 06:23:26 am
The question of a SAN comes along with the need for hardware fencing because at that point you are talking about high availability.  A SAN is not necessary for HA.  DRBD will get you what you need on that count.

Proxmox is considered a bare metal hypervisor and in actuality is a Debian Squeeze setup.  Software raid will work in that environment but, depending on your full layout, might give less than stellar results.  For instance, say you have several virtual machines that perform work for you.  Those machines, once started, have a low demand for disk I/O.  Software raid will work for this until disk I/O demands begin to approach the limit of what software raid can provide (about 70 MB/s sustained writes and maybe up to 120 MB/s reads with typical 1TB SATA drives in raid 5 for 2-3 TB total effective storage).  Smaller and faster drives will give somewhat better performance, as will Raid 0 etc.

To use a real world example, my server setup uses hardware raid, but in actuality it probably could have gotten by with software raid.  I see 60 MB/s reads during system start, but in operation I only see 1-5 MB/s of disk I/O.  I have about 6-8 VMs running on a server right now.

You can take your disk set and install it in another computer to retrieve an "on the metal" backup.  It won't matter if the machine can't carry the load since you won't actually boot into it.  Use Clonezilla or Redo to get the backup.  It is then as simple as restoring it on the new server.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: christian on February 03, 2012, 07:36:00 am
vshaulsk, if the issue is the mobo (CPU or whatever else) and if no or few write operations were pending at the time the system crashed, you may be able to restart as easily as swapping in another mobo while reusing the same disks.
You may need some fsck, perhaps some trick in fstab, but it's very likely, even if you change hardware, that Ubuntu will start again smoothly (perhaps not taking full advantage of this new hardware). From there you will be able to export the Zentyal data to be kept and reinstall if needed.
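
Those first checks after a board swap might look roughly like the sketch below. It is only an illustration under stated assumptions: `/dev/md0` as the root device, and the `run`, `DRY_RUN` and `post_swap_checks` names are hypothetical helpers, not anything from Zentyal or Ubuntu. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: first checks after moving the disks to a new board.
# DRY_RUN, run(), post_swap_checks and /dev/md0 are illustrative assumptions;
# with DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

post_swap_checks() {
    root_dev="$1"
    # Read-only filesystem check: -n reports problems without repairing them.
    run fsck -n "$root_dev"
    # Print the UUIDs and labels visible on the new hardware...
    run blkid
    # ...and compare them with what fstab expects to mount.
    run cat /etc/fstab
}

post_swap_checks /dev/md0
```

Run from a rescue or live environment with `DRY_RUN=0` to execute the commands for real; entries in fstab that refer to device names rather than UUIDs are the ones most likely to need adjusting after a controller change.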
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 03, 2012, 01:11:12 pm
Thank you for the advice and your thoughts.

I have ordered some new hardware..... mobo and CPU (actually instead of desktop hardware this time I went with server stuff.... dual Xeon).

- I guess step one once I install the new hardware.... is just to simply boot and see what happens.

- If it is successful I will see how well everything fits with my system and go from there.

- Now if it does not boot up properly I would like to go with a bare metal hypervisor instead of using Virtualbox (I have very bad performance if I use a virtual samba server with virtualbox.... only get about 10 to 20 MB/s read).  My real zentyal installation on the bare metal gets sustained reads of about 70 to 75 MB/s when reading from a 7 X 2TB software raid6 (EXT4 file system)

- I definitely do not want to lose my raid6 setup (This has all of my business and client backups and information = 3.5TB).  My current OS uses 2 X 500 gig drives in software raid 1  (they use the EXT4 file system and are partitioned with root and home).  I would like to save the home partition if possible because this is where my clients store their personal information (I back it up on the raid6).
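
For a sense of scale, the numbers quoted above can be worked through with a little shell arithmetic. This is a back-of-the-envelope sketch only: `raid6_usable_tb` and `copy_hours` are hypothetical helper names, decimal units are assumed, and the 70 MB/s figure is simply the sustained read rate mentioned in the post.

```shell
#!/bin/sh
# Sketch: rough numbers for the 7 x 2TB RAID 6 array described above.
# raid6_usable_tb and copy_hours are hypothetical helper names.

# RAID 6 keeps two disks' worth of parity, so usable space is (n - 2) * size.
raid6_usable_tb() {
    echo $(( ($1 - 2) * $2 ))
}

# Whole hours needed to read $1 gigabytes at $2 megabytes per second
# (decimal units, rounded down).
copy_hours() {
    echo $(( $1 * 1000 / $2 / 3600 ))
}

echo "usable capacity: $(raid6_usable_tb 7 2) TB"
echo "reading 3500 GB at 70 MB/s: about $(copy_hours 3500 70) whole hours"
```

So the array offers 10 TB of usable space, and reading the 3.5 TB of data out once at that rate would take a little under 14 hours (the script rounds down to 13).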

So Half_life .... with what you proposed Proxmox or any other hypervisor would I be able to save this setup???? Parts of this setup???   It would be fantastic to at least save the raid6 and use perhaps just the zentyal backup utility to bring back clients home folder content (only have about 20 clients).     In a perfect scenario I would be able to bring back everything from the backup (Nginx configuration.... custom Intranet website information.... wpad.dat files). 

Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 05, 2012, 01:56:37 am
I am sure that your first priority would be to back up your client data to a safe location.  The first step would be to get the disk set into a working computer and boot up with a backup tool such as Clonezilla or Redo to verify that the disk set can be reassembled properly and then get your data out of harm's way.  That being done, you then have the luxury of testing the install of a hypervisor like Proxmox or similar.  I have used Xen, KVM, VMWare, and am now giving a more thorough look at Proxmox (KVM, container hybrid).  Proxmox offers a lot right out of the box for the typical small business admin type.  The reason I am giving it another look is the fact that they are incorporating HA (current 2.0 beta).

I am sorry that I have been slow to respond in this thread.  I know that you are feeling pressure to get this resolved.  If you need help, I would be happy to give it.  I can message you offline with contact info to allow for a more timely response to questions.  Just say the word.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 05, 2012, 03:25:44 pm
Thank you Half_life.   My most vital client data is secure because I use a local FTP backup tool which saves and encrypts the data on several local Windows 7 machines (maybe not the best idea, but it is duplicated over two machines) so I would have to have two more machines fail in order to lose all vital data.

I use zarafa for email ..... definitely not sure how to save this data. Many of the emails don't matter because I use gmail for the most vital stuff (I was always afraid of my server failing).  However there are a few that I would like to get back somehow.

The big thing that I don't know how to get around is my giant raid6 array.  I don't have any way to copy over 3.5 to 4 TB of data anywhere.  I do know that the data should be ok..... I was not accessing the data off those disks during the crash.  So I am thinking that mdadm should rebuild the raid fine just like it does when I had to reinstall the system before.
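
For reference, reassembling an existing md array on a fresh install usually comes down to a few mdadm calls. The sketch below assumes the member disks still carry intact md superblocks; the `run`, `DRY_RUN` and `reassemble_arrays` names are hypothetical helpers, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: bring an existing software RAID back up after an OS reinstall.
# DRY_RUN, run() and reassemble_arrays are hypothetical helpers; with
# DRY_RUN=1 (the default) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reassemble_arrays() {
    # List the arrays described by the superblocks found on the disks.
    run mdadm --examine --scan
    # Assemble every array that can be found. Unlike --create, --assemble
    # reuses the existing superblocks instead of defining a new array.
    run mdadm --assemble --scan
    # Show the state of the assembled arrays; a healthy 7-disk array
    # is reported with [7/7] [UUUUUUU].
    run cat /proc/mdstat
}

reassemble_arrays
```

Setting `DRY_RUN=0` (as root) runs the commands for real. Checking `/proc/mdstat` before mounting anything is a cheap way to confirm all seven members were found.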

I was hoping that I could use one of the hypervisors and install it on my 500 gig drives, then give one of the installed VM guests direct access to my storage disks.  However I have not found a way of doing this.  It seems all of these hypervisors only allow for virtual disk containers and not direct access to disks unless you go the NFS or iSCSI route.... which are not options for me.

If I am wrong or you know a way for me to create a virtual server setup, but still keep my software raid6 with the data, let me know. Otherwise I am going with plan B which will involve a secondary server (I will outline this plan in my next post)
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 05, 2012, 05:38:23 pm
I am guessing that you can keep your raid6 array without issue as long as you don't stray away from debian/ubuntu for the host.  My thinking would be your choices of hypervisor would be Straight Ubuntu install with KVM as hypervisor, or Proxmox distribution 1.9 which includes kvm and vz.  The Ubuntu flavor would allow to install a traditional X desktop  while the Proxmox solution uses a web interface.  To keep your Zentyal setup,  you still need to do the clonezilla/redo thing.  You could then lay your Zentyal machine back in as a VM.  To insure against mayhem when installing another OS,  just unplug the power to the raid disks.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 05, 2012, 06:09:22 pm
KVM allows direct assignment of disks to virtual machines.  While I haven't tried this,  it should allow you to assign /dev/md0 to a virtual machine drive (probably type virtio).  Proxmox doesn't allow for the possibility via the gui but allows these things via direct edit of the vm config file.
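
With libvirt-managed KVM, that kind of direct assignment is normally written as a block-type disk entry in the guest definition. The sketch below just prints such an entry; it assumes libvirt is in use and that the array really is `/dev/md0`, and the function name `print_md_disk_xml` and the guest-side name `vdb` are illustrative choices.

```shell
#!/bin/sh
# Sketch: print a libvirt <disk> element that hands a host block device
# (here the md array) to a KVM guest as a virtio disk.
# print_md_disk_xml and the target name "vdb" are illustrative choices.
print_md_disk_xml() {
    src_dev="$1"      # host block device, e.g. /dev/md0
    target_dev="$2"   # device name the guest sees, e.g. vdb
    cat <<EOF
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='${src_dev}'/>
  <target dev='${target_dev}' bus='virtio'/>
</disk>
EOF
}

print_md_disk_xml /dev/md0 vdb
```

The printed element would go inside the `<devices>` section of the guest definition, for example via `virsh edit`. The host should not mount the filesystem on the array while a guest is using it, since only one system can safely write to it at a time.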
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: stuartiannaylor on February 05, 2012, 07:16:27 pm
http://mcelog.org/faq.html

I get "kernel hardware error no human readable mce decoding support on this cpu type"

This is pretty much a bug in newer Linux kernels. They print this message on every corrected error, even though it's useless and also the decoding into the kernel log is not very useful because mcelog can aggregate the information much better. This is fixed with this patch (http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commitdiff;h=6e3c7411d2b86bff210c59caa432e8e862037bfd). To apply to a kernel: download raw patch (http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=patch;h=6e3c7411d2b86bff210c59caa432e8e862037bfd), cd kernel source, patch -p1 < patchfile, recompile.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 05, 2012, 09:08:42 pm
Half_life what do you think about the following scenario?  With the new hardware, and once the motherboard from the old hardware gets replaced under warranty, I will have the potential to make two servers.  Here are some details below: (raid 6 disks are 5900 RPM and all other mechanical disks 7200 RPM if it matters)

New machine:  Dual Xeon 5606 on an Asus board with 24 gigs of ram.  Has 4 LAN ports and has a SAS/SATA card (has two mini SAS ports good for 8 drives).  I was thinking of loading it with two 500 gig drives (make it hardware raid1 this time).  Also keep the 7 X 2TB drives in it (try to get my software raid 6 back up and running).  I also have a 32 gig SSD I was going to use for swap.

The old machine: AMD 1090T hexacore on an MSI motherboard with 16 gigs of ram.  I also have a 32 gig SSD I was going to use for swap.  I was thinking of loading it with two 750 gig drives in hardware raid 1.

For me it is important to have good Samba performance between local and VPN clients.  So I was thinking of keeping the zentyal file sharing module (make it a slave) on the direct hardware of the new machine.... this way I have the best performance possible for samba.  On this same system I was going to install virtualbox (Maybe KVM is better, but I don't know how to use it).  Inside virtualbox I will bring up my old system so that I have access to any data I might need.

Now once the second server is up and running I was going to use either proxmox or xen .... whichever one would give me the best performance results.

Now since I have two servers which are able to run virtual guests.... I think I could create zentyal in the following way:
1) Master LDAP or go with Windows AD ( not sure which is better)
2) Run an instance of Zentyal as just a gateway.... perhaps with a few other modules running as well.
3) Have an instance of zentyal running in which I install zarafa ( this way I can use it as a mail server) + maybe recreate my main webserver on this instance. 

I figure I can make all the virtual machines in Virtualbox and then maybe convert them into a format either xenserver or proxmox can accept (this is my big question.... since these are completely different hypervisors would this work).
Maybe there is a better way to do this..... I am open to all sorts of scenarios....
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 05, 2012, 11:30:14 pm
Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
Half_life what do you think about the following scenario?  With the new hardware, and once the motherboard from the old hardware gets replaced under warranty, I will have the potential to make two servers.  Here are some details below: (raid 6 disks are 5900 RPM and all other mechanical disks 7200 RPM if it matters)

It is a best practice to keep hard drives like to like in an array.

Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
New machine:  Dual Xeon 5606 on an Asus board with 24 gigs of ram.  Has 4 LAN ports and has a SAS/SATA card (has two mini SAS ports good for 8 drives).  I was thinking of loading it with two 500 gig drives (make it hardware raid1 this time).  Also keep the 7 X 2TB drives in it (try to get my software raid 6 back up and running).  I also have a 32 gig SSD I was going to use for swap.

LSI or Highpoint? Keep in mind that you will not be able to add the 2TB drives to the hardware raid until you have retrieved your data.  Wouldn't the SSD drive serve better as the host OS drive?

Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
The old machine: AMD 1090T hexacore on an MSI motherboard with 16 gigs of ram.  I also have a 32 gig SSD I was going to use for swap.  I was thinking of loading it with two 750 gig drives in hardware raid 1.

Here again I think that the SSD would better suit the OS since you probably won't impact your swap given your available ram.

Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
For me it is important to have good Samba performance between local and VPN clients.  So I was thinking of keeping the zentyal file sharing module (make it a slave) on the direct hardware of the new machine.... this way I have the best performance possible for samba.  On this same system I was going to install virtualbox (Maybe KVM is better, but I don't know how to use it).  Inside virtualbox I will bring up my old system so that I have access to any data I might need.

Samba via VPN?  I think that I have missed that bit of information in the past when you were talking about throughput issues.  VPN queues things up to be transmitted serially over a one port connection.  It also adds overhead along the way. This would be the source of your bottleneck.  Otherwise performance can be improved with faster disk access times and throughput as you were thinking.  Virtualbox is easier to setup but not as fast as you have discovered. 

Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
Now once the second server is up and running I was going to use either proxmox or xen .... whichever one would give me the best performance results.

Given your use case,  I would put my efforts towards ease of use.  Xen and KVM give competitive performance numbers in a fully virtualized environment.  The tools to maintain the environment are where the difference is, in my opinion.

Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
Now since I have two servers which are able to run virtual guests.... I think I could create zentyal in the following way:
1) Master LDAP or go with Windows AD ( not sure which is better)
2) Run an instance of Zentyal as just a gateway.... perhaps with a few other modules running as well.
3) Have an instance of zentyal running in which I install zarafa ( this way I can use it as a mail server) + maybe recreate my main webserver on this instance.

Quote from: vshaulsk on February 05, 2012, 09:08:42 pm
I figure I can make all the virtual machines in Virtualbox and then maybe convert them into a format either xenserver or proxmox can accept (this is my big question.... since these are completely different hypervisors would this work).
Maybe there is a better way to do this..... I am open to all sorts of scenarios....

It isn't trivial, but by the same token it is not hard to move a virtualbox machine to one of the others.  Xen and KVM are easy to move back and forth between.  Maybe it would be better to start with what work you are trying to achieve.  For instance there might be a better way to approach things concerning your Samba over VPN issue if we knew more about what business problem you were trying to solve.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 06, 2012, 12:10:09 am
sorry Half_life .... I combined a couple of things and now I see how my statements don't make sense.

1) My harddrives  (raid6 is 7 disks all 5900RPM software raid)....  my os currently sits on 2 500 gig disks 7200RPM in software raid 1.   I have two more 750 gig drives which are 7200RPM I was going to use for server two.  So each raid set is sitting on exact same harddrives.

2) I was thinking of using the 2 SSD, but they are very small  (32gig) and also from what I understand the linux kernel does not support trim.  So that is why I was just thinking of using them for swap. ( I am definitely open to other possibilities)

3) the raid card is an intel one.. (it is cheap so I am sure it just dumps the work over to the CPU)  I was just going to use its raid features in order to raid the two 500 gig drives together (make a hardware raid instead of software raid).  Once the OS is installed I was just going to plug in all the 2TB drives and see if I can reassemble the raid6 with mdadm.

4) Samba..... sorry I meant to say all LAN clients. (virtualbox was giving me much slower performance for virtual samba shares) If KVM can fix that problem and I can mount my physical software raid6 into a virtual machine that would be great.



From what you are saying if I have to redo my server I should just use KVM this way it would be easy to transfer virtual guests between servers.  If one goes down I can use the next.

My biggest thing is that I want to bring up my email server and gateway as quickly as possible if another hardware failure happens.  It does not have to be automatic, but it would be nice if it was just a simple migration of a VM from one server to the next ... (plus just changing some physical network cables)
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 06, 2012, 04:49:27 am
Quote from: vshaulsk on February 06, 2012, 12:10:09 am
sorry Half_life .... I combined a couple of things and now I see how my statements don't make sense.

1) My harddrives  (raid6 is 7 disks all 5900RPM software raid)....  my os currently sits on 2 500 gig disks 7200RPM in software raid 1.   I have two more 750 gig drives which are 7200RPM I was going to use for server two.  So each raid set is sitting on exact same harddrives.

Sorry,  I misunderstood.

Quote from: vshaulsk on February 06, 2012, 12:10:09 am
2) I was thinking of using the 2 SSD, but they are very small  (32gig) and also from what I understand the linux kernel does not support trim.  So that is why I was just thinking of using them for swap. ( I am definitely open to other possibilities)

Linux does support trim (discard)

From Wikipedia in pertinent part:

----------------------------------------------------------------------------------
Discard parameter in Linux
Although TRIM is supported in the Linux kernel since version 2.6.33, the operating system does not automatically enable TRIM operation. The user must modify the appropriate /etc/fstab file to add the word discard in the appropriate SSD entries. Without this user modification the Linux operating system will not pass the TRIM command to the SSD.[42]
Enabling unsupported operating systems
Where TRIM is not automatically supported by the operating system, there are utilities which can send TRIM commands manually. Usually they list all free blocks as specified by the operating system and then pass this list as a series of TRIM commands to the drive. These utilities are available from various manufacturers (Intel,[14] G.Skill[43]) or as general utilities (hdparm since v9.17[44][45]).


-------------------------------------------------------------------
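
As a concrete illustration of the fstab change the excerpt describes, the sketch below adds `discard` to the options column of one entry. It is only an example: `add_discard` is a hypothetical helper and `/dev/sdX1` is a placeholder device. The function reads fstab text on standard input and writes the result to standard output, so nothing is modified in place.

```shell
#!/bin/sh
# Sketch: append the "discard" mount option to one fstab entry.
# add_discard and /dev/sdX1 are hypothetical; input comes from stdin and the
# modified text goes to stdout, so the real /etc/fstab is never touched.
add_discard() {
    awk -v dev="$1" '
        # First field is the chosen device and the options lack discard: append it.
        $1 == dev && $4 !~ /(^|,)discard(,|$)/ { $4 = $4 ",discard" }
        { print }
    '
}

printf '%s\n' '/dev/sdX1 / ext4 errors=remount-ro 0 1' | add_discard /dev/sdX1
```

One design note: awk rebuilds a modified line with single spaces between fields, so any tab alignment on that line is lost. Entries that already list `discard`, and all other lines, pass through unchanged.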

Quote from: vshaulsk on February 06, 2012, 12:10:09 am
3) the raid card is an intel one.. (it is cheap so I am sure it just dumps the work over to the CPU)  I was just going to use its raid features in order to raid the two 500 gig drives together (make a hardware raid instead of software raid).  Once the OS is installed I was just going to plug in all the 2TB drives and see if I can reassemble the raid6 with mdadm.

That will work.

Quote from: vshaulsk on February 06, 2012, 12:10:09 am
4) Samba..... sorry I meant to say all LAN clients. (virtualbox was giving me much slower performance for virtual samba shares) If KVM can fix that problem and I can mount my physical software raid6 into a virtual machine that would be great.

Xen and KVM will both improve that particular problem.  Tweaks to get the most performance out of disk I/O would include using the virtio device type (disk and network card) and using LVM to store the virtual machine rather than a container file.  LVM makes it a little less simple to relocate VMs but still not too bad  (DRBD makes this a snap in an HA setup).  Of course starting with a solid disk I/O system on the real hardware really helps too (hardware raid with cache).


Quote from: vshaulsk on February 06, 2012, 12:10:09 am
From what you are saying if I have to redo my server I should just use KVM this way it would be easy to transfer virtual guests between servers.  If one goes down I can use the next.

Either would do.  There are strengths in either path.  I would lean towards kvm over xen because that is the path the distributions are going.  Proxmox gives a neat and tidy interface and has a straight forward path to HA without too much fuss (the 2.0 beta has this and there will be an upgrade in place option when it goes gold).  To clear up what I am saying it is easy to use VM's back and forth between Xen and KVM (this would include Proxmox)

Quote from: vshaulsk on February 06, 2012, 12:10:09 am
My biggest thing is that I want to bring up my email server and gateway as quickly as possible if another hardware failure happens.  It does not have to be automatic, but it would be nice if it was just a simple migration of a VM from one server to the next ... (plus just changing some physical network cables)

No problem to set things up for a fail over between machines.  No need to rearrange cables.  If you play your cards right,  the virtual machine is kept parallel between the servers.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 06, 2012, 05:33:21 am
Thank you for help !!!!

I finally made it home (I was gone for the weekend) and had a chance to take out the motherboard.  Found a burnt out piece right away.  Now I have to have MSI replace it !!!!

 - If the system does not come up by simply replacing hardware.....  I will try the proxmox direction.   

- going to use mechanical disks.... I think it would cost me too much to get some more SSD drives (especially ones big enough for my needs).  Plus I already have the mechanical disks...

- I am not very familiar with LVM so this will be a new thing + I have tried to google how to enable my software raid6 and give direct access to a VM (so far not entirely sure)

- Finally your statement about synchronizing the servers.... I thought this would only be possible if you had a NAS or SAN .... this way you could do live migration (I definitely do not know much about this)

I am super looking forward to trying this project out.... see if I can make it work like I want.   

THANK YOU !!!!
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: half_life on February 06, 2012, 07:51:09 am
You are welcome.  I need to caution you on one thing.  RAID does not replace a backup strategy.  It only gives you defense in depth.  If you lose a disk, the RAID system should survive.  If you lose more than that, or software decides to play scrabble with your data, the chances of getting your data back go down.

I do a 100% backup of all VMs every night as insurance.  In other words: dd if=/dev/VM/my_vm_lvm_snapshot  of=/media/backup_drive bs=1M.  If my two big servers both happen to die at the same time, no problem.  Pull the drives and round up a bunch of smaller machines.  Install Ubuntu server with KVM and grab the xml files conveniently already placed on the backup drives.  2 hours or so and I can be back up.

The SSD drive would only be for the OS.  I don't know about you, but mine only takes up about 4-6 gig typically.

Software RAID has a /dev/md? device that you would just pass on to the VM.  I will let you know if that definitely works.  I am getting ready to re-work the home server and it would be safe for me to play around with this.  I just did a little test on my desktop system.  I passed through my software raid array to an Ubuntu LTS VM running under KVM.  It showed me the LVM partitions on it so that is an encouraging sign.  I am going to test it more formally before deciding if it works or not.  DRBD is an interesting technology to say the least.
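
The nightly snapshot-and-dd backup mentioned above could look roughly like the sketch below. The volume group `VM` and the mount point follow the dd line in the post; everything else is an assumption for illustration: the `run`, `DRY_RUN` and `backup_vm` helpers, the 5G snapshot size, and the `.img` file name. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of a nightly LVM-snapshot backup along the lines described above.
# DRY_RUN, run(), backup_vm, the 5G snapshot size and the image file name
# are assumptions for illustration.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_vm() {
    vg="$1"; lv="$2"; dest_dir="$3"
    snap="${lv}_snapshot"
    # Freeze a point-in-time view of the logical volume holding the VM.
    run lvcreate --snapshot --size 5G --name "$snap" "/dev/$vg/$lv"
    # Copy the snapshot block for block into an image file on the backup drive.
    run dd "if=/dev/$vg/$snap" "of=$dest_dir/$lv.img" bs=1M
    # Drop the snapshot so it stops consuming space in the volume group.
    run lvremove -f "/dev/$vg/$snap"
}

backup_vm VM my_vm_lvm /media/backup_drive
```

The snapshot only needs enough space to hold the blocks that change while dd is running, which is why it can be much smaller than the volume itself.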
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: christian on February 06, 2012, 08:11:43 am
Very little (meaning here "zero"  ;D ) chance that MSI replaces components on your mobo. Best case they will replace the mobo itself.
I may have missed some parts of this thread but I'm getting confused feeling that you try to achieve 2 different goals at same time:
- repair your server without the burden of handling migration, copy or whatever else of TB of data
- defining better fault tolerant implementation

My advice would be that you split it into two different threads:
- one to relaunch services asap (this should be easily done by replacing your mobo with no change at disk level)
- one to improve design

I have a couple of comments/questions:
- did you look at swapping figures? I would be surprised that such huge server with 16GB of memory shows any swap activity especially if used to run Zentyal only. This could be different if you were running application server with lot of java based sessions but infrastructure services + Samba...  ??? How is your swappiness parameter tuned?
- I would rather dedicate SSD to system than swap, moving /log elsewhere
- I hope you realize, thanks (kind of) to this hardware issue, that:
    - LVM "alone" is useless in case of hardware failure (except that impact is wider in case you have multiple machines running on same hardware)
    - RTO is sometimes different for internet & mail vs. Samba: running everything on same server is not always a good idea

Moving to HA aspects (sorry for this long post), I would like to clarify some concepts, or at least explain how I perceive them:
- LVM is an efficient way of providing fault tolerance if VM files are not stored on one server only.
- SAN or NAS can be used to achieve this. DRBD is another approach. There are however significant differences between these designs:
   - SAN (and DRBD) works at block level while NAS works at file level. This means that data on a NAS can be accessed from different servers at the same time, while a SAN allocates data to one server only. This impacts the way you switch from one server to another in case of failure.
   - DRBD can have a noticeable impact in terms of performance, depending on the amount of data to be synchronized.
   - If you decide to go for DRBD without losing Raid6 performance impact, you will have to build another dedicated file system, meaning more disks.

Because of the above, I would investigate something based on:
    - NAS for your data (fault-tolerant disks; in case of mobo failure, replace it if you can afford this RTO)
    - LVM based on either SAN or, better, DRBD for infrastructure services, so that you keep a live copy of your VMs and can restart quickly on available hardware in case of failure.

Unfortunately this has a cost  ::)
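
For the swap question raised above, a quick way to look at the figures might be the sketch below. It assumes a Linux host exposing /proc/meminfo and /proc/sys/vm/swappiness; the function names are made up for the example, and the value 10 in the comments is only an illustrative setting, not a recommendation from this thread.

```shell
#!/bin/sh
# Sketch: inspect swap usage and the swappiness setting on a Linux host.

# Swap currently in use, in kB, computed from a meminfo-style file.
# The optional argument only exists so the function can be tried on a sample file.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2} /^SwapFree:/ {free = $2} END {print total - free}' \
        "${1:-/proc/meminfo}"
}

# Print the current swappiness value and the amount of swap in use.
show_swap_state() {
    echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
    echo "swap in use   = $(swap_used_kb) kB"
}

# To lower swappiness until the next reboot (needs root; 10 is an assumed example value):
#   sysctl -w vm.swappiness=10
# To keep the setting across reboots, add "vm.swappiness = 10" to /etc/sysctl.conf.
```

Running `show_swap_state` on the server while it is under its normal load shows whether swap is being touched at all; `vmstat 5` (columns si/so) shows ongoing swap-in and swap-out activity.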
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: christian on February 06, 2012, 08:21:40 am
DRBD is an interesting technology to say the least.

Indeed  ;)
This is, now at OS level, what you have had for years with the remote synchronization feature on NetApp or (kind of) EMC SRDF.
Very useful if used in the right place for the right purpose, but nothing magic  :-[  Well, I am writing this based on what I understand of DRBD. I have never used it so far  :-[
I'm very curious about performance impact on file servers, e.g.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 06, 2012, 01:10:44 pm
Thank you.....   

Yes Christian... I have kind of gone in a different direction in this thread.  Mostly I am not sure if my system will just start back up if I replace the hardware with either the same mobo or just migrate the disks to a new server....   It will probably be a little while before MSI replaces my mobo (if they do).  I don't think I can wait that long, so I will probably try the new hardware. If the system does not boot back up I will have to quickly try to rebuild it (either going back to the same setup as before or a different one).... I use the zentyal backup tool to make a nightly backup, but I have no idea how to bring the system back up using that tool (I only know how to use it to retrieve files I or a client might have erased accidentally)

Half_life.... I completely understand that there is no way to 100% guarantee that a system won't crash or that both systems won't crash.  This is why I was doing FTP backup using filesync from a client Windows 7 PC, just to give me a little more peace of mind.  It paid off because all the data is still accessible and I am able to continue doing work without issues.  I also understand the importance of taking snapshots if you run a VM (this is why the VM approach is very interesting to me).

- Now about the SSD ..... the only reason I was thinking of adding them is that I have two lying around.  I am not sure if I will do this.. mostly because I only have two and I like doing system disks as raid in case of failure.  Maybe on one server I will add both disks.  (Proper partitioning is a weak point of mine.... I am never sure how to partition the system or what file system to use.)

Having an SSD + 2 X 500 gig drives ...... how would you partition and what file system would you use.

Title: Re: Server crashed -Hardware Failure.... need advice
Post by: christian on February 06, 2012, 01:48:32 pm
Having an SSD + 2 X 500 gig drives ...... how would you partition and what file system would you use.

Ext4 + /var and swap on HDD instead of SSD  ;)
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 06, 2012, 02:20:02 pm
What about home???  Also if you go the VM route.... I really don't know where to place them.  My SSD is only 32 gigs.... that is really not a whole lot of space.... once you start creating VMs, especially one that holds the users' home accounts and data, it will exceed the SSD.

This brings up an interesting question.  On one hand you have a certain partition scheme for the base system..... however on the other hand how would you partition a VM (would it be just one container file..... or would it be done how Half_life said, using LVM (totally unfamiliar with LVM))?
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: christian on February 06, 2012, 02:52:48 pm
The basic idea is to move off the SSD everything that is not strictly "system" or that may generate a large amount of write operations. Thus if you make significant use of /home, move it to your 500GB disks.
32GB is more than enough for the system.

This obviously needs to be weighed differently if disk performance is your main focus.

Regarding LVM: for everything other than system partitions, LVM is a must: it will allow smooth growing and ease storage management a lot.
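
To illustrate the "smooth growing" point, here is a minimal sketch of enlarging a logical volume and the ext4 file system on it. The volume path /dev/data/home and the 50G figure in the usage note are invented for the example, as are the helper names; the volume group must have enough free extents for the extension.

```shell
#!/bin/sh
# Sketch: grow an LVM logical volume, then grow the ext4 file system on it.

# Print each command by default; set RUN=1 to really execute it (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# grow_lv LV_PATH EXTRA_SIZE   (for example: grow_lv /dev/data/home 50G)
grow_lv() {
    lv_path="$1"; extra="$2"

    # Add space to the logical volume from the free extents of its volume group.
    run lvextend --size "+$extra" "$lv_path" || return 1

    # Let the file system use the new space; with no size given,
    # resize2fs grows it to fill the whole volume.
    run resize2fs "$lv_path"
}
```

`grow_lv /dev/data/home 50G` prints the two commands; with `RUN=1` as root it performs the resize.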
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 06, 2012, 03:50:28 pm
Thank you !!!   I really got a lot of help on this topic from Half_life, you and stuartiannaylor.   

I have a much more solid understanding of what I will need to do in order to get a more robust system.

(If I will have to reinstall.... I will just do a fresh install of proxmox and go with a virtual environment)
I think I will use both SSD (create a hardware raid 1 for the system) + create another hardware raid 1 with the 2 x 500 gig drives. (this is for server 1).....   Server 2 will be just hardware raid 2 x 750 gig drives.

Once I create the base system on server 1 I will try to install the mdadm tools and hook my raid6 back up (still not sure how possible this is since proxmox does not support software raid out of the box..... however I think it might work since I am not planning on using the software raid for the OS.  All I need it for is to mount it directly into a VM under the /mnt directory)

I believe using proxmox will allow me to shut down VM when needed and move them from one server to the next.  Also if one server fails completely I should be able to move the vital VM (LDAP, Gateway, Zarafa) over to the working server.
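
Re-attaching an existing software RAID array after a fresh install usually comes down to installing mdadm and letting it find the array from the metadata on the member disks. The sketch below assumes a Debian-based host (which Proxmox is) and that the array members still carry their md superblocks; the helper names are made up, and whether this fits a given Proxmox setup has not been verified in this thread.

```shell
#!/bin/sh
# Sketch: re-assemble existing md arrays and list what the kernel sees.

# Print each command by default; set RUN=1 to really execute it (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Scan all disks for md superblocks and assemble every array found.
# (mdadm itself would be installed first with: apt-get install mdadm)
assemble_arrays() {
    run mdadm --assemble --scan
}

# Print "name level" for each active array listed in an mdstat-style file.
# The optional argument only exists so the function can be tried on a sample file.
# Arrays shown as "active (auto-read-only)" are not handled by this simple parser.
list_md_arrays() {
    awk '$2 == ":" && $3 == "active" {print $1, $4}' "${1:-/proc/mdstat}"
}
```

After assembly, `list_md_arrays` should show the raid6 device (for example md0), and `mdadm --detail /dev/md0` gives its full state before the device is handed to a VM.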
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 06, 2012, 10:53:32 pm
Half_life ....  I have been reading more about DRBD and it sounds very interesting.  I do have a question: do the two servers have to be completely identical when it comes to hardware ???

In my case the systems have different amounts of ram and completely different cpus (not only is one amd and one intel.... the new system has dual processors vs the old one being a single)

Finally, if the CPU and ram don't play a role and I can still create this cluster.... would I even need hardware raid (not talking about my raid6, this is completely separate)?  Would I be able to use one SSD, one 500 gig drive and one 750 gig drive in each server..... using DRBD I would be able to create a raid1 over the network with the 500 gig and 750 gig drives.  I will keep the SSD on each system for the base operating system.

From what I understand I can combine the 500 gig and 750 gig drives under LVM (total = 1.25 TB) and with DRBD essentially have a raid 1 array.   I could then create my 4 VM servers.... I would like them to always be on the new server and only switch to the old one if something fails.

Hopefully I understand this correctly.. after doing some reading during lunch today :)

 
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 14, 2012, 05:00:11 pm
So I got my new hardware (new processors and motherboard.... different brand and type than before)..... installed it and plugged my OS drives back in.

The system started right back up.  The only immediate issue has been with the network interfaces.  I did not know how to reset my original network interfaces and because they are tied to mac addresses they do not automatically work for the new network cards.

I am not sure if I did this correctly, but I set all the interfaces to not assigned (saved zentyal) and rebooted.  I then went into /etc/udev/rules.d/70-persistent-net.rules and changed the eth(XX) of the new network interfaces (based on mac address) to match the number of the old interfaces.  Following this I deleted the old interfaces from the file and restarted the system.

Now zentyal sees my new network interfaces and everything came back online.  I am not sure if this is the correct way and perhaps someone has a better explanation for me....   Also perhaps I should have made other changes when installing a new motherboard and cpu (not just simply plug in old drives and boot up)

On my second server I will try proxmox and if I am successful I will rebuild my production server.
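
The interface renaming described above can be scripted as in the sketch below. It assumes the rule format Ubuntu generated at the time (one line per NIC with ATTR{address}=="<mac>" and NAME="ethN"), with the file normally found at /etc/udev/rules.d/70-persistent-net.rules. The MAC addresses in the usage note are placeholders and the function names are made up for the example.

```shell
#!/bin/sh
# Sketch: edit a 70-persistent-net.rules style file after a NIC swap.
# Each call keeps the previous version of the file as <file>.bak;
# take your own copy of the original before the first change.

# comment_out_iface RULES_FILE OLD_MAC
# Disable the rule for a NIC that no longer exists (MAC in lowercase, as udev writes it).
comment_out_iface() {
    rules="$1"; mac="$2"
    sed -i.bak "/ATTR{address}==\"$mac\"/ s/^/# /" "$rules"
}

# set_iface_name RULES_FILE NEW_MAC NAME
# Give the NIC with this MAC the interface name the old card used.
set_iface_name() {
    rules="$1"; mac="$2"; name="$3"
    sed -i.bak "/ATTR{address}==\"$mac\"/ s/NAME=\"[^\"]*\"/NAME=\"$name\"/" "$rules"
}
```

For example, `comment_out_iface /etc/udev/rules.d/70-persistent-net.rules 00:aa:bb:cc:dd:01` followed by `set_iface_name /etc/udev/rules.d/70-persistent-net.rules 00:aa:bb:cc:dd:02 eth0` hands eth0 to the new card; a reboot then applies the names, as was done here.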
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: ichat on February 14, 2012, 08:04:27 pm
hmm. i'm not sure if you managed your NICs the 'default' way, but i would have done the same thing.. this way i can just leave all existing rules and settings alone and only add the new mac + driver to that eth interface...

as long as you don't apply vlan tagging while the new nic doesn't support it, you should probably be fine...

at least i would have done the same thing - i guess.
Title: Re: Server crashed -Hardware Failure.... need advice
Post by: vshaulsk on February 14, 2012, 08:19:13 pm
Ichat, actually I do have VLAN tagging enabled..... I think the trick was to make sure that the original interfaces were set to nothing (basically change them to not assigned) and restart the system.  After that, once I erased the old lines for the old interfaces in /etc/udev/rules.d/70-persistent-net.rules (I guess I could have just commented them out).... simply changing the eth(XX) number on the new interfaces to match the originals did the trick.  It even brought back my VLANs as soon as I assigned the same VLAN #.

I guess what I have learned most from this is that unless both of my raid disks get corrupted when a motherboard, cpu or PSU goes out.... I can recover my system within 24 hours by overnighting parts (which is an acceptable time for me).  Also I could just keep a spare server (not as powerful) and just insert the main drive disks + /home disks, which will allow me to get most of my system back up within minutes.  The only things I would not have are my raid6 array and probably the ability to turn on all the virtual machines (due to hardware limitations).


I still need to learn how to bring up my entire system from my backups which are on an FTP location..... I will probably learn that next.