Author Topic: Server crashed -Hardware Failure.... need advice  (Read 5154 times)

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
    • View Profile
Server crashed -Hardware Failure.... need advice
« on: February 02, 2012, 02:51:25 pm »
Last night I was saving changes in the Zentyal GUI (I was physically at the server, not remote).

At the same time one of my users was logging in to their Windows 7 machine with roaming profiles (not sure if this had anything to do with what happened next).

All of a sudden I saw the CPU utilization go up to 100% on all 6 cores. About 10 to 20 seconds later something in the machine flashed and the whole thing restarted.

Now I can smell a burning or burnt smell, so I know that something hardware related failed.

The system does turn back on, gets past the BIOS screen, and reaches the screen where I can select the kernel (2.6.32-38 or whatever), and then all of a sudden spits out the attached screenshot. I am trying to figure out if the problem is the motherboard, CPU, RAM or something else so I can replace it.

Also, depending on what it is, I need to know whether I will be able to boot the system normally once the part is replaced, or whether I will have to do a disaster recovery.

- If it is just the CPU, will simply putting in a new one allow the system to boot?

- How would you recover if you install a new motherboard?

I can overnight parts, but I need to figure out what to get and how to recover once I change out the part.

Thank you!

Also, I guess I am not sure why the system would fail like this. I have loaded up plenty of other systems. I can see them hitting their thermal limit and shutting down, but failing like that was not expected.

vshaulsk

« Reply #1 on: February 02, 2012, 02:53:18 pm »
Here is the full print screen

vshaulsk

« Reply #2 on: February 02, 2012, 09:22:04 pm »
What do I have to do in order to get my system working on another server?

Since the second machine has different hardware (motherboard, RAM and CPU), simply transferring the hard drives would probably not be enough to get the system up.

What would I have to do in order to restore my system?

Disaster recovery?

hyerk

  • Zen Apprentice
  • *
  • Posts: 25
  • Karma: +2/-0
    • View Profile
« Reply #3 on: February 02, 2012, 11:21:27 pm »
The first thing I would do is look at the RAM. I was trying to install Zentyal on an HP DL380 and it would not complete the install due to bad RAM, even though it would still POST and start the install. Try downloading the memtest86+ ISO, burning it to a CD, and seeing if it will boot properly. Also, if you have other modules you can put in place of the current ones, I'd try that. If not, I'd start removing the modules one by one, or pair by pair depending on your server. I'd try booting with the minimal amount of RAM to see if it would boot properly. Also, since you say you saw a flash, have you looked inside to see if you can see any physical damage (melted or black marks)?

Sorry I'm not that much help for deciphering the error messages, but that's what I'd do to test the hardware.


vshaulsk

« Reply #4 on: February 03, 2012, 02:04:20 am »
Well, I tried some other RAM modules one by one and still have the same problem. So it's not the RAM.

Turned off all the extra fans, took all the side covers off, and took out all extra PCI cards. Basically ran it at bare minimum.

I definitely smell something burning or melting. However, I cannot see any physical damage, melted wires or anything. I also can't see any smoke, so I am totally not sure.

Basically, once I get past the POST screen I see the kernel option screen. After that I either get the error message or the screen goes black and loses connection (nothing after that).


So I have decided to get a new motherboard and CPU.   

Thinking of getting dual Xeon 5606 quad core processors with either an Asus or SuperMicro motherboard.

I can then have more time to figure out which part has failed on the old one and make it into a backup server or NAS or something else.


How to recover from this is going to be the tough question. Since it will be a new motherboard and CPU, what are the steps to get the system up and running?

Do I really have to reinstall everything or can I just recover by using my backups?

half_life

  • Bug Hunter
  • Zen Hero
  • *****
  • Posts: 867
  • Karma: +59/-0
    • View Profile
« Reply #5 on: February 03, 2012, 02:16:13 am »
Given that a failure in the CPU would probably only allow it to function for a short time before complete failure, the fact that you are smelling it suggests to me that it is a component on the motherboard. My wild guess would be a voltage regulator, and therefore I would suggest not turning it on anymore unless you are prepared to walk away from both suspects.

vshaulsk

« Reply #6 on: February 03, 2012, 02:39:32 am »
Yes I agree with you.  It is under warranty so I will go and have it taken care of.


Any advice on rebuilding the system....

I use the backup function and back everything up to a local drive.

half_life

« Reply #7 on: February 03, 2012, 02:51:47 am »
Aren't you running virtualized? If so, it would be fairly straightforward to retrieve the machines.

vshaulsk

« Reply #8 on: February 03, 2012, 03:06:30 am »
On my current system I run an instance of Zentyal on the bare metal. I use VirtualBox to run some web servers, Zentyal test machines and anything else I need.

With my new hardware I think I might use a hypervisor on the bare metal and install from there. However, that leads me to a question: in order for me to do that, would I need to have a SAN or could I still do it with a local disk setup?

I guess I am not sure if a bare metal hypervisor supports software RAID through mdadm. I don't think it does, which means I will have to go back to just installing Zentyal on the bare metal with VirtualBox inside it running the VMs.

Since my Zentyal install is not a VM, I am not exactly sure how to recover it.

half_life

« Reply #9 on: February 03, 2012, 06:23:26 am »
The question of a SAN comes along with the need for hardware fencing, because at that point you are talking about high availability. A SAN is not necessary for HA; DRBD will get you what you need on that count.

Proxmox is considered a bare metal hypervisor and in actuality is a Debian Squeeze setup. Software RAID will work in that environment but, depending on your full layout, might give less than stellar results. For instance, say you have several virtual machines that perform work for you, and once started those machines have a low demand for disk I/O. Software RAID will work for this until disk I/O demands begin to approach the limit of what software RAID can provide (about 70 MB/s sustained writes and maybe up to 120 MB/s reads with typical 1 TB SATA drives in RAID 5 for 2-3 TB total effective storage). Smaller and faster drives will give somewhat better performance, as will RAID 0 etc.

To use a real world example, my server setup uses hardware RAID, but in actuality it probably could have gotten by with software RAID. I see 60 MB/s reads during system start, but in operation I only see 1-5 MB/s of disk I/O. I have about 6-8 VMs running on a server right now.

You can take your disk set and install it in another computer to retrieve an "on the metal" backup. It won't matter if the machine can't carry the load, since you won't actually boot into it. Use Clonezilla or Redo to get the backup. It is then as simple as restoring it on the new server.
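
For anyone wondering where the "2-3 TB total effective storage" figure for RAID 5 comes from, here is a rough Python sketch of the usual capacity arithmetic. The function name and the table are just made up for illustration, and it assumes equal-sized drives and ignores filesystem overhead.

```python
# Hypothetical helper: usable capacity for a few common RAID levels,
# assuming all drives are the same size.
PARITY_DRIVES = {"raid0": 0, "raid5": 1, "raid6": 2}


def usable_capacity_tb(level: str, n_drives: int, drive_tb: float) -> float:
    """Return the usable capacity in TB, before filesystem overhead."""
    if level == "raid1":
        # A mirror stores the same data on every drive.
        return drive_tb
    if level not in PARITY_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    data_drives = n_drives - PARITY_DRIVES[level]
    if data_drives < 1:
        raise ValueError("not enough drives for this RAID level")
    return data_drives * drive_tb


if __name__ == "__main__":
    # RAID 5 with three or four 1 TB drives, as in the figure above.
    for n in (3, 4):
        print(n, "drives:", usable_capacity_tb("raid5", n, 1.0), "TB usable")
```

So three or four 1 TB drives in RAID 5 is where the 2-3 TB range comes from: one drive's worth of space goes to parity.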

christian

  • Guest
« Reply #10 on: February 03, 2012, 07:36:00 am »
vshaulsk, if the issue is the mobo (or CPU or whatever else) and if no or few write operations were pending at the time the system crashed, restarting could be as easy as swapping in another mobo while still reusing the same disks.
You may need some fsck, perhaps some trick in fstab, but it's very likely, even if you change hardware, that Ubuntu will start again smoothly (perhaps not taking full advantage of the new hardware). From there you will be able to export the Zentyal data to be kept and reinstall if needed.
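
The post does not say which fstab trick is meant. One common issue after a hardware swap (this is an assumption, not something stated above) is that entries written as raw device names such as /dev/sdb1 can point at a different disk once a new controller enumerates the drives in another order, whereas UUID= or LABEL= entries are not affected. A small Python sketch that flags such entries could look like this; the function name and the sample fstab contents, including the UUID, are invented for the example.

```python
from typing import List

# Made-up fstab contents, for illustration only.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=11111111-2222-3333-4444-555555555555 /home ext4 defaults 0 2
/dev/md0 / ext4 errors=remount-ro 0 1
/dev/sdb1 /mnt/extra ext4 defaults 0 2
"""


def fragile_fstab_entries(fstab_text: str) -> List[str]:
    """Return devices referenced by raw /dev/sdX or /dev/hdX names."""
    flagged = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        device = stripped.split()[0]
        if device.startswith(("/dev/sd", "/dev/hd")):
            flagged.append(device)
    return flagged


if __name__ == "__main__":
    for device in fragile_fstab_entries(SAMPLE_FSTAB):
        print("check this entry after the hardware swap:", device)
```

On a real system you would read /etc/fstab instead of the sample string and compare the flagged devices against what blkid reports on the new hardware.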

vshaulsk

« Reply #11 on: February 03, 2012, 01:11:12 pm »
Thank you for the advice and your thoughts.

I have ordered some new hardware: mobo and CPU (instead of desktop hardware, this time I went with server stuff, dual Xeon).

- I guess step one, once I install the new hardware, is to simply boot and see what happens.

- If it is successful, I will see how well everything fits with my system and go from there.

- If it does not boot up properly, I would like to go with a bare metal hypervisor instead of using VirtualBox. (I get very bad performance if I use a virtual Samba server with VirtualBox, only about 10 to 20 MB/s reads.) My real Zentyal installation on the bare metal gets sustained reads of about 70 to 75 MB/s when reading from a 7 x 2 TB software RAID 6 (ext4 file system).

- I definitely do not want to lose my RAID 6 setup (it has all of my business and client backups and information = 3.5 TB). My current OS uses 2 x 500 GB drives in software RAID 1 (they use the ext4 file system and are partitioned with root and home). I would like to save the home partition if possible, because this is where my clients store their personal information (I back it up on the RAID 6).

So half_life, with what you proposed (Proxmox or any other hypervisor), would I be able to save this setup, or parts of this setup? It would be fantastic to at least save the RAID 6 and perhaps use just the Zentyal backup utility to bring back the clients' home folder content (I only have about 20 clients). In a perfect scenario I would be able to bring back everything from the backup (Nginx configuration, custom intranet website information, wpad.dat files).

« Last Edit: February 03, 2012, 01:13:26 pm by vshaulsk »

half_life

« Reply #12 on: February 05, 2012, 01:56:37 am »
I am sure that your first priority would be to back up your client data to a safe location. The first step would be to get the disk set into a working computer and boot up with a backup tool such as Clonezilla or Redo to verify that the disk set can be reassembled properly, and then get your data out of harm's way. That being done, you then have the luxury of testing the installation of a hypervisor like Proxmox or similar. I have used Xen, KVM and VMware, and am now giving a more thorough look at Proxmox (a KVM and container hybrid). Proxmox offers a lot right out of the box for the typical small business admin type. The reason I am giving it another look is the fact that they are incorporating HA (current 2.0 beta). I am sorry that I have been slow to respond in this thread. I know that you are feeling pressure to get this resolved. If you need help, I would be happy to give it. I can message you offline with contact info to allow for a more timely response to questions. Just say the word.
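
To give a feel for how long getting the data out of harm's way might take, here is a back-of-the-envelope Python sketch. It plugs in the figures mentioned earlier in the thread (about 3.5 TB on the array, about 70 MB/s sustained reads) and assumes decimal units and that the read rate is the bottleneck, which may well not hold in practice. The function name is made up.

```python
def copy_hours(data_tb: float, rate_mb_s: float) -> float:
    """Rough copy time in hours, using decimal TB and MB."""
    total_bytes = data_tb * 1e12
    seconds = total_bytes / (rate_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # 3.5 TB at 70 MB/s, the numbers quoted earlier in this thread.
    print(f"{copy_hours(3.5, 70):.1f} hours")
```

That works out to roughly 14 hours of continuous copying, so plan the backup window accordingly.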

vshaulsk

« Reply #13 on: February 05, 2012, 03:25:44 pm »
Thank you, half_life. My most vital client data is secure because I use a local FTP backup tool which saves and encrypts the data on several local Windows 7 machines (maybe not the best idea, but it is duplicated over two machines), so I would have to have two more machines fail in order to lose all vital data.

I use Zarafa for email and am definitely not sure how to save this data. Many of the emails don't matter because I use Gmail for the most vital stuff (I was always afraid of my server failing). However, there are a few that I would like to get back somehow.

The big thing that I don't know how to get around is my giant RAID 6 array. I don't have any way to copy 3.5 to 4 TB of data anywhere. I do know that the data should be OK, since I was not accessing the data on those disks during the crash. So I am thinking that mdadm should reassemble the RAID fine, just like it did when I had to reinstall the system before.

I was hoping that I could use one of the hypervisors and install it on my 500 GB drives, then give one of the installed VM guests direct access to my storage disks. However, I have not found a way of doing this. It seems all of these hypervisors only allow virtual disk containers and not direct access to disks unless you go the NFS or iSCSI route, which are not options for me.

If I am wrong, or you know a way for me to create a virtual server setup but still keep my software RAID 6 with the data, let me know. Otherwise I am going with plan B, which will involve a secondary server (I will outline this plan in my next post).
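
On the point about mdadm bringing the RAID 6 back: after running something like `mdadm --assemble --scan` on the new system, /proc/mdstat shows whether every member disk made it into the array. Below is a minimal Python sketch that parses mdstat-style text and reports arrays with missing members. The sample text is invented for illustration (device names and block counts are not from my server), and the function name is hypothetical.

```python
import re
from typing import Dict

# Invented /proc/mdstat-style sample: md1 has one of seven members missing.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid6 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      9767552000 blocks level 6, 64k chunk, algorithm 2 [7/6] [UUUUUU_]
md0 : active raid1 sda1[0] sdi1[1]
      488383936 blocks [2/2] [UU]
unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (raid\d+)")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> Dict[str, str]:
    """Return {array name: member map} for arrays with missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            member_map = status.group(3)
            # "_" marks a slot whose disk is missing or failed.
            if active < expected or "_" in member_map:
                degraded[current] = member_map
            current = None
    return degraded


if __name__ == "__main__":
    for name, member_map in degraded_arrays(SAMPLE_MDSTAT).items():
        print(name, "is degraded:", member_map)
```

On the real machine you would feed it the contents of /proc/mdstat instead of the sample. An empty result means every array it recognized reports all members present.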

half_life

« Reply #14 on: February 05, 2012, 05:38:23 pm »
I am guessing that you can keep your RAID 6 array without issue as long as you don't stray away from Debian/Ubuntu for the host. My thinking is that your choices of hypervisor would be a straight Ubuntu install with KVM as the hypervisor, or the Proxmox 1.9 distribution, which includes KVM and OpenVZ. The Ubuntu flavor would allow you to install a traditional X desktop, while the Proxmox solution uses a web interface. To keep your Zentyal setup, you still need to do the Clonezilla/Redo thing. You could then lay your Zentyal machine back in as a VM. To ensure against mayhem when installing another OS, just unplug the power to the RAID disks.