Topic: High-Load Diagnosis help -- Zentyal 2.2

vshaulsk

  • Zen Samurai
  • ****
  • Posts: 477
  • Karma: +9/-1
« on: November 03, 2011, 01:22:50 pm »
Yesterday during the day I had a strange event happen.

My system shut down unexpectedly. I found out when I received an email from the Zentyal cloud service telling me my system had disconnected.

The only thing I can tell is that the system load suddenly went very high compared to normal operation, and all of my processor cores went from mostly idle to full load. I am assuming the temperature kept rising until a thermal limit was hit and the system shut down.

Here is some system information:
AMD hexa-core 3.2 GHz on an MSI 890X motherboard, factory cooler
16 GB DDR3-1600 RAM
2x 500 GB SATA drives in RAID 1 (primary OS drives)
7x 2 TB SATA drives in software RAID 6 (file storage)

This has been my setup for Zentyal 2.0 (running for about a year) and now Zentyal 2.2, running for about the last month.

I have three virtual servers running on the Zentyal host using VirtualBox 4.1:
another Zentyal server (with just the web server module), one FreeNAS 8.0 and one FreeNAS 7.

I normally see system loads of about 1 to 1.5 (rarely 2.0). The CPUs are typically between 85% and 90% idle. There are only a few users, so the system idles most of the time.

Yesterday the system load jumped to 8 and all the cores went to 100%. The system shut down about 15 minutes later.

How would I diagnose what happened? Which log should I look at to see what the system was doing right before the shutdown? Also, I think the shutdown was caused by a thermal failsafe, but I am not sure. How can I tell what exactly triggered the shutdown?
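
One rough way to start on the log question is to scan the system logs for anything thermal or shutdown related. This is only a sketch: the keyword list and the `suspicious_lines` helper are made up for illustration, and it assumes a Python 3 interpreter on the box and Ubuntu-style log files such as `/var/log/syslog` and `/var/log/kern.log`.

```python
import re
import sys
from collections import deque

# Illustrative keyword list (an assumption); adjust to what your kernel logs.
KEYWORDS = ("thermal", "temperature", "overheat", "critical", "shutdown", "halt")


def suspicious_lines(lines, keywords=KEYWORDS, limit=50):
    """Return up to `limit` of the most recent lines mentioning any keyword."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    recent = deque(maxlen=limit)  # older matches fall off the front
    for line in lines:
        if pattern.search(line):
            recent.append(line.rstrip("\n"))
    return list(recent)


def main(paths):
    """Print matching lines from each log file given on the command line."""
    if not paths:
        print("usage: scanlogs.py LOGFILE [LOGFILE ...]")
        return
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                for hit in suspicious_lines(fh):
                    print(path + ": " + hit)
        except OSError as exc:
            print("could not read " + path + ": " + str(exc))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python3 scanlogs.py /var/log/syslog /var/log/kern.log` (the script name is arbitrary). If the board cut power on a hard thermal trip, there may be no log entry at all, and the log may simply stop mid-activity.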

Thank you for reading this, and for any direction you can provide on where to start looking for the cause. I have never had this happen before... Zentyal for the most part has been running smoothly as far as performance goes.

Thank you!

Marcus

  • Forum Moderator
  • Zen Samurai
  • *****
  • Posts: 395
  • Karma: +12/-0
« Reply #1 on: November 03, 2011, 02:41:13 pm »
Hello,

I would suggest installing Nagios on your server. That way you'll know when the load is going through the roof.

On your next high load:
1) log in to your server using a terminal
2) type: top

And let us know which process is chewing up all your CPU.
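
Since the spike may happen while nobody is at the console, a small watcher can record what `top` would have shown. A minimal sketch, assuming Python 3.7+ and the procps `top` on the server; the threshold, interval and log path are placeholders, not recommendations:

```python
import os
import subprocess
import time

LOAD_THRESHOLD = 4.0                # assumption: well above the usual 1 to 1.5
CHECK_EVERY = 30                    # seconds between checks (assumption)
LOG_PATH = "/var/tmp/highload.log"  # hypothetical location


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def snapshot(max_lines=25):
    """First lines of one batch-mode top run: summary plus busiest processes."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"],
        capture_output=True, text=True, check=False,
    )
    return "\n".join(result.stdout.splitlines()[:max_lines])


def watch():
    """Loop forever, appending a snapshot whenever the 1-minute load is high."""
    while True:
        with open("/proc/loadavg") as fh:
            one, _five, _fifteen = parse_loadavg(fh.read())
        if one >= LOAD_THRESHOLD:
            with open(LOG_PATH, "a") as log:
                log.write("%s load=%.2f\n%s\n\n" % (time.ctime(), one, snapshot()))
                log.flush()
                os.fsync(log.fileno())  # push to disk in case the box dies
        time.sleep(CHECK_EVERY)
```

To use it, add a call to `watch()` at the bottom of the script and leave it running in the background. After the next incident, the last entries in the log file should show which process was on top.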

Best,

Marcus

christian

  • Guest
« Reply #2 on: November 03, 2011, 03:24:33 pm »
Quote
I would suggest you to install Nagios...

Same here, either Nagios or Cacti.

Quote
...on your server.

A slightly different view: do not install the monitoring tool on the hardware you want to monitor, but on "something" dedicated to monitoring (which means very stable, not loaded, etc.).

vshaulsk

« Reply #3 on: November 03, 2011, 03:45:16 pm »
Christian,

What exactly do you mean when you say not to install it on the OS, but on something else? Can I install the monitoring system on a client PC (Windows) and have it monitor the server?

Also, just from looking at my system and top, I notice Zentyal says that almost all of my 16 GB of RAM is being used. Why is this happening? My virtual machines combined should only be using 8 GB of RAM, which leaves another 8 GB for the host OS.

Why is Zentyal (the host OS) using all the remaining RAM? At the most I figured it would use 4 GB and the rest would be free. However, it seems that I have almost no free RAM available. Perhaps the high load was caused by the system suddenly using swap?

Marcus

« Reply #4 on: November 03, 2011, 03:52:29 pm »
Quote
Can I install the monitoring system on a client PC (Windows) and have it monitor the server?

Christian said stable - not Windows :D

The problem with installing Nagios on a remote system is that the NRPE configuration is a bit tricky.

Anyway, if you only want to track down your problem and then move on, simply install Nagios and uninstall it once the problem is solved.

The previous command (top) will also let you know what is eating up all the RAM.

And yes, swapping will put a high load on your CPU.
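
To check whether the box is actually swapping (rather than just holding some pages in swap), the kernel's swap counters in `/proc/vmstat` can be sampled twice. A minimal sketch, assuming Python 3; the function names and the 5-second interval are illustrative:

```python
import time


def swap_counters(vmstat_text):
    """Return (pages swapped in, pages swapped out) since boot."""
    values = dict(line.split() for line in vmstat_text.splitlines() if line.strip())
    return int(values["pswpin"]), int(values["pswpout"])


def swap_rate(before, after, seconds):
    """Pages per second swapped in and out between two samples."""
    return ((after[0] - before[0]) / seconds, (after[1] - before[1]) / seconds)


def sample(interval=5):
    """Read /proc/vmstat twice, `interval` seconds apart, and return the rates."""
    with open("/proc/vmstat") as fh:
        first = swap_counters(fh.read())
    time.sleep(interval)
    with open("/proc/vmstat") as fh:
        second = swap_counters(fh.read())
    return swap_rate(first, second, interval)
```

Calling `sample()` during a load spike gives the swap-in and swap-out rates. Rates that stay near zero suggest swapping is not what is driving the load, even if some swap space is in use.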

Best,

Marcus

vshaulsk

« Reply #5 on: November 03, 2011, 04:09:22 pm »
Well, when I look at RAM...

I see the virtual machines using about 8 GB, which leaves about 8 GB for the Zentyal host OS.

The total used, however, is 15.9 GB, which means Zentyal is using about 8 GB.

Of those 8 GB, about 4 GB is shown as cached. I am assuming that is page cache.

Is this correct? Should the system be using this much RAM?

Sam Graf

  • Guest
« Reply #6 on: November 03, 2011, 04:29:42 pm »
It would be Zentyal plus anything else you've installed on the hardware (that is, not on a virtual machine).

Marcus

« Reply #7 on: November 03, 2011, 04:30:34 pm »
Hello,

Actually, Zentyal itself is using almost nothing... It is only a website! You must find out which process is using everything.

What are the results of:
Code: [Select]
top
Example "top" output:

top - 11:24:33 up 146 days,  3:49,  1 user,  load average: 0.26, 0.23, 0.16
Tasks: 226 total,   1 running, 225 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.0%id,  0.5%wa,  0.2%hi,  0.0%si,  0.0%st
Mem:   1860772k total,  1747240k used,   113532k free,   344036k buffers
Swap: 38971256k total,   281556k used, 38689700k free,   262180k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 2417 mysql     20   0  353m  31m 3408 S    0  1.7 877:57.52 mysqld
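
Reading the Mem and Swap lines of that example: much of the "used" figure is buffers and page cache, which the kernel largely hands back when programs need memory. A small sketch of the arithmetic in Python (the helper names are illustrative; the figures are the ones from the output above):

```python
import re

# The two summary lines from the example output above.
MEM_LINE = "Mem:   1860772k total,  1747240k used,   113532k free,   344036k buffers"
SWAP_LINE = "Swap: 38971256k total,   281556k used, 38689700k free,   262180k cached"


def fields(line):
    """Map each label on a top summary line to its value in kB."""
    return {label: int(value) for value, label in re.findall(r"(\d+)k (\w+)", line)}


def memory_report(mem_line, swap_line):
    """Split 'used' into what programs hold and what is buffers/page cache."""
    mem = fields(mem_line)
    cached = fields(swap_line)["cached"]  # this top layout prints 'cached' on the Swap line
    reclaimable = mem["buffers"] + cached
    return {
        "used_by_programs_kb": mem["used"] - reclaimable,
        "buffers_and_cache_kb": reclaimable,
        "effectively_free_kb": mem["free"] + reclaimable,
    }


print(memory_report(MEM_LINE, SWAP_LINE))
# {'used_by_programs_kb': 1141024, 'buffers_and_cache_kb': 606216, 'effectively_free_kb': 719748}
```

So on that machine roughly 1.1 GB is held by programs and about 0.7 GB is either free or reclaimable, even though only about 110 MB shows as "free". Older versions of `free` show the same split on their `-/+ buffers/cache` row.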

Best,

Marcus

vshaulsk

« Reply #8 on: November 03, 2011, 05:09:02 pm »
Sam,

The only extras I added to the host OS are Subsonic and an Nginx reverse proxy. Neither of those is showing a lot of RAM usage.

Other than the virtual machines, the rest of the processes use either kilobytes of RAM or maybe a few megabytes. They are all processes that are part of the server system: www-data, mysql, openldap, snort, etc.
The biggest RAM hog I see is "cached", which takes up close to 4 GB of RAM and keeps growing until almost all of the RAM is used up. What is cached? How is it controlled? Maybe this is normal behavior... I see in Marcus's example that his RAM is almost all used up as well (although he has 2 GB and I have 16).

Sam Graf

  • Guest
« Reply #9 on: November 03, 2011, 07:38:08 pm »
I should have been clearer. The high load situation would have to cover all software running on the machine, making information provided by top helpful beyond the RAM usage information.

As for the page cache gobbling up RAM over time, in Zentyal's case this seems to be normal behavior on some configurations. My sense is that machines running file sharing are more prone to it.

But I have yet to see the condition peg the CPUs. In that sense it doesn't seem like a typical memory leak, which can bring a machine to its knees. So I don't know the cause or the technical explanation (it would be nice to know the why, though), but I have no experience with it significantly impacting CPU loads.

vshaulsk

« Reply #10 on: November 03, 2011, 08:21:05 pm »
So is Zentyal taking up more and more RAM an issue that should be addressed, or should I just let it go?

Is there a way to clear it? Should it be cleared? This does not seem like correct behavior...

I do agree with you, Sam, that the high CPU utilization points to it not being a RAM problem. If it happens again, perhaps I will be able to look at the problem.

The memory leak question still remains.

christian

  • Guest
« Reply #11 on: November 03, 2011, 08:49:33 pm »
What I mean by "remote monitoring" is:
1 - The monitoring server should not be Windows. Marcus is more than right here  ;D
2 - Joking aside, if the server you want to monitor is overloaded for whatever reason, it is very likely that your monitoring application gets blocked too, so investigation becomes a little difficult  ::) which is why I suggest having it "elsewhere". Monitoring may then rely on SNMP, on an agent running locally on your server, or on a mix of both, depending on what you want to measure.
3 - Does it help if I tell you that some months ago I faced a similar "high load" situation (although it never reached the point where the server was totally blocked, because I identified the problem quite early), and that it was due to... the Zentyal log module?

Sam Graf

  • Guest
« Reply #12 on: November 03, 2011, 09:11:49 pm »
Quote
So is the case of Zentyal taking up more and more ram an issue which should be addressed or should I just let it go??

Is there a way to clear it?  Should it be cleared..... this does not seem correct behavior....

Linux enthusiasts tell us ordinary folk to just let the memory manager do its job. Sure, but us ordinary folk are just silly enough to wonder if it is doing its job ...

My experience is that restarting the machine puts things right (for a time). Since I usually restarted when the kernel was updated I had a built-in periodic "fix." Whether this actually fixes (or prevents) anything but a little swap usage is something I don't know, since nothing truly bad seemed to happen.

Well, the first time I encountered it I doubled the installed RAM. When the same server did it again, and free RAM dropped to the same frighteningly low level as before, I decided it wasn't worth spending more money on (since I wasn't even sure if there was a problem, or if so, its cause). That's when I decided my periodic restart "fix" was good enough.

Marcus

« Reply #13 on: November 03, 2011, 09:12:34 pm »
Hello,

Just to avoid any confusion around Zentyal:
Zentyal is a simple website. Given that, I would be surprised if Zentyal used even, let's say, 5 MB of RAM when not doing any admin work.

So, as I requested earlier, do a simple "top" to see what is using everything.


Otherwise you may take a quarter, flip it and let's say:
tails = spamd
heads = postgres

Well, actually a die would be better...
1 = spamd
2 = postgres
3 = apache
4 = check_smtp
...

More seriously, to fix your problem you must have a clear answer about what is going wrong. You can't just start shooting without aiming.

e.g.
(I'm running Ubuntu, the same OS that is on your server, except with Gnome)
When my desktop is getting sluggish, I click:
System >> Administration >> System Monitor >> Process >> Memory

Then I kill the bad guy and BAM! my system goes fast like Lightning McQueen once again.

Best,

Marcus

vshaulsk

« Reply #14 on: November 03, 2011, 09:39:52 pm »
Well, currently the system is not overloaded; the load, as always, is sitting at about 1 to 1.5. The system is always pretty responsive and generally works well. Yesterday is the only time I have ever seen that happen. By the time I came home the system was shut down, probably by a thermal trigger.

Christian,
I was also looking at logs right before this happened (maybe 10 minutes before I got the disconnected message from the cloud).
Perhaps we both had the same problem, except I was not at my system to do anything about it. In your case, do you know why it happened?

The memory leak seems strange to me. I don't understand why more people don't say anything about it. I have 8 GB for the host OS and my RAM shows 200 to 300 MB free! It has only been 24 hours since I restarted... seems strange, but I am not a Linux guru.
« Last Edit: November 03, 2011, 09:41:32 pm by vshaulsk »