Zentyal Forum, Linux Small Business Server

Zentyal Server => Installation and Upgrades => Topic started by: vshaulsk on November 03, 2011, 01:22:50 pm

Title: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 01:22:50 pm
Yesterday during the day I had a strange event happen.

My system shut down randomly..... I found out about it when I received an email from the zentyal cloud service telling me my system disconnected.

The only thing I can tell is that all of a sudden the system load went really high compared to normal operation and all of my processor cores went from being mostly idle to full load.  I am assuming the temperature kept rising until eventually a thermal limit was hit and the system shut down.

Here is some system information:
AMD hexacore 3.2 GHz - on an MSI 890X motherboard - factory cooler
16 GB DDR3-1600 RAM
2x 500 GB SATA drives in RAID 1 - primary OS drives
7x 2 TB SATA drives in software RAID 6 - file storage

This has been my setup for zentyal 2.0 (running for about a year) and now zentyal 2.2 running for about the last month.

I have three virtual servers running on the zentyal host using virtualbox 4.1
One is another zentyal server (with just webserver module), One FreeNas 8.0 and One FreeNas 7.

I normally see system loads of about 1 to 1.5 (on rare occasions 2.0).  The CPUs are typically between 85% and 90% idle.  There is only a small number of users, so the system idles most of the time.

Yesterday the system load jumped up to 8 and all the cores went to 100%....... system shut down about 15 minutes later

How would I diagnose what happened??? Which log would I look at to see what the system was doing right before shutdown??  Also I think the shutdown was caused by a thermal failsafe, but I am not sure.... how can I tell exactly what triggered the system shutdown????

Thank you for reading this and any direction you can provide on how to start looking at what possibly caused this issue.  I have never had this happen... Zentyal for the most part has been running smoothly as far as performance goes.

Thank you !!!
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Marcus on November 03, 2011, 02:41:13 pm
Hello,

I would suggest you to install Nagios on your server. This way you'll be aware when the load is going through the roof.

On your next high load:
1) log in to your server using a terminal
2) type: top

And let us know what process is chewing up all your CPU.
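
If nobody is at the keyboard when the load spikes, one option is to log the busiest processes at regular intervals so there is something to read after a shutdown. A minimal sketch, assuming bash and the procps version of ps; the helper name busiest and the log path below are made up for illustration:

```shell
#!/usr/bin/env bash
# busiest: hypothetical helper, not part of Zentyal or procps.
# Reads `ps -eo pid,user,pcpu,pmem,comm` output on stdin and prints the
# header row followed by the N rows with the highest %CPU (third column).
busiest() {
    local count=${1:-5} header
    IFS= read -r header          # keep the column header on top
    printf '%s\n' "$header"
    sort -k3,3nr | head -n "$count"
}
```

For example, `while sleep 60; do date; ps -eo pid,user,pcpu,pmem,comm | busiest 5; done >> /var/log/top-cpu.log` appends a snapshot every minute. Keep in mind that ps reports CPU use averaged over each process's lifetime, so a sudden spike in a long-running process may look smaller there than it does in top.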

Best,

Marcus
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: christian on November 03, 2011, 03:24:33 pm
Quote
I would suggest you to install Nagios...

Same here, either Nagios or Cacti

Quote
...on your server.

Slightly different view: do not install the monitoring tool on the hardware you want to monitor, but on "something" dedicated to monitoring (which means very stable, not loaded, etc.).
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 03:45:16 pm
Christian,

What exactly do you mean when you say do not install it on OS, but on something else???  Can I install the monitoring system on a client pc (windows) and have it monitor the server????


Also just from looking at my system and top.... I notice that Zentyal says almost all of my 16 gigs of ram is being used.  Why is this happening???  My virtual machines combined should only be using 8 gigs of ram..... which leaves another 8 gigs for the host OS....

Why is zentyal (host OS) using all available left over ram???  At the most I figured it would be using 4 gigs and the rest would just be free...  However it seems that I have almost no free ram available... perhaps the high load was caused by the system all of a sudden using swap?????
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Marcus on November 03, 2011, 03:52:29 pm
Quote
Can I install the monitoring system on a client pc (windows) and have it monitor the server????

Christian said stable - not Windows :D

The problem with installing Nagios on a remote system is that the NRPE configuration is a bit tricky.

Anyway, if you only want to find your problem and then move on to something else, simply install Nagios and, once the problem is solved, uninstall it...

The previous command (top) will also let you know what is eating up all the RAM.

**And yes, heavy swapping will push the system load up (processes stall while waiting on disk I/O).

Best,

Marcus
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 04:09:22 pm
Well when I look at ram....

I see virtual machines using about 8 gigs  ..... this leaves about 8 gigs for zentyal host OS

the total used however is 15.9 gigs which means zentyal is using 8 gigs.

Out of those 8 about 4 gigs is being used as cached..... I am assuming page cache.....

Is this correct???  Should the system be using this much ram??
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 03, 2011, 04:29:42 pm
It would be Zentyal plus anything else you've installed on the hardware (that is, not on a virtual machine).
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Marcus on November 03, 2011, 04:30:34 pm
Hello,

Actually Zentyal is using almost nothing...  It is only a website!  You must find out which process is using everything.

What are the results of:
Code:
top
Example "top" output:

top - 11:24:33 up 146 days,  3:49,  1 user,  load average: 0.26, 0.23, 0.16
Tasks: 226 total,   1 running, 225 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.0%id,  0.5%wa,  0.2%hi,  0.0%si,  0.0%st
Mem:   1860772k total,  1747240k used,   113532k free,   344036k buffers
Swap: 38971256k total,   281556k used, 38689700k free,   262180k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2417 mysql     20   0  353m  31m 3408 S    0  1.7 877:57.52 mysqld

Best,

Marcus
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 05:09:02 pm
Sam,

The only thing I added extra to the host OS .... is subsonic and Nginx reverse proxy.  Neither one of those is showing a lot of ram usage.

Other than the virtual machines.... the rest of the processes use either kilobytes of ram or maybe a few megabytes.  They are all processes that are part of the server system.... anything from www-data, mysql, openldap, snort etc......
The biggest ram hog I see is that cached takes up close to 4 gigs of ram and it keeps building up.... it will keep building up until almost all of the ram is used up.  What is cached??? How is it controlled??  Maybe this is normal behavior...... I see in Marcus's example .... his ram is almost all used up as well (however he has 2 gigs and I have 16)
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 03, 2011, 07:38:08 pm
I should have been clearer. The high load situation would have to cover all software running on the machine, making information provided by top helpful beyond the RAM usage information.

As for the page cache gobbling up RAM over time, in Zentyal's case this seems to be normal behavior on some configurations. My sense is that machines running file sharing are more prone to it.

But I have yet to see the condition peg the CPUs. In that sense it doesn't seem like a typical memory leak, which can bring a machine to its knees. So I don't know the cause or the technical explanation (it would be nice to know the why, though), but I have no experience with it significantly impacting CPU loads.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 08:21:05 pm
So is the case of Zentyal taking up more and more ram an issue which should be addressed or should I just let it go??

Is there a way to clear it?  Should it be cleared..... this does not seem correct behavior....



I do agree with you Sam that the high CPU utilization points to it not being a ram problem....  If it does happen again perhaps I will be able to look at the problem.

The memory leak question still remains.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: christian on November 03, 2011, 08:49:33 pm
What I mean by "remote monitoring" is that:
1 - the monitoring server should not be Windows. Marcus is more than right here  ;D
2 - joke aside, I mean that if the server you want to monitor is overloaded for whatever reason, it is very likely that your monitoring application gets blocked too, so investigation becomes a little difficult  ::) which is why I suggest having it "elsewhere". Monitoring may then rely on either SNMP or an agent running locally on your server, or a mix of both, depending on what you want to measure.
3 - Does it help if I tell you that some months ago I faced a similar "high load" situation (although it never reached the point where the server was totally blocked, because I identified the problem quite early) and this was due to... the Zentyal log module.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 03, 2011, 09:11:49 pm
Quote
So is the case of Zentyal taking up more and more ram an issue which should be addressed or should I just let it go??

Is there a way to clear it?  Should it be cleared..... this does not seem correct behavior....

Linux enthusiasts tell us ordinary folk to just let the memory manager do its job. Sure, but us ordinary folk are just silly enough to wonder if it is doing its job ...

My experience is that restarting the machine puts things right (for a time). Since I usually restarted when the kernel was updated I had a built-in periodic "fix." Whether this actually fixes (or prevents) anything but a little swap usage is something I don't know, since nothing truly bad seemed to happen.

Well, the first time I encountered it I doubled the installed RAM. When the same server did it again, and free RAM dropped to the same frighteningly low level as before, I decided it wasn't worth spending more money on (since I wasn't even sure if there was a problem, or if so, its cause). That's when I decided my periodic restart "fix" was good enough.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Marcus on November 03, 2011, 09:12:34 pm
Hello,

Just to avoid any confusion around Zentyal.
Zentyal is a simple website.  From that point, I would be surprised if Zentyal used even, let's say, 5 MB of RAM when not doing any admin stuff.

So, as I requested earlier, do a simple "top" to see what is using everything


Otherwise you may take a quarter, flip it and let's say:
tails = spamd
heads = postgres

Well, actually a die would be better...
1 = spamd
2 = postgres
3 = apache
4 = check_smtp
...

More seriously - to fix your problem, you must have a clear answer on what is going wrong.  You can't just start shooting without aiming.

e.g.
(I'm running Ubuntu - the same OS that is on your server except with Gnome)
When my desktop is getting sluggish, I click:
System >> Administration >> System Monitor >> Process >> Memory

Then I kill the bad guy and BAM! my system goes fast like Lightning McQueen once again.

Best,

Marcus
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 09:39:52 pm
Well currently the system is not being overloaded.... the load, just as it has always been, is sitting at about 1 to 1.5.  The system is always pretty responsive and generally works well. Yesterday was the only time I have ever seen that happen. By the time I came home the system was shut down... probably a thermal trigger.

Christian,
I was also looking at logs right before this happened (maybe 10 min before I got the disconnected message from the cloud).....
Perhaps we both had the same problem..... except I was not by my system to do anything about it.  In your case do you know why it happened???


The memory leak ..... seems strange to me.... I don't understand why more people don't say anything about it.  I have 8 gigs for the host os and my ram shows at 200 to 300 mb free !!!  From when I restarted to right now has only been 24 hours...seems strange, but I am not a linux guru.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Marcus on November 03, 2011, 09:49:36 pm
Hello,

I haven't experienced any memory leak in any of Zentyal's software collection...

Also, if you pay attention to the "top" I posted earlier, you'll see that the server I used for this example has been up for over 146 days and that it is running with a mere 2GB of ram.

I'm telling you, if you don't at least do a "top" to find out what process is going nuts, you'll never know what is really going on...  You'll only be guessing and that isn't worth a penny.

The other downside is that you'll probably never know what was wrong - you'll be blaming X or Y without any solid proof and end up frustrated saying that the world is crap!

A lot of problems I had in the past were fixed by starting with a simple "top" investigation.

Best,

Marcus
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 03, 2011, 10:00:31 pm
I am not saying it is X or Y and I perfectly understand that the best way to figure it out is during the situation ....look at "top" or have monitoring software on another machine.

I will post my "top" output late this evening.  All I am saying is that for the last year and a half I have never had this happen.  Maybe it will never happen again.... I don't know.  When I look at "top" I don't currently see any process taking up a lot of ram or using a lot of cpu by itself..... maybe combined all the processes take up a lot of ram (not sure, have not added it up).  The only thing I notice currently (or always about my system) is that I never have much free ram and my cache is approaching 4 gigs.  I would think with 8 gigs of available ram for the host OS I would have several gigs left free.

Maybe this is not an issue because even if my system would start using swap .... I don't think this would have made my CPU's start using 100%...... but the RAM thing is just a question based off the only strange thing I see about my system (at least when compared to my windows experience)
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 03, 2011, 10:10:15 pm
@Marcus: Top isn't going to tell us where the free memory is going. I agree that it would help solve a high load problem if the machine can be viewed while the problem is occurring.

As for this "memory leak"-- I agree it probably isn't a true memory leak. More likely it's event-driven. Nevertheless, something does happen to free RAM that seems unpleasant if nothing else.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: christian on November 03, 2011, 10:36:38 pm
No I never really knew what happened and never had time to investigate. I noticed this, not while I was looking at logs but when the Zentyal log module was started. Once stopped, everything was OK. I suspect something with Postgres DB unavailability but this is only a suspicion.

Be cautious with memory usage on Unix/Linux systems. What really matters is not so much memory usage but swap, if any. Depending on the OS, it could allocate all the available memory just because it's there and available for processes to run and load data, and if there is no specific request for additional memory, no memory is released. This is not an issue. You start to face a memory shortage, however, when there is not enough memory and some memory pages are written to disk.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 03, 2011, 10:52:03 pm
That's the unpleasant part. The system "problem" as I'm describing it involves a modest amount of swap space usage--296K in the case of the above system. So seemingly regardless of the amount of installed RAM, at some point affected systems "bottom out" and end up maintaining use of a small amount of swap.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 04, 2011, 02:46:59 am
This is what top is showing me.  My system is currently using swap for some reason... even though when you look at all the things running it is not 16 gigs worth of stuff.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 04, 2011, 01:15:44 pm
I restarted the system ..... and within a half hour it was once again using all of my system ram according to "top".

I have not been able to replicate the highload issue, but the ram usage comes back every time. 

So far once I restarted I have not had the system start using swap, but it basically fills up the cache until I only have about 200mb of system ram left.  To me this seems like a problem..... but maybe I am wrong.

Why would the system cache just keep building up.... and why would it not reset later???
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: christian on November 04, 2011, 01:54:09 pm
I don't understand why this must be an issue.
Linux memory management is different from Windows: Linux tries to cache as much as possible in RAM that is otherwise unused.
Look at the "cached" figure in the picture you posted.

Again, you will start having problems if system starts to swap. This could be something to look at closely (is Zentyal showing such report in monitoring? I don't remember and can't test right now).
If there is no swapping, there is no memory issue.
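
To see how much memory applications are really holding, the buffers and cache can be subtracted from the "used" figure. A rough sketch, assuming bash and awk; app_used_kb is an illustrative helper name, and it expects /proc/meminfo-style input on stdin:

```shell
#!/usr/bin/env bash
# app_used_kb: hypothetical helper, not a standard tool.
# Prints the memory (in kB) held by applications, i.e. total minus free
# minus buffers minus page cache, from /proc/meminfo-style input.
app_used_kb() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { print total - free - buffers - cached }
    '
}
```

Run it as `app_used_kb < /proc/meminfo`. Fed the figures from the 2 GB server's top output earlier in the thread (1860772k total, 113532k free, 344036k buffers, 262180k cached), it prints 1141024, so roughly 1.1 GB is actually held by applications even though top shows 1.7 GB "used". This is only an approximation; newer kernels also report a MemAvailable line in /proc/meminfo.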

in case it helps:
http://tldp.org/LDP/tlk/mm/memory.html

Some additional links:
http://linux-mm.org/LinuxMM
http://www.linux-tutorial.info/modules.php?name=MContent&pageid=260
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 04, 2011, 02:17:29 pm
Christian, Marcus, Sam

I have been doing some reading ..... I have to admit I did not know that linux cached ram in the manner that it does.  Currently the system is showing about 10 gigs of cache and about 5mb of swap.  However the system is very responsive and seems to be functioning.  So I guess I will drop the ram issue.

The high load I will keep looking to see if it happens .... perhaps next time I will be able to figure out what process caused the issue.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: christian on November 04, 2011, 02:31:18 pm
cool  8)

For those still having some doubt, this is an interesting link:
http://www.linuxatemyram.com/
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 04, 2011, 02:34:43 pm
It would help, though, even just as an educational exercise, if someone had said up front, "It is perfectly normal for a Linux system to maintain a small amount of swap usage. This is only a problem when swap sizes are large and/or growing."

"How much physical memory do I need in my Zentyal server?" is a completely fair question that is complicated a little by the behavior we see. If we can't answer that question by simply watching how RAM is used, we wonder about how best to answer it. Educators call those teachable moments, and whether there is a problem or not (and I've already said I don't think we have a real problem), it would be nice to have a simple explanation of what we actually are seeing--which has nothing to do with Windows, as far as I can tell. Maybe then we would have a clear idea of how to know if physical memory needs increasing or not.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: vshaulsk on November 04, 2011, 02:38:22 pm
Yes I definitely have learned a lot in the last 48 hours. 
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 04, 2011, 02:43:58 pm
A problem is inconsistent information. Take this paragraph, for example:

Quote
No, disk caching only borrows the ram that applications don't currently want. It will not use swap. If applications want more memory, they just take it back from the disk cache. They will not start swapping.

Yet our systems do use swap. Yet it isn't a problem. And people wonder why we get a little confused ... ::)
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: christian on November 04, 2011, 02:57:28 pm
What really matters is the swap rate more than swap size itself (although one may assume that swap size will increase in case requirement for free memory is higher than available memory).

The vmstat and free commands will help a lot here to check swap in and out and available memory.
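
To watch the swap rate rather than the swap size, the si and so columns of vmstat (memory swapped in from disk and out to disk per second) are the ones to look at. A rough sketch, assuming bash and awk; swap_activity is an illustrative name, and it prints only the samples where swapping actually happened:

```shell
#!/usr/bin/env bash
# swap_activity: hypothetical filter, not part of procps.
# Reads vmstat output on stdin, finds the si/so columns from the header
# row, and prints one line for every sample with non-zero swap traffic.
swap_activity() {
    awk '
        $1 == "r" && $2 == "b" {          # column header row
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        si && so && $1 ~ /^[0-9]+$/ && ($si + $so) > 0 {
            printf "swap-in=%s swap-out=%s\n", $si, $so
        }
    '
}
```

For example, `vmstat 5 12 | swap_activity` samples every 5 seconds for a minute. The first vmstat line reports averages since boot, so it can show swap activity that is long over. No output at all means no swapping happened during the sampling period.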
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Marcus on November 06, 2011, 08:31:56 pm
Hello,

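By default Ubuntu sets the swappiness level to 60.  This is not a RAM-usage threshold: it tells the kernel how readily to swap out application memory instead of shrinking the page cache (the higher the value, the more readily it swaps).

In your case, with the virtual machines holding about 8 gigs and the cache growing to fill the rest, the default is enough to push a little application memory out to swap.

So, I would recommend lowering your swappiness level (10 would be my recommendation in your case).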


This is how you can do it:

Check swappiness level:
Code:
cat /proc/sys/vm/swappiness
Change swappiness level on the fly (going from 60 to 10):
Code:
sysctl vm.swappiness=10
**This will revert back to its initial value after a server reboot**

Make change permanent:
1) Edit /etc/sysctl.conf
Code:
nano /etc/sysctl.conf
2) Search for vm.swappiness and change its value as desired.
**If vm.swappiness does not exist, add it to the end of the file like so:
Code:
vm.swappiness=10
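
Steps 1 and 2 can also be scripted. This is only a sketch, assuming bash and GNU sed; set_swappiness is a hypothetical helper that takes the file path as an argument so it can be tried on a copy before touching /etc/sysctl.conf:

```shell
#!/usr/bin/env bash
# set_swappiness FILE VALUE: hypothetical helper, not a Zentyal tool.
# Replaces an existing vm.swappiness line in FILE, or appends one if the
# file has no (uncommented) vm.swappiness entry.
set_swappiness() {
    local file=$1 value=$2
    if grep -Eq '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$file"; then
        sed -i -E "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness=${value}/" "$file"
    else
        printf 'vm.swappiness=%s\n' "$value" >> "$file"
    fi
}
```

Run as root, `set_swappiness /etc/sysctl.conf 10 && sysctl -p` would write the setting and reload the file so it takes effect without a reboot.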
Tweak tips:
*Try not to go over 75% of real ram usage

*Clearing swap (be careful to have enough ram free)
Code:
swapoff -a && swapon -a
*You may use a USB key/Flash to IDE adaptor (with a CF)/different hard drive for your swap drive instead of using the same hard drive that your system uses.  This keeps swap traffic from slowing down your regular hard drives.

Off topic but still related
*Lowering swappiness can also be used on an Ubuntu desktop in order to speed it up (actually it will just prevent the use of swap so your desktop will be "more reactive" after weeks of operation without a single reboot).
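
About the "be careful to have enough ram free" warning on clearing swap: swapoff has to pull everything in swap back into RAM, so it is worth checking first. A rough sketch, assuming bash and awk and /proc/meminfo-style input; swap_fits_in_ram is an illustrative name, and counting buffers and cache as available is a simplification:

```shell
#!/usr/bin/env bash
# swap_fits_in_ram: hypothetical check, not a standard tool.
# Succeeds (exit 0) when the swap currently in use is smaller than
# free + buffers + cached memory, according to meminfo-style input.
swap_fits_in_ram() {
    awk '
        $1 == "MemFree:"   { free    = $2 }
        $1 == "Buffers:"   { buffers = $2 }
        $1 == "Cached:"    { cached  = $2 }
        $1 == "SwapTotal:" { stotal  = $2 }
        $1 == "SwapFree:"  { sfree   = $2 }
        END { exit !((stotal - sfree) < (free + buffers + cached)) }
    '
}
```

It could be used as `swap_fits_in_ram < /proc/meminfo && swapoff -a && swapon -a`, so the swap is only cleared when the check passes.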

Best,

Marcus
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: Sam Graf on November 06, 2011, 10:00:21 pm
Good information. Thank you, Marcus.
Title: Re: High-Load Diagnosis help-- zentyal 2.2
Post by: sspeed on May 27, 2016, 08:20:56 pm
I realize this is another old thread, but again searching vs just making a post.  I've found if I leave a browser open to the web console, my loads will vary from 3 to 8 on an xi3 x5a 1.8GHz dual-core system, the main culprit being apt-check spawning every two seconds to look for updates.  There are a couple of other forum posts on this.