Zentyal Forum, Linux Small Business Server

Zentyal Server => Installation and Upgrades => Topic started by: consul on February 04, 2016, 03:37:06 pm

Title: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: consul on February 04, 2016, 03:37:06 pm
Hello!
After updating my Zentyal 4.2 to the latest version, this error began to appear on the console:

BUG: soft lockup - CPU #1 stuck for 23s!

The error repeats until it completely blocks the server, and only a hardware reset restores its functionality  :(

Has anyone had any experience in this?

Thank you!
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: petteri.jekunen on February 05, 2016, 05:04:07 am
Yes, the same in our environment. We are running Zentyal in a Proxmox VM.
Regards,
-Petteri
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: consul on February 09, 2016, 03:54:40 pm
After installing the latest updates for Samba, it seems that the problem is solved...
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: zerolife on February 10, 2016, 09:26:33 pm
Thank you.
Have been experiencing the same issue.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: matrizze on February 11, 2016, 09:40:02 pm
@consul
can you briefly describe how to install the latest updates for Samba?

Thx

Edit: Maybe as usual: apt-get install zentyal-samba?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: consul on February 12, 2016, 08:46:11 am
I installed them directly via the Zentyal web console.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: consul on February 12, 2016, 11:14:48 am
I must take that back...  :-[
The same error occurred again today...  >:(
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: zerolife on February 16, 2016, 04:02:06 pm
Same here... Issue is not resolved with the latest updates.  :(

Has anyone made any headway regarding this?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: BerT666 on February 17, 2016, 01:50:46 pm
Howdy,

maybe dmesg or the syslog could give you a hint (both under /var/log)...
Does this happen directly after booting, or only after some hours?
Can you see anything strange with top / htop?
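A quick way to scan for the lockups (assuming the standard Ubuntu log locations) would be something like:
Code: [Select]
dmesg | grep -i "soft lockup"
grep -i "soft lockup" /var/log/syslog /var/log/kern.log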

BTW is it Hardware (what kind of) or VM (what kind of Hypervisor)?

My Zentyal is running as a XenServer VM and I have not seen anything like this...
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 17, 2016, 08:22:40 pm
Hi,

I have the same issue!

I've set up a 4.2 branch server on an Athlon X2 HP machine for a non-profit refugee support project. We want to provide about 10 workstations for writing CVs and similar things they cannot do without proper word-processing software.

I'm setting up (Mint 17.2) clients to connect via Samba4 AD, mount home drives, etc., and after a few sleepless nights it seems to work reasonably well :-)

Now, what I'm seeing is that the server gets soft lockups every now and then; at some point they pile up and finally the machine gets stuck. Before, I used a Lenovo/IBM Core2 machine with a separate installation, but I had some annoying hiccups on that unit, so I decided to change and made a new install on this HP box, yet I still seem to have the same problems.
So I started to look at the syslog and found massive numbers of soft lockup entries.


This is from syslog at the first incident during that day:
Code: [Select]
Feb 14 14:00:01 zentyal CRON[23986]: (clamav) CMD (/usr/bin/freshclam --quiet)
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [smbd:23984]
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Modules linked in: xt_mark xt_connmark iptable_mangle 8021q garp mrp stp llc quota_v2 quota_tree ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack amdkfd amd_iommu_v2 snd_hda_codec_analog snd_hda_codec_generic radeon hp_wmi snd_hda_intel sparse_keymap ppdev snd_hda_controller snd_hda_codec ttm snd_hwdep drm_kms_helper snd_pcm snd_timer drm kvm edac_core snd i2c_algo_bit soundcore shpchp serio_raw k8temp edac_mce_amd 8250_fintek wmi i2c_piix4 tpm_infineon parport_pc mac_hid lp parport uas usb_storage hid_generic usbhid hid psmouse 3c59x mii floppy tg3 ahci libahci ptp pps_core
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] CPU: 0 PID: 23984 Comm: smbd Not tainted 3.19.0-49-generic #55~14.04.1-Ubuntu
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Hardware name: Hewlett-Packard HP Compaq dc5850 Microtower/3029h, BIOS 786F6 v01.09 04/09/2008
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] task: ffff8800693a93a0 ti: ffff88006bd30000 task.ti: ffff88006bd30000
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RIP: 0010:[<ffffffff817b77f5>]  [<ffffffff817b77f5>] _raw_spin_lock+0x35/0x60
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RSP: 0018:ffff88006bd33e20  EFLAGS: 00000206
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RAX: 0000000000003db1 RBX: ffff88000897d0c0 RCX: 00000000000001d2
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RDX: 00000000000001d4 RSI: 00000000000001d2 RDI: ffff8800691b6120
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RBP: ffff88006bd33e48 R08: 00000000000001d4 R09: ffff88006bd33c14
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] R10: ffff88006bd33ee2 R11: 0000000000000005 R12: ffff88002031a870
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] R13: 00000000000000a2 R14: 0000000400000001 R15: ffff88002031a870
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] FS:  00007fef5ef10780(0000) GS:ffff88006fc00000(0000) knlGS:0000000000000000
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] CR2: 00007f51e9e72000 CR3: 0000000014ac3000 CR4: 00000000000007f0
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Stack:
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  ffffffff817484a0 ffff8800691b4000 ffff8800691b5e00 ffff8800200ff480
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  ffff8800691b4000 ffff88006bd33ea8 ffffffff8174b643 ffff88006bd33e88
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  ffffffff81cda080 ffff88006bd33e78 00000028200ff480 000000000000006e
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Call Trace:
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  [<ffffffff817484a0>] ? unix_state_double_lock+0x60/0x70
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  [<ffffffff8174b643>] unix_dgram_connect+0x93/0x250
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  [<ffffffff8168f367>] SYSC_connect+0xe7/0x120
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  [<ffffffff8169054e>] SyS_connect+0xe/0x10
Feb 14 14:00:08 zentyal kernel: [ 7928.088009]  [<ffffffff817b7c0d>] system_call_fastpath+0x16/0x1b
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Code: f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 89 d1 0f b7 f2 b8 00 80 00 00 eb 0a 0f 1f 00 f3 90 83 e8 01 74 20 0f b7 17 41 89 d0 <41> 31 c8 41 81 e0 fe ff 00 00 75 e7 55 0f b7 f2 48 89 e5 e8 6b
Feb 14 14:00:30 zentyal dhcpd: DHCPDISCOVER from 00:1e:0b:80:82:21 (mint) via eth1

It then continues, until the machine finally completely stops:
Code: [Select]
Feb 15 01:59:58 zentyal kernel: [ 8836.084003] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [smbd:6522]
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] Modules linked in: quota_v2 quota_tree ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_nat_ipv4 iptable_filter nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle ip_tables x_tables snd_hda_codec_analog snd_hda_codec_generic amdkfd snd_hda_intel hp_wmi sparse_keymap snd_hda_controller amd_iommu_v2 ppdev radeon snd_hda_codec snd_hwdep snd_pcm snd_timer ttm kvm drm_kms_helper drm serio_raw snd edac_core k8temp soundcore edac_mce_amd i2c_algo_bit parport_pc i2c_piix4 wmi shpchp 8250_fintek tpm_infineon mac_hid lp parport uas usb_storage hid_generic usbhid hid psmouse tg3 3c59x ptp mii pps_core floppy ahci libahci
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] CPU: 0 PID: 6522 Comm: smbd Tainted: G             L 3.19.0-49-generic #55~14.04.1-Ubuntu
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] Hardware name: Hewlett-Packard HP Compaq dc5850 Microtower/3029h, BIOS 786F6 v01.09 04/09/2008
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] task: ffff880069554e80 ti: ffff88006b930000 task.tFeb 17 19:24:18 zentyal rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="450" x-info="http://www.rsyslog.com"] start

With this, I'm afraid the system is pretty much useless and anything but reliable.
Would you have any suggestions on how to solve it?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 17, 2016, 10:09:25 pm
Just some more from the syslog:

Code: [Select]
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] INFO: rcu_sched self-detected stall on CPU { 1}  (t=15000 jiffies g=68817 c=68816 q=0)
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] Task dump for CPU 1:
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] smbd            R  running task        0 18719   5097 0x00000008
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  ffffffff81c56000 ffff88006fc83d78 ffffffff810a0276 0000000000000001
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  ffffffff81c56000 ffff88006fc83d98 ffffffff810a386d 0000000000000087
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  0000000000000002 ffff88006fc83dc8 ffffffff810d4100 ffff88006fc94bc0
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] Call Trace:
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  <IRQ>  [<ffffffff810a0276>] sched_show_task+0xb6/0x130
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810a386d>] dump_cpu_task+0x3d/0x50
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810d4100>] rcu_dump_cpu_stacks+0x90/0xd0
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810d7fbc>] rcu_check_callbacks+0x42c/0x670
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810a48a1>] ? account_process_tick+0x61/0x180
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810dcef9>] update_process_times+0x39/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810ec405>] tick_sched_handle.isra.16+0x25/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810ec484>] tick_sched_timer+0x44/0x80
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810ddbb7>] __run_hrtimer+0x77/0x1d0
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810ec440>] ? tick_sched_handle.isra.16+0x60/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff810ddf97>] hrtimer_interrupt+0xe7/0x220
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff8104abc9>] local_apic_timer_interrupt+0x39/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff817bac85>] smp_apic_timer_interrupt+0x45/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff817b8cbd>] apic_timer_interrupt+0x6d/0x80
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  <EOI>  [<ffffffff817b77ea>] ? _raw_spin_lock+0x2a/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff817484a0>] ? unix_state_double_lock+0x60/0x70
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff8174b643>] unix_dgram_connect+0x93/0x250
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff8168f367>] SYSC_connect+0xe7/0x120
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff8169054e>] SyS_connect+0xe/0x10
Feb 17 21:59:41 zentyal kernel: [ 9345.516006]  [<ffffffff817b7c0d>] system_call_fastpath+0x16/0x1b

It also appears that you can't kill the hanging process, not even with a forced (-9) kill. It's an smbd task that stalls everything. :-(
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 17, 2016, 10:17:35 pm
Just found this: http://ubuntuforums.org/showthread.php?t=2205211&p=12996968#post12996968

It refers to a malfunctioning power supply. I will try another unit, as this really rings a bell: I actually used the same power supply in both machines I have tried independently so far. It's indeed the only link between the two installations.
Will keep you updated.

EDIT: OK, it's not the power supply. Changed it and the error still comes up.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: matrizze on February 18, 2016, 02:38:03 pm
I have no proof of it, but after updating with apt-get install zentyal-samba and deleting another Debian VM on my ESXi 5.5 hypervisor, this failure has not come back as of this post.

The other Debian machine was provisioned with 2 CPUs of 1 core each. After that I only added machines with 1 CPU and 2 cores and have not seen this failure again??!

But like I said, I have no proof of it.

M.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: BerT666 on February 18, 2016, 02:57:38 pm
Howdy,

just to be clear about this: do you have the vmWare Tools installed?

Regards

Thomas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 18, 2016, 08:44:45 pm
Howdy,

just to be clear about this: do you have the vmWare Tools installed?

Regards

Thomas

Not sure who you mean, but just in case: I didn't use any VM, so I do not expect this issue to have its root there. Did I have the vmWare tools installed? Not on purpose; however, if they were installed during the standard setup, it's possible.
I can't check anymore, since I'm making a fresh install and then will not (!) apply any upgrades. Let's see if this makes the lockups stop.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 18, 2016, 11:13:22 pm
Ok folks..

So, after making a new installation I noticed that Zentyal had already updated its packages automatically.
At first I was a bit disappointed, but then I decided to move forward, step by step.
So I installed the various updates, except for the new kernel; in my case that would be the generic kernel 3.19.0.49.34.

Up to now, no lockups... cross fingers!
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on February 19, 2016, 12:57:24 pm
Hi everyone.

I experienced the same problem on two different servers.  I had 2 lockups on the first one.  After some research, I decided to downgrade the kernel from 3.19.0-49-generic to 3.19.0-47-generic.  So far, no more lockups.

Today I experienced the same behaviour on another server.  I checked the kernel version, and it was 3.19.0-49-generic, so I downgraded this one to 3.19.0-47-generic too.

I'll keep you informed about the results.  The first server has not locked up since the downgrade.

Both are Zentyal 4.2.2, up to date.

How to downgrade:

Code: [Select]
sudo apt-get purge linux-image-3.19.0-49-generic
sudo update-grub

then reboot.
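Before purging, it is worth double-checking which kernel you are currently running and which kernel images are installed, so you don't remove the only one left; for example:
Code: [Select]
uname -r
dpkg --list | grep linux-image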
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: matrizze on February 19, 2016, 01:54:46 pm
@BerT666

I haven't installed the VMWare Tools on all VMs.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 19, 2016, 03:30:11 pm
Hi everyone.

I experienced the same problem on two different servers.  I had 2 lockups on the first one.  After some research, I decided to downgrade the kernel from 3.19.0-49-generic to 3.19.0-47-generic.  So far, no more lockups.

Today I experienced the same behaviour on another server.  I checked the kernel version, and it was 3.19.0-49-generic, so I downgraded this one to 3.19.0-47-generic too.

I'll keep you informed about the results.  The first server has not locked up since the downgrade.

Both are Zentyal 4.2.2, up to date.

How to downgrade:

Code: [Select]
sudo apt-get purge linux-image-3.19.0-49-generic
sudo update-grub

then reboot.

I may just add that one should ensure that the previous kernel is still "available". Auto-remove function of apt might have deleted it, no?
And finally, you need to put upgrade offers for the new kernel on hold...
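For the hold, something like this should work on Ubuntu 14.04, assuming the vivid HWE meta-packages are what pull in the new kernels on Zentyal 4.2 (adjust the package names as needed):
Code: [Select]
sudo apt-mark hold linux-generic-lts-vivid linux-image-generic-lts-vivid linux-headers-generic-lts-vivid
apt-mark showhold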
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on February 19, 2016, 06:45:38 pm
I may just add that one should ensure that the previous kernel is still "available". Auto-remove function of apt might have deleted it, no?
And finally, you need to put upgrade offers for the new kernel on hold...

Default config saves 2 to 3 kernels:

/etc/kernel/postinst.d/apt-auto-removal says:

# Mark as not-for-autoremoval those kernel packages that are:
#  - the currently booted version
#  - the kernel version we've been called for
#  - the latest kernel version (determined using rules copied from the grub
#    package for deciding which kernel to boot)
#  - the second-latest kernel version, if the booted kernel version is
#    already the latest and this script is called for that same version,
#    to ensure a fallback remains available in the event the newly-installed
#    kernel at this ABI fails to boot
# In the common case, this results in exactly two kernels saved, but it can
# result in three kernels being saved.  It's better to err on the side of
# saving too many kernels than saving too few.
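On a stock Ubuntu system that script writes the protected versions into an apt configuration snippet, so (assuming a default setup) you can see which kernels are currently exempt from autoremoval with:
Code: [Select]
cat /etc/apt/apt.conf.d/01autoremove-kernels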

So, in a default configuration, it should be safe. But you are right, one should check before deleting the kernel.  Anyway, I'm curious about what would happen if we tried to remove the last kernel...

Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on February 21, 2016, 11:53:32 am
  Anyway, I'm curious about what would happen if we tried to remove the last kernel...

 ;D Wanna try?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on February 21, 2016, 07:00:22 pm
Nope...  ;D
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: BerT666 on February 22, 2016, 11:55:28 am
In the past there was a problem with the kernel related to a firewall script...

Back then it was no problem to remove the newest kernel...

The only problem is that you have to keep it in mind and do a bit more testing of package updates ;-)

@matrizze: I have heard of several POSSIBLE issues when the vmWare Tools are not installed. That was the background of my question ;-)

Regards

Thomas


Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on February 24, 2016, 10:17:30 am
I have the same issue.
I have a KVM server host with Zentyal running as a guest with 8 CPUs.
It was running 4.1 until Friday, 19 Feb 2016, with no problems for about 10 months.
After the upgrade to 4.2 it is running on the linux-image-3.19.0-49-generic kernel.
I am seeing "Tainted: G" in the smbd traces, and one of the CPUs seems to be stuck at 100% when looking at htop.
It seems to eventually run out of memory and then hangs. This has happened to me once.
Today, 23/02/2016, I have reduced the VM to 4 CPUs to see if that helps, and I have noticed there is a new kernel update available.
I can't go back as I don't have an earlier kernel (I just upgraded).
If the problem reappears today I will run another apt-get dist-upgrade and get the latest kernel to see if that fixes it.
I will report back.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on February 24, 2016, 10:26:31 am
Hi everyone.

I experienced the same problem on two different servers.  I had 2 lockups on the first one.  After some research, I decided to downgrade the kernel from 3.19.0-49-generic to 3.19.0-47-generic.  So far, no more lockups.

Today I experienced the same behaviour on another server.  I checked the kernel version, and it was 3.19.0-49-generic, so I downgraded this one to 3.19.0-47-generic too.

I'll keep you informed about the results.  The first server has not locked up since the downgrade.

Both are Zentyal 4.2.2, up to date.

How to downgrade:

Code: [Select]
sudo apt-get purge linux-image-3.19.0-49-generic
sudo update-grub

then reboot.

The second server with the downgraded kernel has been stable.  No more lockups after the kernel downgrade.  Same with the first server.  It seems that we have pinpointed the problem.

So what is happening...?  A kernel bug...? A Samba bug...?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on February 25, 2016, 10:52:47 am
Hi!

Same issues here!
After trying to update Samba to the latest version (which didn't exactly go as expected) we're experiencing processor freezes that lead to a completely unresponsive machine.
The only way out is a reboot.

Let's break down the steps to replicate the issue:

Code: [Select]
samba-common-bin:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-common:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-dsdb-modules:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-libs:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-vfs-modules:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
smbclient:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1)

(http://s23.postimg.org/pj7b0jyx7/Selection_011.png)

I was able to reboot, but needed to work around the issue with a package reconfiguration (I can't remember all the exact actions right now). One of the issues I encountered was a broken SQL table, and the Samba service wasn't working at all.
I needed to rebuild the samba_access table using the suggestions from this URL:  http://stackoverflow.com/questions/8843776/mysql-table-is-marked-as-crashed-and-last-automatic-repair-failed

Code: [Select]
cd /var/lib/mysql/zentyal/
myisamchk -r -v -f samba_access.MYD   # if I remember correctly this one didn't work for me, since my issue was in the index
myisamchk -r -v -f samba_access.MYI
sudo dpkg --configure -a
sudo reboot
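For what it's worth, myisamchk is supposed to run while MySQL is not accessing the tables, so something along these lines is probably safer (untested sketch):
Code: [Select]
sudo service mysql stop
cd /var/lib/mysql/zentyal/
sudo myisamchk -r -v -f samba_access
sudo service mysql start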

After that, at least the packages were installed correctly and the machine was able to work.

The other two machines were less tricky: one went fine on the first shot and the second could be package-reconfigured via

Code: [Select]
dpkg --configure -a

The machines froze 4 times in 5 days (that had never happened before). On the sixth day I was actively monitoring, and top showed this:

Code: [Select]
top - 11:34:30 up 17:58,  2 users,  load average: 0.82, 0.34, 0.15
Tasks: 444 total,   2 running, 442 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.0 us,  1.3 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16185268 total, 15717188 used,   468080 free,   374772 buffers
KiB Swap: 16544764 total,        0 used, 16544764 free. 13313988 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                 
 19904 ebox      20   0  326012  14052  11768 R  99.9  0.1   1:13.49 net                         
 

One processor was stuck at 100%, and killing the net process prevented a machine freeze.

We've been running fine since Monday (3 days, no issues till now)... I don't know whether to downgrade (like peptoniET) or upgrade, since these updates are available:

Code: [Select]
linux-generic Complete Generic Linux kernel and headers 3.13.0.79.85
linux-headers-generic Generic Linux kernel headers 3.13.0.79.85
linux-image-generic Generic Linux kernel image 3.13.0.79.85
linux-image-generic-lts-vivid Generic Linux kernel image 3.19.0.51.36
linux-source Linux kernel source with Ubuntu patches 3.13.0.79.85
linux-source-3.13.0 Linux kernel source for version 3.13.0 with Ubuntu patches 3.13.0-79.123

Suggestions?

Thx all, hope everything is clear.

I'm here for questions.

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on February 25, 2016, 11:07:32 am
I must add:  I have a third Zentyal 4.2.2 machine with the 3.19.0-49-generic kernel that has been working without any problem for 12 days.  BUT THIS MACHINE HAS NO SAMBA SHARES.  The module is enabled, but no shares have been created (it only works as a Domain Controller + VirtualBox VM server).
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: BerT666 on February 25, 2016, 01:45:28 pm
Howdy,

so it seems it is another occurrence of this bug / failure:
https://forum.zentyal.org/index.php/topic,26954.0.html

There was also a problem with kernel / net scripts...

So there are two solutions for us "normal admins":

- update to the latest kernel and hope the problem is gone
- revert to an older kernel where everything is OK...

I had this only once or twice on my VM (I think it was related to a big data transfer, about 1.5 TB), so I am not sure whether this is solved with the current kernel...

Regards

Thomas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on February 25, 2016, 05:20:31 pm
Following on from my previous post:
Problem:
The server eventually runs out of memory, you can't do anything with it, and you have to switch it off.
My server is doing a lot of smbd / Samba traffic; we back up by mounting Samba shares and rsyncing to storage.
On a copy of the server that has no Samba traffic the errors don't seem to appear.
Many errors are generated in syslog, always with an smbd process:
 ++++++++++++++++++++++++++++++++
 kernel: [40908.028001] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [smbd:3049]
Feb 24 09:34:44 server2 kernel: [40908.028001] Modules linked in: xt_mac xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle ip_tables x_tables quota_v2 quota_tree nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache iosf_mbi kvm_intel kvm crct10dif_pclmul crc32_pclmul dm_crypt aesni_intel snd_hda_codec_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel snd_hda_controller serio_raw snd_hda_codec snd_hwdep virtio_rng snd_pcm snd_timer i2c_piix4 snd soundcore pvpanic 8250_fintek mac_hid parport_pc ppdev lp parport psmouse pata_acpi floppy
Feb 24 09:34:44 server2 kernel: [40908.028001] CPU: 1 PID: 3049 Comm: smbd Not tainted 3.19.0-49-generic #55~14.04.1-Ubuntu
Feb 24 09:34:44 server2 kernel: [40908.028001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Feb 24 09:34:44 server2 kernel: [40908.028001] task: ffff8800b9f9ce80 ti: ffff88019951c000 task.ti: ffff88019951c000
Feb 24 09:34:44 server2 kernel: [40908.028001] RIP: 0010:[<ffffffff817b77ea>]  [<ffffffff817b77ea>] _raw_spin_lock+0x2a/0x60
Feb 24 09:34:44 server2 kernel: [40908.028001] RSP: 0018:ffff88019951fe20  EFLAGS: 00000206
++++++++++++++++++++++++++++++
The tainted value should be zero, and it is not.
cat /proc/sys/kernel/tainted          displays the tainted value.
In htop you have a CPU stuck at 100%,
and free -m shows the memory is all used.
+++++++++++++++++++++++++++++
This is on a server that was upgraded from Zentyal 4.1 and after the upgrade was running on kernel
linux-image-3.19.0-49-generic.
Prior to this it was running 4.1 for about 10 months with no problems.
+++++++++++++++
I decided to go back to the linux-image-3.19.0-47-generic kernel, but as I had just upgraded it was not installed.
So, to display the available kernel packages, run
apt-cache search linux-image-3.19.0 | sort
and then I ran
sudo apt-get install linux-image-3.19.0-47-generic        to install the required kernel if not already available,
as I had no other 3.19 kernel installed since this was an upgrade from 4.1.

Then I wanted to force grub to only load the linux-image-3.19.0-47-generic kernel,
so I edited  /etc/default/grub
and replaced GRUB_DEFAULT=0
with the following (it must be exactly correct):
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.19.0-47-generic"
then ran
sudo update-grub
and then rebooted.
Today, 25/02/2016, the server is running with no problems, and it definitely would still have problems if this had not solved the issue.
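By the way, if you are unsure of the exact menu entry title to put into GRUB_DEFAULT, the generated titles can be read out of the GRUB config; something like this should list them:
Code: [Select]
grep -E "menuentry '|submenu '" /boot/grub/grub.cfg | cut -d"'" -f2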



 
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: jwilliams1976 on February 26, 2016, 04:02:08 pm
I was having this same issue and thanks to your help here I have rolled back to the 3.19.0-47 kernel, and everything seems to be normal again. Has anyone tested the 3.19.0.51.36 vivid kernel yet? I'm on a production server and can't really test it out.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Rabgye on February 27, 2016, 08:19:38 am
My Zentyal email server locked up after updating to the linux-image-3.19.0-51 kernel.
I had to revert back to 3.19.0-49, and it seems to work. I am not using the Samba features on my mail server.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on February 27, 2016, 11:43:45 am
OK, then it looks like everything after kernel 3.19.0-47-generic is giving issues.
However, I'm still on 3.19.0-49-generic and since Monday no incident has occurred...

...let's see... I'll keep you all updated

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pumpir on February 27, 2016, 04:12:59 pm
The same issue here with Zentyal 4.2.2 on linux-image-3.19.0-49-generic and linux-image-3.19.0-51-generic; it usually happens within 1-2 days. Today I changed grub to use linux-image-3.19.0-47-generic, hope it helps. Thanks for the tip.
 I had another issue, not sure if it is also connected with this; it is described here:
https://forum.zentyal.org/index.php/topic,27402.msg100165.html#msg100165
 
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: itiab on February 29, 2016, 09:21:22 am
I've also had that issue. Brand new Zentyal install.

See post here: https://forum.zentyal.org/index.php/topic,27428.0.html

A Samba thread caused the issue.

Physical machine, Intel i5 processor, 4 cores, 8 GB RAM, three people using it; I was moving a few files from the old server to this one. Total lockup and I had to pull the power plug. Top showed a load average of over 128.0.

Can provide syslog if required.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 01, 2016, 05:25:55 pm
Ok guys,

It happened again on one machine.   >:( :( :'( ???

htop and top never showed any stuck processes while 5 CPUs (out of 32) were maxed out at 100%!
Memory looked OK, not full.

I needed to cold reboot since I wasn't able to reboot normally.  >:(

I've seen that 3.13.0.79.85 is out... is it SAFE to upgrade, or should I downgrade?

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 02, 2016, 02:58:29 pm
OOOOK, new symptom:

FIREWALL (and possibly DHCP and DNS) not working properly.

Firewall (& dhcp?)
I got packet drops with WINDOW=5840 RES=0x00 SYN URGP=0 MARK=0x1 without being able to see the drops logged via the webadmin, only via syslog.
The weird thing is that the POS system is inside a network object with a static IP assigned. The only solution was changing the IP address (incrementing by 1; I don't think the exact value matters)!

Now I'm connecting this to the issue I've had over the past 3 weeks with the VoIP system... which could really be correlated (same behaviour).

DNS

Not sure, unable to confirm, but it wasn't serving the DMZ (demilitarized zone).

Has this happened to anyone?

Thx

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 02, 2016, 05:53:52 pm
Hi

Is there any news about this issue? I recently updated my Zentyal and I have the same problems. Today I found this topic and downgraded my kernel to 3.19.0-43; I hope it helps. But has someone reported this bug? Can we track it and see when it is fixed?
I also have Proxmox with a Zentyal Samba server running inside.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 03, 2016, 10:18:53 am
Hi

Is there any news about this issue? I recently updated my Zentyal and I have the same problems. Today I found this topic and downgraded my kernel to 3.19.0-43; I hope it helps. But has someone reported this bug? Can we track it and see when it is fixed?
I also have Proxmox with a Zentyal Samba server running inside.

Hi spott,

when you updated, which kernel was installed?

thx

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 03, 2016, 11:45:13 am
3.19.0-47
But with that one there was a big crash on the first day - so I updated again and got 51. Now I have installed the 43 kernel - right now it's fine - but with the 51 kernel it was also fine for two days and then one core was fully loaded.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 03, 2016, 12:10:42 pm
3.19.0-47
But with that one there was a big crash on the first day - so I updated again and got 51. Now I have installed the 43 kernel - right now it's fine - but with the 51 kernel it was also fine for two days and then one core was fully loaded.

Uhm... strange... the others reported that kernel 3.19.0-47 was fine... damn.
51 is also buggy, but 3.13.0.79.85 is out... I don't know what to do...

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 03, 2016, 02:37:40 pm
If you have it installed, you can try to boot that older kernel. But I think it's from Zentyal 4.1; Zentyal 4.2 ships 3.19 kernels.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: wesley.sena on March 03, 2016, 06:32:00 pm
Hi,

Dear friends, I am also going through the same situation with Zentyal 4.2 installed on an HP ML110 host, so in this case there is no link to a virtualized environment.

Just a CPU lockup, leaving the service slow and even compromising all the other Zentyal services, to the point of halting the OS and having to restart.

I was told that just reinstalling the Samba service fixes the problem; is that right, or is there another way?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 03, 2016, 06:41:10 pm
No - we are using an older kernel.
A Samba update doesn't fix it right now.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: wesley.sena on March 03, 2016, 07:01:07 pm
Does Zentyal 4.0 or 4.1 solve the problem?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 03, 2016, 08:05:24 pm
The problem starts after upgrading to 4.2, and we are using an older kernel in the 4.2 branch. I am testing right now with the 3.19.0-43 kernel.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 11, 2016, 07:36:25 pm
Ok,

one week after updating one machine to kernel 3.19.0-79-generic, I can tell that everything looks back to normal, no issues or freezes whatsoever.
I have updated the other servers and will keep you up-to-date.

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on March 14, 2016, 11:55:47 am
OK. Let's assume 3.19.0-79 solves the problem.

Will "apt-get upgrade" update the kernel after we did a rollback...?  Were kernel updates frozen by the rollback we did...? How do we force kernel updates back to normal?

Cannot find anything...

Thanks.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 14, 2016, 12:10:13 pm
Up until now everything looks normal and fine.

@peptoniET: Does the interface tell you to update? It should.
I think "apt-get upgrade" will do just fine anyway.

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on March 14, 2016, 03:27:38 pm
Quote
OK. Let's assume 3.19.0-79 solves the problem.

Will "apt-get upgrade" update the kernel after we did a rollback...?  Were kernel updates frozen by the rollback we did...? How do we force kernel updates back to normal?

Cannot find anything...

Thanks.
Not sure where you got a 3.19.0-79 kernel; I thought they only went up to 3.19.0-51.
apt-get upgrade will not upgrade kernels.
You need apt-get dist-upgrade to upgrade the kernel.
If you edited /etc/default/grub as in my previous post, that will keep booting the kernel you set there by default.
If you purged the faulty kernel, then yes, if you dist-upgrade it will boot with the newer kernel that may be installed.
If you hold down the shift key when booting you should get the grub boot screen and be able to choose different kernels
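If you want to check in advance whether a dist-upgrade would pull in a new kernel, a dry run should show it (just a sketch; adjust the grep pattern as needed):
Code: [Select]
sudo apt-get update
sudo apt-get -s dist-upgrade | grep -i linux-image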


Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 14, 2016, 03:43:32 pm
OK Guys,

One machine has presented the issue again.
I think I'll need to roll back. SIGH

I'll keep U up-to-date...

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on March 14, 2016, 03:48:52 pm
Quote
OK Guys,

One machine has presented the issue again.
I think I'll need to roll back. SIGH

I'll keep U up-to-date...
I have been on the    Linux 3.19.0-47-generic    kernel since Feb 25, 2016 and have definitely had no problems.
My server used to stop responding after 2 days. I use Samba a lot, but no OpenChange.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 14, 2016, 04:40:24 pm
I have now been running the GNU/Linux 3.19.0-43-generic x86_64 kernel for 12 days without that issue.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 14, 2016, 04:52:49 pm
I have now been running the GNU/Linux 3.19.0-43-generic x86_64 kernel for 12 days without that issue.

OK, I'll roll back too.

Are the devs doing something or...?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: nasenmann72 on March 17, 2016, 03:33:27 pm
Hi,

I have also had the same issue in the last few days on a Zentyal VM in Proxmox VE. Now I have downgraded to 3.19.0-43 and we'll see what happens.
I would also like to know if any of the Zentyal guys have something to say about this issue.

Regards
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 18, 2016, 11:35:40 am
Developer edition users are feeling a little bit left alone here...

This is no cosmetic/minor issue, this is a HUGE, kernel-level disaster...

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 21, 2016, 05:14:18 pm
A new kernel (generic, 3.13.0.83.89) is out... any info about it?

Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on March 21, 2016, 05:56:43 pm
Quote
A new kernel (generic, 3.13.0.83.89) is out... any info about it?

You should be on a 3.19 kernel if you are on Zentyal 4.2.
If I were you I would stay on the Linux 3.19.0-47-generic kernel for a few months if you want stability.
There is also the option to go to a 4.2 kernel, which is in Wily (Ubuntu 15.10);
see
https://wiki.ubuntu.com/Kernel/LTSEnablementStack
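If anyone wants to try that route, the usual way to pull in the Wily HWE kernel on Trusty would be something like the command below; I have not tested it with Zentyal 4.2, so treat it as a sketch:
Code: [Select]
sudo apt-get update
sudo apt-get install --install-recommends linux-generic-lts-wily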

hope that helps
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 22, 2016, 08:22:59 am
Quote
A new kernel (generic, 3.13.0.83.89) is out... any info about it?

You should be on a 3.19 kernel if you are on Zentyal 4.2.
If I were you I would stay on the Linux 3.19.0-47-generic kernel for a few months if you want stability.
There is also the option to go to a 4.2 kernel, which is in Wily (Ubuntu 15.10);
see
https://wiki.ubuntu.com/Kernel/LTSEnablementStack

hope that helps

Thx @hotsummer55 for opening my eyes on the version numbers... I read 3.19.0.83.89 instead of 3.13.0.83.89. A big f. mistake! Stupid me.

Right now I'm on 3.19.0-51-generic (as per my previous comments in this thread) and YES, I've already scheduled a "happy Easter downgrade" =D
I need stability.
Still, I would like to hear from the devs...

Thx again.

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: phaidros on March 23, 2016, 11:00:19 am
Same here.

Upgraded to
Code: [Select]
Linux iklii 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
And still getting this below a lot :/


Code: [Select]
Mar 23 10:59:16 iklii kernel: [155292.064007] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [smbd:7866]
Mar 23 10:59:16 iklii kernel: [155292.064007] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_nat_ipv4 iptable_filter nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp xt_mac xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle ip_tables x_tables quota_v2 quota_tree joydev hid_generic usbhid hid ppdev serio_raw pvpanic 8250_fintek parport_pc i2c_piix4 cirrus ttm drm_kms_helper drm syscopyarea sysfillrect sysimgblt mac_hid lp parport floppy psmouse pata_acpi
Mar 23 10:59:16 iklii kernel: [155292.064007] CPU: 1 PID: 7866 Comm: smbd Tainted: G             L 3.19.0-56-generic #62~14.04.1-Ubuntu
Mar 23 10:59:16 iklii kernel: [155292.064007] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Mar 23 10:59:16 iklii kernel: [155292.064007] task: ffff8800adaf44b0 ti: ffff880006520000 task.ti: ffff880006520000
Mar 23 10:59:16 iklii kernel: [155292.064007] RIP: 0010:[<ffffffff817b8858>]  [<ffffffff817b8858>] _raw_spin_lock+0x28/0x60
Mar 23 10:59:16 iklii kernel: [155292.064007] RSP: 0018:ffff880006523e20  EFLAGS: 00000206
Mar 23 10:59:16 iklii kernel: [155292.064007] RAX: 00000000000066fc RBX: ffff88005c2f8240 RCX: 00000000000016a0
Mar 23 10:59:16 iklii kernel: [155292.064007] RDX: 00000000000016aa RSI: 00000000000016a0 RDI: ffff8800a6a39d60
Mar 23 10:59:16 iklii kernel: [155292.064007] RBP: ffff880006523e48 R08: 000000000000000a R09: ffff880006523c14
Mar 23 10:59:16 iklii kernel: [155292.064007] R10: ffff880006523ee2 R11: 0000000000000004 R12: ffff88011af15c88
Mar 23 10:59:16 iklii kernel: [155292.064007] R13: 0000000000000088 R14: 0000000400000001 R15: ffff88011af15c88
Mar 23 10:59:16 iklii kernel: [155292.064007] FS:  00007f9ef5ac8780(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
Mar 23 10:59:16 iklii kernel: [155292.064007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 10:59:16 iklii kernel: [155292.064007] CR2: 000056157be51e78 CR3: 000000001676b000 CR4: 00000000000006e0
Mar 23 10:59:16 iklii kernel: [155292.064007] Stack:
Mar 23 10:59:16 iklii kernel: [155292.064007]  ffffffff817491cc ffff8800ad8f1680 ffff880006523ec0 ffff88001244d400
Mar 23 10:59:16 iklii kernel: [155292.064007]  ffff8800ad8f1680 ffff880006523ea8 ffffffff8174b713 ffff880006523e88
Mar 23 10:59:16 iklii kernel: [155292.064007]  ffffffff81cd9fc0 ffff880006523e78 000000271244d400 000000000000006e
Mar 23 10:59:16 iklii kernel: [155292.064007] Call Trace:
Mar 23 10:59:16 iklii kernel: [155292.064007]  [<ffffffff817491cc>] ? unix_state_double_lock+0x2c/0x70
Mar 23 10:59:16 iklii kernel: [155292.064007]  [<ffffffff8174b713>] unix_dgram_connect+0x93/0x250
Mar 23 10:59:16 iklii kernel: [155292.064007]  [<ffffffff8168fde7>] SYSC_connect+0xe7/0x120
Mar 23 10:59:16 iklii kernel: [155292.064007]  [<ffffffff81690fce>] SyS_connect+0xe/0x10
Mar 23 10:59:16 iklii kernel: [155292.064007]  [<ffffffff817b8c4d>] system_call_fastpath+0x16/0x1b
Mar 23 10:59:16 iklii kernel: [155292.064007] Code: 00 00 00 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 89 d1 0f b7 f2 b8 00 80 00 00 eb 0a 0f 1f 00 <f3> 90 83 e8 01 74 20 0f b7 17 41 89 d0 41 31 c8 41 81 e0 fe ff


Can we please have a word from the devs?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on March 23, 2016, 11:45:29 am
I have created a bug on the bug tracker:
https://tracker.zentyal.org/issues/4977
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 23, 2016, 01:35:35 pm
I have created a bug on the bug tracker:
https://tracker.zentyal.org/issues/4977

Kudos!  :D

Keep us up-2-date plz!
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on March 23, 2016, 03:23:37 pm
Please vote up the bug if interested. We may get a better response on this.

https://tracker.zentyal.org/issues/4977
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: kinetica on March 24, 2016, 07:14:40 pm
Server DELL
Virtual environment latest Proxmox 4.1 up to date

VM:
Zentyal 4.2.2 up to date used only as SAMBA SERVER
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty
Kernel: 3.19.0-56-generic
Linux zensamba 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


PROBLEM DESCRIPTION:
zensamba kernel: [49848.028036] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [smbd:1724]
The VM gets stuck and we need to stop and restart it.


Mar 23 16:37:41 zensamba kernel: [85320.064038] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [smbd:1696]
Mar 23 16:37:41 zensamba kernel: [85320.064038] Modules linked in: iptable_nat nf_nat_ipv4 nf_nat iptable_filter xt_mac xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle

ip_tables x_tables quota_v2 quota_tree cirrus ttm drm_kms_helper drm joydev syscopyarea sysfillrect sysimgblt i2c_piix4 ppdev mac_hid shpchp serio_raw 8250_fintek parport_pc lp parport hid_generic

usbhid hid floppy psmouse pata_acpi
Mar 23 16:37:41 zensamba kernel: [85320.064038] CPU: 3 PID: 1696 Comm: smbd Not tainted 3.19.0-56-generic #62~14.04.1-Ubuntu
Mar 23 16:37:41 zensamba kernel: [85320.064038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Mar 23 16:37:41 zensamba kernel: [85320.064038] task: ffff8802337875c0 ti: ffff880219448000 task.ti: ffff880219448000
Mar 23 16:37:41 zensamba kernel: [85320.064038] RIP: 0010:[<ffffffff8105b976>]  [<ffffffff8105b976>] native_safe_halt+0x6/0x10
Mar 23 16:37:41 zensamba kernel: [85320.064038] RSP: 0018:ffff88021944bd78  EFLAGS: 00000206
Mar 23 16:37:41 zensamba kernel: [85320.064038] RAX: 000000000000003b RBX: 00000000000000b0 RCX: 0000000000000001
Mar 23 16:37:41 zensamba kernel: [85320.064038] RDX: 0000000000000000 RSI: 0000000000008550 RDI: ffff88023fff20c0
Mar 23 16:37:41 zensamba kernel: [85320.064038] RBP: ffff88021944bd78 R08: 000000000000023c R09: ffff88021944bc14
Mar 23 16:37:41 zensamba kernel: [85320.064038] R10: ffff88021944bee2 R11: 0000000000000004 R12: ffffffff811fa3eb
Mar 23 16:37:41 zensamba kernel: [85320.064038] R13: ffff88021944bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Mar 23 16:37:41 zensamba kernel: [85320.064038] FS:  00007fe6cd43f780(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
Mar 23 16:37:41 zensamba kernel: [85320.064038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 16:37:41 zensamba kernel: [85320.064038] CR2: 000055f937453000 CR3: 000000021944c000 CR4: 00000000000006e0
Mar 23 16:37:41 zensamba kernel: [85320.064038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 16:37:41 zensamba kernel: [85320.064038] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 23 16:37:41 zensamba kernel: [85320.064038] Stack:
Mar 23 16:37:41 zensamba kernel: [85320.064038]  ffff88021944bdc8 ffffffff8105b47b 0000000000000088 0000855035a1fc48
Mar 23 16:37:41 zensamba kernel: [85320.064038]  ffff88021944be48 ffff8800a9c1f800 ffff88021944bec0 ffff88008c0ca580
Mar 23 16:37:41 zensamba kernel: [85320.064038]  0000000000000027 ffff8800bac7e000 ffff88021944be48 ffffffff8105a721
Mar 23 16:37:41 zensamba kernel: [85320.064038] Call Trace:
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff8105b47b>] kvm_lock_spinning+0xbb/0x1b0
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff8105a721>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff817b8886>] ? _raw_spin_lock+0x56/0x60
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff81749200>] ? unix_state_double_lock+0x60/0x70
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff8174b713>] unix_dgram_connect+0x93/0x250
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff8168fde7>] SYSC_connect+0xe7/0x120
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff81690fce>] SyS_connect+0xe/0x10
Mar 23 16:37:41 zensamba kernel: [85320.064038]  [<ffffffff817b8c4d>] system_call_fastpath+0x16/0x1b
Mar 23 16:37:41 zensamba kernel: [85320.064038] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00

00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Mar 23 16:38:09 zensamba kernel: [85348.064036] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [smbd:1696]
Mar 23 16:38:09 zensamba kernel: [85348.064036] Modules linked in: iptable_nat nf_nat_ipv4 nf_nat iptable_filter xt_mac xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle

ip_tables x_tables quota_v2 quota_tree cirrus ttm drm_kms_helper drm joydev syscopyarea sysfillrect sysimgblt i2c_piix4 ppdev mac_hid shpchp serio_raw 8250_fintek parport_pc lp parport hid_generic

usbhid hid floppy psmouse pata_acpi
Mar 23 16:38:09 zensamba kernel: [85348.064036] CPU: 3 PID: 1696 Comm: smbd Tainted: G             L 3.19.0-56-generic #62~14.04.1-Ubuntu
Mar 23 16:38:09 zensamba kernel: [85348.064036] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Mar 23 16:38:09 zensamba kernel: [85348.064036] task: ffff8802337875c0 ti: ffff880219448000 task.ti: ffff880219448000
Mar 23 16:38:09 zensamba kernel: [85348.064036] RIP: 0010:[<ffffffff8105b976>]  [<ffffffff8105b976>] native_safe_halt+0x6/0x10
Mar 23 16:38:09 zensamba kernel: [85348.064036] RSP: 0018:ffff88021944bd78  EFLAGS: 00000206
Mar 23 16:38:09 zensamba kernel: [85348.064036] RAX: 000000000000003b RBX: 00000000000000b0 RCX: 0000000000000001
Mar 23 16:38:09 zensamba kernel: [85348.064036] RDX: 0000000000000000 RSI: 0000000000008550 RDI: ffff88023fff20c0
Mar 23 16:38:09 zensamba kernel: [85348.064036] RBP: ffff88021944bd78 R08: 000000000000023c R09: ffff88021944bc14
Mar 23 16:38:09 zensamba kernel: [85348.064036] R10: ffff88021944bee2 R11: 0000000000000004 R12: ffffffff811fa3eb
Mar 23 16:38:09 zensamba kernel: [85348.064036] R13: ffff88021944bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Mar 23 16:38:09 zensamba kernel: [85348.064036] FS:  00007fe6cd43f780(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
Mar 23 16:38:09 zensamba kernel: [85348.064036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 16:38:09 zensamba kernel: [85348.064036] CR2: 000055f937453000 CR3: 000000021944c000 CR4: 00000000000006e0
Mar 23 16:38:09 zensamba kernel: [85348.064036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 16:38:09 zensamba kernel: [85348.064036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 23 16:38:09 zensamba kernel: [85348.064036] Stack:
Mar 23 16:38:09 zensamba kernel: [85348.064036]  ffff88021944bdc8 ffffffff8105b47b 0000000000000088 0000855035a1fc48
Mar 23 16:38:09 zensamba kernel: [85348.064036]  ffff88021944be48 ffff8800a9c1f800 ffff88021944bec0 ffff88008c0ca580
Mar 23 16:38:09 zensamba kernel: [85348.064036]  0000000000000027 ffff8800bac7e000 ffff88021944be48 ffffffff8105a721
Mar 23 16:38:09 zensamba kernel: [85348.064036] Call Trace:
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff8105b47b>] kvm_lock_spinning+0xbb/0x1b0
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff8105a721>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff817b8886>] ? _raw_spin_lock+0x56/0x60
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff81749200>] ? unix_state_double_lock+0x60/0x70
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff8174b713>] unix_dgram_connect+0x93/0x250
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff8168fde7>] SYSC_connect+0xe7/0x120
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff81690fce>] SyS_connect+0xe/0x10
Mar 23 16:38:09 zensamba kernel: [85348.064036]  [<ffffffff817b8c4d>] system_call_fastpath+0x16/0x1b
Mar 23 16:38:09 zensamba kernel: [85348.064036] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00

00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Mar 23 16:38:15 zensamba kernel: [85353.780039] INFO: rcu_sched self-detected stall on CPU { 3}  (t=15001 jiffies g=343192 c=343191 q=0)
Mar 23 16:38:15 zensamba kernel: [85353.780039] Task dump for CPU 3:
Mar 23 16:38:15 zensamba kernel: [85353.780039] smbd            R  running task        0  1696   1681 0x00000008
Mar 23 16:38:15 zensamba kernel: [85353.780039]  ffffffff81c56040 ffff88023fd83d78 ffffffff810a0296 0000000000000003
Mar 23 16:38:15 zensamba kernel: [85353.780039]  ffffffff81c56040 ffff88023fd83d98 ffffffff810a388d 0000000000000087
Mar 23 16:38:15 zensamba kernel: [85353.780039]  0000000000000004 ffff88023fd83dc8 ffffffff810d41a0 ffff88023fd94bc0
Mar 23 16:38:15 zensamba kernel: [85353.780039] Call Trace:
Mar 23 16:38:15 zensamba kernel: [85353.780039]  <IRQ>  [<ffffffff810a0296>] sched_show_task+0xb6/0x130
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810a388d>] dump_cpu_task+0x3d/0x50
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810d41a0>] rcu_dump_cpu_stacks+0x90/0xd0
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810d805c>] rcu_check_callbacks+0x42c/0x670
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810a48c1>] ? account_process_tick+0x61/0x180
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810dcf99>] update_process_times+0x39/0x60
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810ec4a5>] tick_sched_handle.isra.16+0x25/0x60
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810ec524>] tick_sched_timer+0x44/0x80
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810ddc57>] __run_hrtimer+0x77/0x1d0
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810ec4e0>] ? tick_sched_handle.isra.16+0x60/0x60
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff810de037>] hrtimer_interrupt+0xe7/0x220
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff8104abe9>] local_apic_timer_interrupt+0x39/0x60
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff817bbcc5>] smp_apic_timer_interrupt+0x45/0x60
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff817b9cfd>] apic_timer_interrupt+0x6d/0x80
Mar 23 16:38:15 zensamba kernel: [85353.780039]  <EOI>  [<ffffffff8105b976>] ? native_safe_halt+0x6/0x10
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff8101e3a9>] ? sched_clock+0x9/0x10
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff8105b47b>] kvm_lock_spinning+0xbb/0x1b0
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff8105a721>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff817b8886>] ? _raw_spin_lock+0x56/0x60
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff81749200>] ? unix_state_double_lock+0x60/0x70
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff8174b713>] unix_dgram_connect+0x93/0x250
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff8168fde7>] SYSC_connect+0xe7/0x120
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff81690fce>] SyS_connect+0xe/0x10
Mar 23 16:38:15 zensamba kernel: [85353.780039]  [<ffffffff817b8c4d>] system_call_fastpath+0x16/0x1b


Anyone with good news?
Cheers
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 24, 2016, 07:22:34 pm
Change the kernel to an older one.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: kinetica on March 24, 2016, 08:42:20 pm
Hello Spott, thank you for your reply.

Could you please suggest which kernel version we should use to avoid/fix this bug and, if possible, give some indication of the procedure for installing it?
Since this is a live running server, any help on reducing downtime for the users to a minimum will be very much appreciated :)

Cheers!
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on March 24, 2016, 09:09:58 pm
I have been running the GNU/Linux 3.19.0-43-generic x86_64 kernel for more than 20 days now.
Read this topic; there are good step-by-step guides here.
Install the older kernel manually and then modify GRUB to load only that kernel, or choose that kernel manually at boot. When you do that, downtime is only a minute or so for the restart.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: kinetica on March 26, 2016, 01:51:13 pm
Hello Spott, 
We downgraded to linux-image-3.19.0-43-generic.
Will update this post in case of issues
Waiting for a new patched kernel....  ::)
Thank you
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 26, 2016, 02:34:55 pm
I'm going to downgrade tomorrow evening, i'll keep You updated too
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: peptoniET on March 29, 2016, 09:24:00 am
Quote
Ok. Let's assume 3.19.0-79 solves the problem.

Will "apt-get upgrade" update the kernel after we did a rollback...? Were kernel updates frozen by the rollback we did...? How do we get kernel updates back to normal?

Cannot find anything...

Thanks.
Not sure where you got a 3.19.0-79 kernel; I thought they only went up to 3.19.0-51.
apt-get upgrade will not upgrade kernels.
You need apt-get dist-upgrade to upgrade the kernel.
If you edited /etc/default/grub as in my previous post, that will keep booting the kernel you set there by default.
If you purged the faulty kernel, then yes, if you dist-upgrade it will boot with the newer kernel that may be installed.
If you hold down the Shift key when booting you should get the GRUB boot screen and be able to choose a different kernel.

I've set up a VM to test this problem. Cloned the production server.
Apt-get dist-upgrade does nothing and reports that no updates are available. The kernel version stays the same after running the command and rebooting.
I did not edit /etc/default/grub, it's untouched.
Thanks.
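As a side note on freezing kernel updates: a small hedged sketch using apt-mark, assuming the kernel is pulled in via the linux-image-generic-lts-vivid meta package (it shows up later in this thread); adjust the package name to whatever your dist-upgrade would actually pull:

Code: [Select]
# see which kernel images and meta packages are installed
dpkg --list | grep linux-image

# stop dist-upgrade from pulling newer kernels
apt-mark hold linux-image-generic-lts-vivid

# later, to return to normal kernel updates
apt-mark unhold linux-image-generic-lts-vivid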
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 29, 2016, 10:44:35 am
Hi guys,

I've tried to downgrade to kernel *47 but I wasn't able to; the server wasn't booting and got stuck. I was able to boot with kernel *49... strange.
Have to investigate the issue.

Meanwhile I've updated one server to kernel 3.19.0-56-generic...let's see if something change.

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: kinetica on March 29, 2016, 08:35:02 pm
Hi LaM,
our server was up-to-date with kernel 3.19.0-56-generic when we started to notice this bug.
For us it was necessary to downgrade to linux-image-3.19.0-43-generic:

Our procedure was:
-- Check the running kernel = uname -r
-- Check the installed kernel images with =  dpkg --list | grep linux-image
-- Check the files present in /boot/
-- Install the old kernel image =  apt-get install linux-image-3.19.0-43-generic
-- Check all installed kernel images again =  dpkg --list | grep linux-image
-- Modify grub =  /etc/default/grub

# GRUB_DEFAULT=0
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.19.0-43-generic"

-- Reboot
-- Check kernel running = uname -r

Since then Samba and server are running smooth  ;)
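One note on that procedure: the change to /etc/default/grub only takes effect once the GRUB configuration is regenerated, so presumably an update-grub was run before the reboot. A minimal sketch of that last step:

Code: [Select]
# regenerate /boot/grub/grub.cfg after changing GRUB_DEFAULT
update-grub
reboot
# after the reboot, confirm the expected kernel is running
uname -r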






Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 30, 2016, 09:40:29 am
Thank You Kinetica,

so You confirm that even kernel 3.19.0-56-generic is affected. Waaay bad, way bad.

I get that this is the developer edition but this is embarrassing...
I'll check with the steps you gave me this Saturday; hopefully I will be able to fix the only kernel that isn't working (and the only one I need... sigh)

Btw...why specifically linux-image-3.19.0-43-generic and not linux-image-3.19.0-47-generic?

Thanks again!!

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: uro.sh on March 31, 2016, 08:55:08 am
Hi LaM,
our server was up-to-date with kernel 3.19.0-56-generic when we started to notice this bug.
For us it was necessary to downgrade to linux-image-3.19.0-43-generic:

Our procedure was:
-- Check the running kernel = uname -r
-- Check the installed kernel images with =  dpkg --list | grep linux-image
-- Check the files present in /boot/
-- Install the old kernel image =  apt-get install linux-image-3.19.0-43-generic
-- Check all installed kernel images again =  dpkg --list | grep linux-image
-- Modify grub =  /etc/default/grub

# GRUB_DEFAULT=0
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.19.0-43-generic"

-- Reboot
-- Check kernel running = uname -r

Since then Samba and server are running smooth  ;)

Thank you kinetica for your suggestion, but when I tried this my server got stuck at boot. The screen showed Zentyal 4.2 with 5 dots, waiting for the network connection to become ready. What did I do wrong? Is there any news on a kernel update?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on March 31, 2016, 03:59:18 pm
Imo it's the kernel... the update must have modified something...

But I don't have any proof of this, it's just an opinion.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: kinetica on April 01, 2016, 05:07:26 pm
Sorry guys for the long delay in the answer.
@LaM = "Btw...why specifically linux-image-3.19.0-43-generic and not linux-image-3.19.0-47-generic?"
Just because I followed a suggestion from the user Spott stating "I have been running for more than 20 days now on the GNU/Linux 3.19.0-43-generic x86_64 kernel.", and if you read all five pages of this thread, if I am not wrong, you'll discover that users with linux-image-3.19.0-47-generic had the same issue.
Yes I confirm our server was at kernel 3.19.0-56-generic and the bug was present and painful

@uro.sh
I am not sure what went wrong on your boot/server. Regarding the boot, I followed the indications in this guide:
https://help.ubuntu.com/community/Grub2/Submenus
After downgrading the kernel I did not upgrade Zentyal at all!

Regarding news of a new kernel or a solution for this bug, unfortunately there is no update:
https://tracker.zentyal.org/issues/4977
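Following that Grub2/Submenus guide, the GRUB_DEFAULT string has to match the titles in grub.cfg exactly; a quick sketch to list them (standard Ubuntu paths assumed):

Code: [Select]
# list submenu and menuentry titles; GRUB_DEFAULT uses the form "submenu>entry"
grep -E "^[[:space:]]*(submenu|menuentry)" /boot/grub/grub.cfg | cut -d "'" -f2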
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: phaidros on April 02, 2016, 11:51:12 am
This kernel helped me: linux-image-generic-lts-xenial.

Code: [Select]
apt-get install linux-image-generic-lts-xenial
Running 4.4.0.13.7 for about 2 weeks with no crashes.

hth,
.phai
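If you want to see which kernel version that transitional meta package would actually pull in before installing it, a small sketch (standard apt tooling on 14.04):

Code: [Select]
# show the candidate version and dependencies of the lts-xenial meta package
apt-cache policy linux-image-generic-lts-xenial
apt-cache depends linux-image-generic-lts-xenial

# after installing and rebooting, verify
uname -r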
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 02, 2016, 12:20:05 pm
xenial (oldlibs): Generic Linux kernel image (dummy transitional package), 4.4.0.16.17: amd64 i386
Which can be seen here http://packages.ubuntu.com/xenial/linux-image-generic-lts-xenial (http://packages.ubuntu.com/xenial/linux-image-generic-lts-xenial)

@phaidros, with which kernel?

Thx btw

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 07, 2016, 09:55:17 pm
Kernel
linux-image-3.19.0-56-generic

S0x, too many kernel panics.

Kernel
linux-image-3.19.0-58-generic

Solved the problems for me!  ;D
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 08, 2016, 04:56:32 am
Hi Carlos,

is this for sure and officially confirmed?
Has somebody else positive feedback towards the version: linux-image-3.19.0-58-generic?
I noticed that this kernel version was delivered to our production system on 06.04.2016. But for me this bug always occurred only after a couple of days of uptime, so it is not immediately recognizable.

I mean there are other Debian/Ubuntu-forked distributions which suffered from the same issue.
But there at least the official fix was delivered relatively promptly:
E.g. https://forge.univention.org/bugzilla/show_bug.cgi?id=40558

Cheers,
Andreas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: jwilliams1976 on April 09, 2016, 01:47:22 am
FYI
This is a kernel bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785 (https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785). You can test whether it's fixed by running the command: 'ip rule show' It should just spit out the rules and exit but on any versions with the bug it just loops and never exits. Zentyal must use this command somewhere and after a while it eats up all CPU and memory resources and results in the CPU soft hang.

Quick way to test it instead of waiting a week for Zentyal to crap out.
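Since the broken behaviour is an endless loop, a rough way to run that check without hanging your shell is to bound it with a timeout and count the lines (a sane kernel normally prints only a handful of rules); this is just an illustrative sketch, not a guaranteed detector:

Code: [Select]
# cap the command at 5 seconds and count what it printed
lines=$(timeout 5 ip rule show | wc -l)
if [ "$lines" -gt 100 ]; then
    echo "ip rule show keeps looping - kernel looks affected"
else
    echo "ip rule show exited normally ($lines lines)"
fi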
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: hotsummer55 on April 09, 2016, 11:24:04 am
Quote
FYI
This is a kernel bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785. You can test whether it's fixed by running the command: 'ip rule show' It should just spit out the rules and exit but on any versions with the bug it just loops and never exits. Zentyal must use this command somewhere and after a while it eats up all CPU and memory resources and results in the CPU soft hang.

Quick way to test it instead of waiting a week for Zentyal to crap out.

Not sure about this. I tested this against the known bad kernel linux-image-3.19.0-49-generic and it did not produce any problems when running ip rule show.
What kernel are you running now?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 09, 2016, 04:47:11 pm
Quote
FYI
This is a kernel bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785. You can test whether it's fixed by running the command: 'ip rule show' It should just spit out the rules and exit but on any versions with the bug it just loops and never exits. Zentyal must use this command somewhere and after a while it eats up all CPU and memory resources and results in the CPU soft hang.

Quick way to test it instead of waiting a week for Zentyal to crap out.

Not sure about this. I tested this against the known bad kernel linux-image-3.19.0-49-generic and it did not produce any problems when running ip rule show.
What kernel are you running now?

I used the command on the kernel linux-image-3.19.0-56-generic and nothing happened, and that is an affected version according to the forums...  ???
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 11, 2016, 04:12:38 am
@jwilliams1976:
Na' sorry, I don't think, that your mentioned bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785 has got anything to do with it.
  • There's nowhere mentioned, that a CPU soft lockup is occurring
  • There's only mentioned, that it messes up the rules table, which of course might be fatal and messing up the system's operational status as well
It might be a bug to keep an eye on, hopefully we don't get affected as well. (Don't need another one!)

I do believe this bug is related to samba (smbd) in combination with the kernel. (I bet if you turn off smbd, the bug disappears)
But it is occurring in and affecting obviously several kernel versions:
E.g. for the kernel 3.13.0-77:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980

But also for the kernel in UCS, what I mentioned before, for the kernel 4.1.16 in this bug:
https://forge.univention.org/bugzilla/show_bug.cgi?id=40558

So I still better stick to 3.19.0.47 in Zentyal for the moment, which seems to do the job for now... until somebody confirms that 3.19.0-58 is working properly for him/her.
Or the proper quick test to confirm, that the bug is gone. Like I mentioned before, for me it always took a couple of days, 6 usually in average, until the system crashed.
And running with 3.19.0.47, I realise, that the system frees memory from time to time (e.g. over night), instead of putting continuously on top, until this CPU lockup occurs and the killing of processes starts.
(Sorry our system is productive, and I can't mess around with it... anymore)

But please keep your experiences up2date here in this thread, if you've got a test system running, that reproduces this bug.
Have much thanks to everybody in advance...

@Carlos: Is your system still running alright with 3.19.0-58? Please keep us up2date...

[update]
Obviously Fedora 23 with kernel 4.4.3 runs into the same bug, reported by this user running on a cubietruck system:
http://www.cubieforums.com/index.php?topic=4076.0
But he or she restricts its occurrence to a high network IO in general via 'smb, scp, or rsync over ssh', but on the opposite the CPU lockup is always logged towards a smbd process.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 11, 2016, 08:57:17 am
Hey guys,

have anyone found a way to force the issue?

I'm running fine on the only updated machine which runs the .56 kernel....quite strange (now that I've said that hell will run on that machine  ::) )
uname -a
Linux dccharlie 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
uptime
 08:49:24 up 14 days, 10:45,  1 user,  load average: 0.25, 0.17, 0.15

Thx

L


Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 11, 2016, 05:41:30 pm
@jwilliams1976:
Na' sorry, I don't think, that your mentioned bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785 has got anything to do with it.
  • There's nowhere mentioned, that a CPU soft lockup is occurring
  • There's only mentioned, that it messes up the rules table, which of course might be fatal and messing up the system's operational status as well
It might be a bug to keep an eye on, hopefully we don't get affected as well. (Don't need another one!)

I do believe this bug is related to samba (smbd) in combination with the kernel. (I bet if you turn off smbd, the bug disappears)
But it is occurring in and affecting obviously several kernel versions:
E.g. for the kernel 3.13.0-77:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980

But also for the kernel in UCS, what I mentioned before, for the kernel 4.1.16 in this bug:
https://forge.univention.org/bugzilla/show_bug.cgi?id=40558

So I still better stick to 3.19.0.47 in Zentyal for the moment, which seems to do the job for now... until somebody confirms that 3.19.0-58 is working properly for him/her.
Or the proper quick test to confirm, that the bug is gone. Like I mentioned before, for me it always took a couple of days, 6 usually in average, until the system crashed.
And running with 3.19.0.47, I realise, that the system frees memory from time to time (e.g. over night), instead of putting continuously on top, until this CPU lockup occurs and the killing of processes starts.
(Sorry our system is productive, and I can't mess around with it... anymore)

But please keep your experiences up2date here in this thread, if you've got a test system running, that reproduces this bug.
Have much thanks to everybody in advance...

@Carlos: Is your system still running alright with 3.19.0-58? Please keep us up2date...

[update]
Obviously Fedora 23 with kernel 4.4.3 runs into the same bug, reported by this user running on a cubietruck system:
http://www.cubieforums.com/index.php?topic=4076.0
But he or she restricts its occurrence to a high network IO in general via 'smb, scp, or rsync over ssh', but on the opposite the CPU lockup is always logged towards a smbd process.

Code: [Select]
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML150 G6
        Version: 1.0
        Serial Number: MXS108003W
        UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
        Wake-up Type: Power Switch
        SKU Number: 466132-001
        Family: ProLiant Server

root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servet:~# uptime
 12:35:34 up 3 days, 20:19,  1 user,  load average: 0,03, 0,10, 0,08

Code: [Select]
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML110 G5
        Version:      NA
        Serial Number: MX2014011G
        UUID: 44F48208-XXXX-5606-XXXX-560649F92209
        Wake-up Type: Power Switch
        SKU Number: AT040A
        Family: 1234567890

root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servpcr-fw:~# uptime
 12:40:31 up 5 days, 12:40,  1 user,  load average: 0,16, 0,36, 0,31
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: spott on April 11, 2016, 06:26:59 pm
pcready.cl - do you have virtualized servers? People mainly seem to have these problems when Zentyal is running in a VPS - at least my server is virtualized.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 11, 2016, 07:59:45 pm
pcready.cl - do you have virtualized servers? People mainly seem to have these problems when Zentyal is running in a VPS - at least my server is virtualized.

They are all dedicated servers.

But I have them running Windows virtual machines with VirtualBox 5.0.16.

The failure is random; at least the server on the .58 kernel has had no problems so far.

The other server, with the .56 kernel, has never given me problems either.

The truth is I don't know how to force or reproduce the problem; both servers are in production.

The .56 kernel server has no active Samba users and is only used as a firewall, which is perhaps why it has not failed.

The server that currently has the .58 kernel previously ran .56 and did have active Samba users, about 15 concurrent users. It had problems once a week, sometimes twice a day.

Since the upgrade to .58 I have not had any more problems, so I think the .56 kernel does not fail without Samba users, but once Samba is actively used the problems appear.

If the .58 kernel fails I will report it immediately, greetings!
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 11, 2016, 08:09:35 pm
Nice...so 58 looks stable...

But waiting for the issue to come...isn't there a way to force the issue?

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 11, 2016, 08:33:26 pm
Nice...so 58 looks stable...

But waiting for the issue to come...isn't there a way to force the issue?

L

I'll wait to see if one of my servers fails on version .58 or .56 and then let you know how it goes; I think it is best to wait at least a week.

But the truth is I don't know how to force or reproduce the error in order to deliver a more concrete report.

I will report back here either way. Regards!
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 11, 2016, 09:16:34 pm
That's my point. I would like to find a way to reproduce the issue in order to be sure that it is gone from the installed kernel.
Waiting is not the correct option imo. It doesn't give you the assurance that the kernel is bug-free.
E.g. mine ran with .51 and .56 and had been fine for days... more than a week (and then one started to crash...)

Honestly I'm still trying to figure out how to reproduce it. It looks tied to some concurrency in Samba's calls... but I'm not sure.

I'll update you all as soon as I have more info...

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 11, 2016, 09:26:17 pm
That's my point. I would like to find a way to reproduce the issue in order to be sure that it is gone from the installed kernel.
Waiting is not the correct option imo. It doesn't give you the assurance that the kernel is bug-free.
E.g. mine ran with .51 and .56 and had been fine for days... more than a week (and then one started to crash...)

Honestly I'm still trying to figure out how to reproduce it. It looks tied to some concurrency in Samba's calls... but I'm not sure.

I'll update you all as soon as I have more info...

L

My servers are in production, so I can't just go and change the kernel; I'll leave them on these versions and hope for the best, lol.

I will report any errors here in the forum.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: jwilliams1976 on April 11, 2016, 10:35:07 pm
The 'ip route ls' command I mentioned earlier has worked for me to test that the bug exists or does not in a given kernel. See this post for more info:
https://forum.zentyal.org/index.php/topic,26954.msg99367.html#msg99367 (https://forum.zentyal.org/index.php/topic,26954.msg99367.html#msg99367)

Quote
It stems from a bug in the kernel that makes the ip command output the first rule infinitely.  You can use this command to see if you're affected:
ip route ls

Broken Output:
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
<repeats indefinitely - ctrl+c to quit>

In Zentyal, this causes one of the network scripts to hang because it's waiting for that command to end.  This prevents loading of other services and resulted in my network being severely broken.

Besides the previously mentioned fix of rolling back the kernel, you can modify the script in question:
/usr/share/zentyal-network/flush-fwmarks
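Before touching that script it is worth keeping a copy and checking where exactly it calls ip; a hedged sketch (the script's actual contents are not shown here):

Code: [Select]
# back up the original script before any modification
cp /usr/share/zentyal-network/flush-fwmarks /usr/share/zentyal-network/flush-fwmarks.orig

# see where it invokes the ip command
grep -n "ip " /usr/share/zentyal-network/flush-fwmarks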
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 12, 2016, 01:34:56 am
That's my point. I would like to find a way to reproduce the issue in order to be sure that it is gone from the installed kernel.
Waiting is not the correct option imo. It doesn't give you the assurance that the kernel is bug-free.
E.g. mine ran with .51 and .56 and had been fine for days... more than a week (and then one started to crash...)

Honestly I'm still trying to figure out how to reproduce it. It looks tied to some concurrency in Samba's calls... but I'm not sure.

I'll update you all as soon as I have more info...

L

My servers are in production, so I can't just go and change the kernel; I'll leave them on these versions and hope for the best, lol.

I will report any errors here in the forum.

Mine are production servers too, of course; that's why I don't want to wait for the bug to happen during the day (production time), but I would rather stress the system during the night (not-so-much-production time) in order to try to find the solution. =)

The 'ip route ls' command I mentioned earlier has worked for me to test that the bug exists or does not in a given kernel. See this post for more info:
https://forum.zentyal.org/index.php/topic,26954.msg99367.html#msg99367 (https://forum.zentyal.org/index.php/topic,26954.msg99367.html#msg99367)

Quote
It stems from a bug in the kernel that makes the ip command output the first rule infinitely.  You can use this command to see if you're affected:
ip route ls

Broken Output:
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
<repeats indefinitely - ctrl+c to quit>

In Zentyal, this causes one of the network scripts to hang because it's waiting for that command to end.  This prevents loading of other services and resulted in my network being severely broken.

Besides the previously mentioned fix of rolling back the kernel, you can modify the script in question:
/usr/share/zentyal-network/flush-fwmarks

I'll test tomorrow night and check the script, THX FOR THE HINT!! =)

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 12, 2016, 05:23:42 am
@Carlos:
Thanks for letting run 3.19.0-58, keeping us up2date and giving feedback... I do very much appreciate that.  ;)

You wrote:
Quote
.56 Kernel which has no active users samba, only used as Firewall, perhaps why it has not failed.
I suppose, yes you are right with that suggestion: no active users = no crash
Because, if I remember correctly, my last system crash was with kernel 3.19.0-56, after I had enough of the experiments and downgraded to 3.19.0-47 as suggested here.
I am running it virtualized on KVM as well, but towards all collected examples in comparison, this bug obviously seems to affect both, running it virtualized and on dedicated systems.
Obviously no difference.

@jwilliams1976:
May I ask which Zentyal version you are running, and could you post your kernel version here (the one affected by the 'ip' bug)?
(Just to make sure, the kernel version is not mistaken...)

For example:
hotsummer55 tested your command example with: linux-image-3.19.0-49-generic
Carlos (alias pcready.cl) tested your command example with: linux-image-3.19.0-56-generic
And their results are obviously fine, so no:
Quote
Broken Output:
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
0:   from all lookup local
<repeats indefinitely - ctrl+c to quit>
But both kernel versions, 3.19.0-49 and 3.19.0-56, according to this forum thread are confirmed to be affected by this CPU soft lockup bug in combination and syslog-ed towards the smbd process.

E.g.:
I installed Zentyal 4.2 directly in December 2015 starting from kernel version 3.19.0.42, if I remember correctly, so never upgraded from Zentyal 4.1 and its 3.16.0-xx kernel version.
And in the Zentyal forum thread you posted yesterday ( https://forum.zentyal.org/index.php/topic,26954.15.html alias high CPU usage + huge issues with network services), in there the users reported, that an upgrade to Zentyal 4.2 and working with kernel version > 3.19.0-37-generic fixed their 'ip' problem.
E.g. 3.19.0-39 did it
Or as solution they purged (not only removed) and downgraded in Zentyal 4.1 to kernel version < 3.16.0-52.
E.g. 3.16.0-51 is in this forum thread to be confirmed to work properly without the 'ip' issue.

And that covers again perfectly and is identical with your previously posted Ubuntu launchpad bug report @ https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785
But therefore mostly valid with Zentyal version 4.1 and in there again reported by using the following kernels:
Quote
PBR NOT working on:
3.13.0-69
3.13.0-70
3.16.0-52 <used by Zentyal 4.1>
3.16.0-53 <used by Zentyal 4.1>
3.19.0-37 <probably initially used by Zentyal 4.2>
Yes, there was obviously a problem with ip in the previous Zentyal/Ubuntu release.
Symptoms are often similar, but could it be, that you mix up here the kernel 3.16.0-xx and 3.19.0-xx?
(Please no offence. Everything helps, that might help to identify affected kernel versions for our problem) 

@LaM:
Yes I'd love to have a test as well to quickly confirm, if a kernel is affected by this bug, and not to wait until the system crashes.
Especially for the future it might be relevant as well, because the other examples e.g. in UCS (seems2Bfixed) and with Fedora seem to confirm, that also the newer kernel 4.x versions, e.g. 4.1.16-xx and 4.4.3-xx are affected by this bug.

Cheers,
Andreas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: BerT666 on April 12, 2016, 04:22:54 pm
Howdy,

maybe a huge data transfer (~500GB to 1TB) will force this to occur.

In my setting I only get problems when it comes to transfers like this...

Regards

Thomas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 12, 2016, 04:49:33 pm
In our case we only have transfers which are less than 1 GB...   =(

I rather think it occurs when there are many concurrency calls...but still i'm not able to produce a test.

@BerT666, did You reproduce the issue?

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: jwilliams1976 on April 12, 2016, 05:54:31 pm
@Andreas
I am currently using 3.19.0-49 on my server without issue for 9 days. I ran into the bug when I upgraded the kernel to the 3.19.0-53 version. I know at the time I was having issues I had found the 'ip route ls' command and tested on these two versions and the -53 version kept running the line over and over until CTRL-C while the -49 exited on it's own. That is pretty much all the testing I did. It's not clear if Carlos or anyone else has ever tried to run the 'ip route ls' command on -56 or -58 kernels. It may not be an indicator at all or is unrelated. Hopefully someone finds a reliable way to test it. I'm going to work on a test bed today as I can't mess with my production server during the day.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 13, 2016, 08:02:57 am
@LaM and @BerT666:
I don't think, that that can or should be the test to find out, if a kernel-version is affected by this bug or not.  :-\
- data-transfer of 500 GB up to 1 TB...
- or instruct all users to put as much operation on it as possible at the same time...
That's IT technically destructive for the own reputation, "Oh yeah, please help me to crash the server"

@Carlos:
You reached my magic uptime mark today with with more than 6 days uptime.
I'll probably give 3.19.0-58 a try, when you've got an uptime longer than 14 days without any issues...  ;)
Sorry, that you are the guinea pig (laboratory-technically seen)

@jwilliams1976:
hotsummer55 wrote as answer towards your post @April 09, 2016, 11:24:04 am, that he/she tested 'ip route ls' with:
Quote
Not sure about this. I tested this against the known bad kernel linux-image-3.19.0-49-generic and it did not produce any problems when running ip rule show.
What kernel are you running now?

and Carlos (alias pcready.cl) tested and wrote as answer towards your post @ April 09, 2016, 04:47:11 pm
Quote
I used the command on the kernel linux-image-3.19.0-56-generic and nothing happened, and that is an affected version according to the forums...  ???

And if my interpretation is right, then both kernels are affected by the bug of this forum thread ('CPU soft lockup'), but according to their replies not by the 'ip' bug you posted. Sorry I can't test it myself, but the Ubuntu launchpad bug report and the Zentyal forum thread about the 'ip' bug, which you posted, line up perfectly and are consistent with what we have found out so far.

So 'ip' bug occurs in these relevant kernels:

3.16.0-52, 3.16.0-53 (Zentyal 4.1)
and
3.19.0-37 (probably Zentyal 4.2)

and have been fixed in:
>= 3.16.0-55.74 (Zentyal 4.1)
and
>= 3.19.0-39.44 (Zentyal 4.2)

Don't get me wrong a re-occurrence of this 'ip' bug is definitely possible, and then only you have tested 3.19.0-53 towards that 'ip' bug.
But when I simply type this into my commandline:
Code: [Select]
██ root@dcrc-dcx1:/var/log
██ 13:31:03 ᛤ  dpkg --list | grep linux-image
rc  linux-image-3.19.0-25-generic         3.19.0-25.26~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-3.19.0-39-generic         3.19.0-39.44~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-3.19.0-41-generic         3.19.0-41.46~14.04.2                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-3.19.0-42-generic         3.19.0-42.48~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-3.19.0-43-generic         3.19.0-43.49~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-3.19.0-47-generic         3.19.0-47.53~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-3.19.0-49-generic         3.19.0-49.55~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-3.19.0-51-generic         3.19.0-51.58~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-3.19.0-56-generic         3.19.0-56.62~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-3.19.0-58-generic         3.19.0-58.64~14.04.1                amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-extra-3.19.0-25-generic   3.19.0-25.26~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-extra-3.19.0-39-generic   3.19.0-39.44~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-extra-3.19.0-41-generic   3.19.0-41.46~14.04.2                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-extra-3.19.0-42-generic   3.19.0-42.48~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc  linux-image-extra-3.19.0-43-generic   3.19.0-43.49~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-extra-3.19.0-47-generic   3.19.0-47.53~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-extra-3.19.0-49-generic   3.19.0-49.55~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-extra-3.19.0-51-generic   3.19.0-51.58~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-extra-3.19.0-56-generic   3.19.0-56.62~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-extra-3.19.0-58-generic   3.19.0-58.64~14.04.1                amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-generic-lts-vivid         3.19.0.58.41                        amd64        Generic Linux kernel image

██ root@dcrc-dcx1:/var/log
██ 13:31:11 ᛤ 

Then I don't even get offered a linux-image-3.19.0-53-generic. There's only a linux-image-3.19.0-51-generic and a linux-image-3.19.0-56-generic. (I have both installed in my system as you see). Probably you mean 3.19.0-51. So I just want to make sure, that we are all on the same page, and not mistaken by a kernel version mix up.

So now -56 and -49 have been tested towards the 'ip' bug by hotsummer55 and Carlos. And they didn't find any issues on their system.

But here one time all affected versions of the 'CPU soft lockup' bug, we are dealing with in this forum thread, as far as I know and carried together quickly:

linux-image-3.19.0-49 (e.g. https://tracker.zentyal.org/issues/4977 , in this forum thread and by my own experience )
linux-image-3.19.0-51 (in this forum thread and by my own experience)
linux-image-3.19.0-56 (in this forum thread and by my own experience)
and Carlos is currently 'long-term' ;) testing linux-image-3.19.0-58, and finding indirectly out for us, if it is safe to switch back to the main kernel upstream.

I don't know the background of your Zentyal system, if you upgraded from Zentyal 4.1 with kernel 3.16.0-53 or ran it with 3.19.0-37, which is by the way also not available among my installable kernels, but then definitely yes, it was affected by the 'ip' bug.
So to quote you:
Quote
[..] but on any versions with the bug it just loops and never exits. Zentyal must use this command somewhere and after a while it eats up all CPU and memory resources and results in the CPU soft hang.

Quick way to test it instead of waiting a week for Zentyal to crap out.
and:
Quote
The 'ip route ls' command I mentioned earlier has worked for me to test that the bug exists or does not in a given kernel.

and to quote me:
Quote
Na' sorry, I don't think, that your mentioned bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785 has got anything to do with it.

So for me everything seems to speak against the theory, that the 'CPU soft lockup' bug, we're having issues with, in common has anything to do with the 'ip' bug, you experienced with your system(s). I'd just like to avoid following down a wrong path. That's all.

By the way you are running currently definitely a bug affected kernel version with 3.19.0-49. But 9 days is great, I never made it that long.

But I would love to have verified:
  • that the bug is gone
  • and finding a quick test to verify, if a system is affected by this bug or not

So the only plausible test procedure at the moment seems to be to bring as much concurrent user activity onto the system as possible, which is far from optimal.
And I am very sorry for the fact, that I can't test by myself.  :-[
But if I risk any further inconvenience on that system, probably my head gets chopped off.

I mean I have 2x Zarafa (not to be mistaken with Zentyal) on CentOS servers running virtualized on KVM since middle of 2011, ticking like a Switzer clockwork, and never experienced any issues, no update issues, works like a charm. Only through the cat and mouse game caused by MS Office updates, which lead to problems with MS Outlook's connector compatibility, but was always fixed immediately by Zarafa with a client-connector update. But that's also only client-sided and has nothing to do with the server itself. 
Since Zarafa announced that it will no longer officially support MS Outlook as an email/groupware client, Zentyal seems to be the only alternative for setting up something outside the cost-intensive MS licence trap.
I tested SOGo 3 ( http://sogo.nu/ ) as well at the beginning of the year... email is fine, but the groupware functionality is messed up with MS Outlook, so I wouldn't really recommend using it in production.

For 2 months now the Zentyal system has been a nightmare: system checks every morning to evaluate whether the traffic still flows.
Zentyal is on a good path, that's for sure... But this bug really costs nerves.

It would be great, if Zentyal would give away a free commercial test lab license to play around with for such purposes, while being subscripted and maintaining a productive system. That'll be great.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 14, 2016, 08:04:20 am
Hi to Everybody,

so I did a bit of more research towards that bug and want to keep the forum up2date.

The bug has been initially reported in the official Ubuntu 14.04 LTS kernel in version 3.13.0-77.
The bug was then officially fixed @ the 05 April 2016 in the build of this kernel - 3.13.0-85.129.
( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980 )

other affected versions of Ubuntu's 3.x kernel line so far reported containing this bug:
Quote
v3.14: 9d054f57adc981a5f503d5eb9b259aa450b90dc5
v3.12: 9964b4c4ee925b2910723e509abd7241cff1ef84
v3.10: da8db0830a2ce63f628150307a01a315f5081202
ckt/linux-3.13.y: 6505b15f7f7efde1853b5a7641e9ce675c2b1a96
v3.4: -
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
I don't know, if fixed, but irrelevant for us.

The 4.x version of the kernel was reported to contain the bug as well, mainly by the maintainers of the Univention Corporate Server:
There the upstream kernel 4.1.16 was affected by the bug.
The bug was then temporary fixed by reverting 2 previous kernel commits already @ the 12 Feb. 2016 in this build of their kernel 4.1.6-1.174.201602110938 to keep their customers up and running. (Univention has their own Linux kernel developers internally and proven here: http://errata.software-univention.de/ucs/4.1/114.html )

If you follow up on the developers' communications @ https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg159625.html
This temporary patch for the bug was provided by Philipp Hahn (working for Univention):
Quote
Reverting the patch "unix: avoid use-after-free in ep_remove_wait_queue"
in 4.1 fixes my problem (for now). The original patch went into 4.4, but
was back-ported to several stable trees:


v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
v4.4: 7d267278a9ece963d77eefec61630223fce08c6c

After integrating this temporary patch by Philipp Hahn to keep their customers' systems fully operational, the true official bug fix is then mentioned for the first time, on 23 Feb 2016:
Quote
Rainer Weikusat sent a patch named
 [PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
 < https://patchwork.ozlabs.org/patch/582017/ >
which fixes the problem.


For our distribution we released chose to revert the original patch as
we needed a working kernel as fast as possible, as several of our
customers were hit by that bug.

I tested the patch from Rainer and it also made the bug disappear.
David Miller also picked the patch for stable and we will do the same
when next be build a new kernel for our release.

Philipp


Rainer Weikusat's official fix for the bug and the official commit was done by David S. Miller into the newest 4.5 main Linux kernel development version maintained by Linus Torvalds @ the 16 Feb. 2016:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a5527dda344fff0514b7989ef7a755729769daa1

For the most recent 4.4.x stable kernel version it was announced to be in review:
Quote
For 4.4 it is in review right now for 4.4.4 as announced by greg k-h
yesterday: < https://lkml.org/lkml/2016/3/1/828 >
I didn't find any information, if this fix has found its way into the 4.4.4 kernel version, but if it is contained and Fedora in current version 23 considers to update from kernel 4.4.3 to 4.4.4, that will make this Cubietruck user very happy again: http://www.cubieforums.com/index.php?topic=4076.0

As I initially mentioned in this post, Ubuntu fixed the bug officially with LTS kernel build 3.13.0-85.129 @ the 05 April 2016, but reported to them was already @ the 10 Feb. 2016 by Karolin Seeger (probably also working for Univention) and in the kernel communication:
Quote
Thanks Philipp!

I just hope to trigger some reaction from the ubuntu maintainers
in order get a usable kernel more than two week after breaking it.
So it basically took the Ubuntu maintainers quite a while (almost 2 months) to officially fix it in their 14.04 LTS kernel in version 3.13.0-xx.

So now the big question: Where are we with Zentyal at the moment?

Zentyal 4.2 is not using the Ubuntu 14.04 LTS kernel 3.13.0-xx alias Trusty.
For some reason they chose to use the 3.19.0-xx kernel from the Vivid release alias Ubuntu 15.04.

And I couldn't find any hints or clues, that the 3.19.0-xx Vivid kernel tree has been reported towards that bug as well.
But we all experience the effect of this bug, our system(s) is/are crashing!!!
And no recognizable action has been taken yet by the reported bug: https://tracker.zentyal.org/issues/4977 (not been assigned, confirmed, marked as duplicate, refused or anything else)

So I would now realistically suggest that the kernel version 3.19.0-58 we received last week won't contain any bug fix. (Please prove me wrong!!!)
What I love about open source is that it is truly open and transparent.

So basically, if someone from the Zentyal support or development team has a look into this forum thread: please help us, bring in a statement on the matter or a confirmation that you will take it on.
We can't patch the kernel by ourselves, that's a bit too deep.

The bug can be reproduced and confirmed, but obviously only as developer on an affected system (To be honest I don't know how to do it):
Quote
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"

If I am right (and I really want to be wrong towards that matter):
The only way for us in Zentyal 4.2 right now, would be to change and install a kernel from another version tree, which contains the fix, but I don't want to risk any incompatibilities towards other installed components of Zentyal probably or possibly coming along with it.

Cheers,
Andreas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 14, 2016, 05:19:59 pm
SERVER 15 USERS SAMBA - ZENTYAL 4.2
Code: [Select]
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML150 G6
        Version: 1.0
        Serial Number: MXS108003W
        UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
        Wake-up Type: Power Switch
        SKU Number: 466132-001
        Family: ProLiant Server

root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servet:~# uptime
 12:18:20 up 6 days, 20:01,  1 user,  load average: 0,18, 0,25, 0,21

FIREWALL NON SAMBA USERS - ZENTYAL 4.2
Code: [Select]
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML110 G5
        Version:      NA
        Serial Number: MX2014011G
        UUID: 44F48208-XXXX-5606-XXXX-560649F92209
        Wake-up Type: Power Switch
        SKU Number: AT040A
        Family: 1234567890

root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servpcr-fw:~# uptime
 12:19:19 up 8 days, 12:18,  1 user,  load average: 0,29, 0,36, 1,38
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 18, 2016, 07:19:36 am
Hi everybody!

Sorry I was traveling for job and have been away for almost a week.

Before quoting and reporting "news" let me show some new stats (I will call the 3 servers A, B and C):

Server | Kernel            | Uptime                     | Load average     | Samba load
A      | 3.19.0-51-generic | 06:14:14 up 6 days, 19:52  | 0.00, 0.01, 0.05 | High
B      | 3.19.0-51-generic | 06:14:16 up 5 days, 17:28  | 0.04, 0.07, 0.12 | mid/low
C      | 3.19.0-56-generic | 06:14:19 up 21 days, 8:10  | 0.09, 0.11, 0.10 | mid/high

The 3 servers have the same exact HW and diff only for the installed kernel.

Ok. Now:

@Andreas
@LaM and @BerT666:
I don't think, that that can or should be the test to find out, if a kernel-version is affected by this bug or not.  :-\
- data-transfer of 500 GB up to 1 TB...
- or instruct all users to put as much operation on it as possible at the same time...
That's IT technically destructive for the own reputation, "Oh yeah, please help me to crash the server"

I wouldn't ask anybody to help out to stress test my systems; of course that isn't a clever idea even to think about, and it never crossed my mind (nor did I come close to writing something like that here).
Being a system administrator and having full control over the servers and clients in the network, I can use some scripts/jobs to stress test the system during the night (aka outside production time).
Talking about reputation, I think the real hit to our reputation would be sitting here waiting for the system to collapse (which luckily isn't what we're doing).
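For what it's worth, one way to generate that kind of off-hours Samba load from a Linux client is a simple smbclient loop; everything below (server name, share, credentials, file size) is only an example, not a confirmed reproducer for this bug:

Code: [Select]
# create a 1 GB test file
dd if=/dev/urandom of=/tmp/stress.bin bs=1M count=1024

# upload and delete it repeatedly; run several of these loops in parallel
for i in $(seq 1 50); do
    smbclient //zenserver/testshare -U testuser%secret \
        -c "put /tmp/stress.bin stress-$i.bin; rm stress-$i.bin"
done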

There's a slight difference in waiting for the issue to come out and react before it comes out and as You later wrote in Your post:

I would love to have verified:
  • that the bug is gone
  • and finding a quick test to verify, if a system is affected by this bug or not

and of course I'm with You saying

Where are we with Zentyal at the moment?
but even better
Where are the Zentyal maintainers? Do they read this forum?!

Anyway

(Disclaimer: this is just a hint) I've found out that the samba shares' RecycleBin folders (and some 'regular' folders as well) are being filled with .tmp files.
I think this started to occur after the first buggy kernel update. My bad that I ignored the situation before, thinking it was related to something else (and limited to only one directory), but after a quick search I discovered that my shares' RecycleBin folders are quite full, and I've also been told that these files now show up in regular folders too and are sometimes being deleted manually (sigh). Could it be related?
It looks like creating a tmp file is a normal procedure for samba (in order to preserve files during write ops), but tmp files not always being correctly deleted after normal operations (they shouldn't be in the RecycleBin) is quite strange.
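If anyone wants to check their own shares for the same leftovers, a hedged example (the share path is only a placeholder, adjust it to your setup):

Code: [Select]
# count leftover .tmp files per RecycleBin folder under the shares
find /home/samba/shares -type f -name "*.tmp" -path "*RecycleBin*" \
    -printf "%h\n" | sort | uniq -c | sort -rn | head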

@Andreas
I'll dig the temporary patch by Philipp and in this direction:
The bug can be reproduced and confirmed, but obviously only as developer on an affected system (To be honest I don't know how to do it):
Quote
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"

Good call!

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 18, 2016, 10:39:38 am
Hi LaM,

have much thanks for your detailed post.
But you don't need to dig anymore in Philipp's direction, I think. The "lockup - CPU" bug in combination with samba is already officially fixed by Rainer Weikusat in the most recent official Linux kernel version (v. 4.5) and will be back ported in the kernel source to other kernel versions.

Just to bring in order what really has happened and a bit more summarized, as far as I understood the whole communication in the mail archive correctly:
This bug was original brought to us by 2 patches committed into the source code of Linux Kernel version 4.4.
These 2 patches, responsible for our trouble, have been then back-ported into these known kernel versions:
1.) mentioned in the communication of the Univention maintainers (-> their kernel is obviously relying on the Ubuntu kernel as well):
Quote
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
2.) but then also identified by and reported to the Ubuntu maintainers in Feb 2016:
Quote
v3.14: 9d054f57adc981a5f503d5eb9b259aa450b90dc5
v3.12: 9964b4c4ee925b2910723e509abd7241cff1ef84
v3.10: da8db0830a2ce63f628150307a01a315f5081202
ckt/linux-3.13.y: 6505b15f7f7efde1853b5a7641e9ce675c2b1a96
v3.4: -
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.19 is not mentioned or contained!!!

But in Zentyal 4.2 we also experienced this bug since beginning of Feb. in the used Ubuntu Vivid kernel version v3.19:

Philipp Hahn (working for Univention) reverted then these 2 official kernel patches (back-ported from kernel 4.4 obviously through the Ubuntu maintainers), which affected their 4.1.16 kernel. In Univention they call this bug in a lovely way the 'samba deadlock'. So basically he reverted simply these 2 patches coming from the official (Ubuntu) Linux kernel, which fixed the 'samba deadlock' bug in their Univention kernel, but which has to be only seen as temporary. They built this kernel 4.1.6-1.174.201602110938 out of the source code with these 2 reverted patches by Philipp on the 12. Feb. and delivered it immediately via updates to all their customers using the Univention Corporate Server, that's how their Linux distribution is called.

Then they went into communication with the Linux kernel developers (around Linus Torvalds) and to the Ubuntu maintainers and reported the kernel bug to them.
Then Philipp found out on the 23. Feb., that the bug has been already fixed by an official patch provided by Rainer Weikusat. Philipp tested his patch in their kernel and it fixed the 'samba deadlock' as well. This official patch was also already brought into the v4.5 Linux kernel version (committed in the GIT repo into Torvald's kernel branch at the 16. Feb.) and from then obviously back-ported to other kernel versions, e.g. like v4.4. 
Now they were obviously just waiting for the Ubuntu maintainers to publish their fixed kernel build, which happened on 05 April, e.g. for Ubuntu 14.04 LTS (Trusty Tahr) with the kernel in version 3.13.0-85.

So now, as I mentioned, Zentyal 4.2 is not using this kernel v3.13 from Ubuntu LTS 14.04 (Trusty Tahr). It is using v3.19 from the Ubuntu release 15.04 (Vivid Vervet).
So and as you can see here https://wiki.ubuntu.com/Releases Ubuntu 15.04 with its kernel v3.19 is at its end of life since the 4. Feb. 2016.
So basically it could mean the Ubuntu maintainers don't even need to fix kernel v3.19 towards the 'samba deadlock' bug anymore.

I have hope, because we received @ 6 April a kernel update in version 3.19.0-58, but it is not clear yet, that it contains the 'samba deadlock' fix.
Especially because in the official Ubuntu bug report ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980 ) the kernel v3.19 from Vivid has been never mentioned.

It is definitely awesome and an honourable dedication that you want to write a script to test for the bug's occurrence.
But right now we don't even know what exact combination of circumstances breaks the kernel in our live systems.
The mentioned:
Quote
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"
To me this doesn't seem to be something you can bring onto and execute on a fully configured production system.
I think it's more something you test on a separate system, in combination with the relevant kernel and your own Samba build.
And producing as many concurrent Samba accesses as possible from different sessions with a script is just a theory for right now.
 
However thank you for posting your uptimes here.
(also to Carlos alias pcready.cl)

Update:
I compared the change logs for both Ubuntu kernels
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_3.13.0-85.129/changelog
That's for Kernel v3.13 (Ubuntu 14.04 LTS, which is definitely containing the fix assigned to LaunchPad ID: 1543980 => https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980 )
The changelog is containing:
Quote
  * af_unix: Guard against other == sk in unix_dgram_sendmsg
    - LP: #1543980, #1557191

And one time for kernel v3.19 (our kernel used in Zentyal 4.2) in build 3.19.0-58.64~14.04.1
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-lts-vivid/linux-lts-vivid_3.19.0-58.64~14.04.1/changelog
is containing as well:
Quote
  * af_unix: Guard against other == sk in unix_dgram_sendmsg
    - LP: #1556297
So obviously the fix has been merged back by the ubuntu maintainers to kernel v3.19

So the kernel version 3.19.0-58 should fix the 'samba deadlock' alias 'soft lockup - CPU #1' bug and should be safe to use!!! :)  + ;D + 8)
(3.19.0-56 is not ... because the fix was integrated in Ubuntu's internal build of kernel 3.19.0-57, probably a test build)
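A quick way to check on your own box whether the installed kernel's changelog contains that entry, assuming the package ships its changelog under /usr/share/doc (on some builds it may be truncated):

Code: [Select]
# search the running kernel's packaged changelog for the fix
zgrep -n "unix_dgram_sendmsg" /usr/share/doc/linux-image-$(uname -r)/changelog.Debian.gz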

Cheers,
Andreas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 18, 2016, 03:54:04 pm
Update:
I compared the change logs for both Ubuntu kernels
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_3.13.0-85.129/changelog
That's for Kernel v3.13 (Ubuntu 14.04 LTS, which is definitely containing the fix assigned to LaunchPad ID: 1543980 => https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980 )
The changelog is containing:
Quote
  * af_unix: Guard against other == sk in unix_dgram_sendmsg
    - LP: #1543980, #1557191

And one time for kernel v3.19 (our kernel used in Zentyal 4.2) in build 3.19.0-58.64~14.04.1
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-lts-vivid/linux-lts-vivid_3.19.0-58.64~14.04.1/changelog
is containing as well:
Quote
  * af_unix: Guard against other == sk in unix_dgram_sendmsg
    - LP: #1556297
So obviously the fix has been merged back by the ubuntu maintainers to kernel v3.19

So the kernel version 3.19.0-58 should fix the 'samba deadlock' alias 'soft lockup - CPU #1' bug and should be safe to use!!! :)  + ;D + 8)
(3.19.0-56 is not ... because the fix was integrated in Ubuntu's internal build of kernel 3.19.0-57, probably a test build)

Cheers,
Andreas

Ok, I'll take this as gold; I'll jump server B to .58 tonight (if nothing occurs in the meantime).

Thanks Andreas for everything so far!

I'll keep You up to date

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 19, 2016, 04:12:37 am
SERVER 15 USERS SAMBA - ZENTYAL 4.2
Code: [Select]
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML150 G6
        Version: 1.0
        Serial Number: MXS108003W
        UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
        Wake-up Type: Power Switch
        SKU Number: 466132-001
        Family: ProLiant Server

root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servet:~# uptime
 12:18:20 up 6 days, 20:01,  1 user,  load average: 0,18, 0,25, 0,21

FIREWALL NON SAMBA USERS - ZENTYAL 4.2
Code: [Select]
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML110 G5
        Version:      NA
        Serial Number: MX2014011G
        UUID: 44F48208-XXXX-5606-XXXX-560649F92209
        Wake-up Type: Power Switch
        SKU Number: AT040A
        Family: 1234567890

root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servpcr-fw:~# uptime
 12:19:19 up 8 days, 12:18,  1 user,  load average: 0,29, 0,36, 1,38

SERVER 15 USERS SAMBA - ZENTYAL 4.2
Code: [Select]
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML150 G6
        Version: 1.0
        Serial Number: MXS108003W
        UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
        Wake-up Type: Power Switch
        SKU Number: 466132-001
        Family: ProLiant Server

root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servet:~# uptime
 23:10:45 up 11 days,  6:54,  1 user,  load average: 0,00, 0,01, 0,05

FIREWALL NON SAMBA USERS - ZENTYAL 4.2
Code: [Select]
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML110 G5
        Version:      NA
        Serial Number: MX2014011G
        UUID: 44F48208-XXXX-5606-XXXX-560649F92209
        Wake-up Type: Power Switch
        SKU Number: AT040A
        Family: 1234567890

root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@servpcr-fw:~# uptime
 23:12:13 up 12 days, 23:11,  1 user,  load average: 2,69, 2,14, 2,03
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 21, 2016, 10:21:50 pm
SERVER 15 USERS SAMBA - ZENTYAL 4.2
Code: [Select]
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
        Manufacturer: HP
        Product Name: ProLiant ML150 G6
        Version: 1.0
        Serial Number: MXS108003W
        UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
        Wake-up Type: Power Switch
        SKU Number: 466132-001
        Family: ProLiant Server

CRASHED YESTERDAY!!! with kernel .58  :'(
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 21, 2016, 11:35:03 pm
damn....
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: jwilliams1976 on April 21, 2016, 11:42:13 pm
That sucks. I just upgraded to that -58 kernel.  :'(
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 21, 2016, 11:43:21 pm
That sucks. I just upgraded to that -58 kernel.  :'(

Yeah, me too...
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on April 22, 2016, 09:06:46 am
Hi guys,

I am very sorry to hear that.
Especially because I also invested so much of my own time and effort into researching that fu**ing sh*t problem.

I quickly re-checked both of Ubuntu's kernel changelogs, for builds 3.13.0-85.129 and 3.19.0-58.64, and there doesn't seem to be much difference.
Both kernel builds contain the bug fix:
Quote
  * af_unix: Guard against other == sk in unix_dgram_sendmsg
    - LP: #1543980, #1557191
It is visible in kernel build 3.13.0-85.129 under Launchpad ID 1543980, which refers exactly to this bug -> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980 , describes exactly the problem we have with Zentyal 4.2, and is by now even marked with status "Fix released".

Zentyal doesn't build its own kernel from the Linux kernel sources, unlike distributors such as Debian, Ubuntu, Red Hat/Fedora, SUSE etc.
Instead, they ship the kernel builds delivered directly by Ubuntu in their systems.

I am still running kernel 3.19.0-47, but I have already reconfigured the GRUB bootloader to use the newest kernel image again, i.e. 3.19.0-58, on the next reboot.
But I couldn't restart the system yet.
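
(For anyone wanting to pin a specific installed kernel the same way, a minimal sketch; the exact menu entry title has to be taken from your own /boot/grub/grub.cfg, the one below is only an assumed example:)
Code: [Select]
# list the GRUB menu entry titles to find the exact name of the wanted kernel
awk -F\' '/menuentry |submenu / {print $2}' /boot/grub/grub.cfg
# in /etc/default/grub set e.g. (format: "submenu title>entry title"):
#   GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.19.0-58-generic"
sudo nano /etc/default/grub
# regenerate the GRUB configuration so the change takes effect on the next reboot
sudo update-grub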

So basically, that means the fix doesn't work. (At least not for us!!!)
Does anybody with an Ubuntu Launchpad login run kernel 3.19.0-58?
If so, please check whether the /var/log/syslog produced while running kernel 3.19.0-58 contains something similar to the following (see the grep sketch below the log excerpt):
Code: [Select]
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [smbd:18232]
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Modules linked in: xt_mac xt_mark xt_connmark iptable_mangle quota_v2 quota_tree xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev serio_raw i2c_piix4 pvpanic 8250_fintek parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse floppy pata_acpi
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CPU: 1 PID: 18232 Comm: smbd Not tainted 3.19.0-51-generic #58~14.04.1-Ubuntu
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] task: ffff8802153493a0 ti: ffff8801f9208000 task.ti: ffff8801f9208000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RIP: 0010:[<ffffffff8105b966>]  [<ffffffff8105b966>] native_safe_halt+0x6/0x10
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RSP: 0018:ffff8801f920bd78  EFLAGS: 00000206
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RAX: 0000000000000037 RBX: 0000000000000085 RCX: 0000000000000001
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RDX: 0000000000000000 RSI: 000000000000011e RDI: ffff88021fff5040
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RBP: ffff8801f920bd78 R08: 0000000001451d64 R09: ffff8801f920bc14
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] R10: ffff8801f920bee2 R11: 0000000000000005 R12: ffffffff811f9b4b
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] R13: ffff8801f920bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] FS:  00007f1004190780(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CR2: 000055f4a9127c50 CR3: 00000001135f0000 CR4: 00000000000406e0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Stack:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  ffff8801f920bdc8 ffffffff8105b46b 000000000000008e 0000011e1385c8b0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  ffff8801f920be48 ffff8801f3369680 ffff8801f920bec0 ffff8800d2ec4000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  0000000000000028 ffff8800da2a7480 ffff8801f920be48 ffffffff8105a711
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Call Trace:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024]  [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108032] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [smbd:18232]
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108042] Modules linked in: xt_mac xt_mark xt_connmark iptable_mangle quota_v2 quota_tree xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev serio_raw i2c_piix4 pvpanic 8250_fintek parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse floppy pata_acpi
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CPU: 1 PID: 18232 Comm: smbd Tainted: G             L 3.19.0-51-generic #58~14.04.1-Ubuntu
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] task: ffff8802153493a0 ti: ffff8801f9208000 task.ti: ffff8801f9208000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RIP: 0010:[<ffffffff8105b966>]  [<ffffffff8105b966>] native_safe_halt+0x6/0x10
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RSP: 0018:ffff8801f920bd78  EFLAGS: 00000206
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RAX: 0000000000000037 RBX: 0000000000000085 RCX: 0000000000000001
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RDX: 0000000000000000 RSI: 000000000000011e RDI: ffff88021fff5040
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RBP: ffff8801f920bd78 R08: 0000000001452860 R09: ffff8801f920bc14
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] R10: ffff8801f920bee2 R11: 0000000000000005 R12: ffffffff811f9b4b
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] R13: ffff8801f920bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] FS:  00007f1004190780(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CR2: 000055f4a9127c50 CR3: 00000001135f0000 CR4: 00000000000406e0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Stack:
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  ffff8801f920bdc8 ffffffff8105b46b 000000000000008e 0000011e1385c8b0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  ffff8801f920be48 ffff8801f3369680 ffff8801f920bec0 ffff8800d2ec4000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  0000000000000028 ffff8800da2a7480 ffff8801f920be48 ffffffff8105a711
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Call Trace:
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066]  [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004047] INFO: rcu_sched self-detected stall on CPU { 1}  (t=15000 jiffies g=634702 c=634701 q=0)
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] Task dump for CPU 1:
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] smbd            R  running task        0 18232  18216 0x00000008
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  ffffffff81c56040 ffff88021fd03d78 ffffffff8109ff86 0000000000000001
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  ffffffff81c56040 ffff88021fd03d98 ffffffff810a355d 0000000000000087
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  0000000000000002 ffff88021fd03dc8 ffffffff810d3dd0 ffff88021fd14bc0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] Call Trace:
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  <IRQ>  [<ffffffff8109ff86>] sched_show_task+0xb6/0x130
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810a355d>] dump_cpu_task+0x3d/0x50
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810d3dd0>] rcu_dump_cpu_stacks+0x90/0xd0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810d7c8c>] rcu_check_callbacks+0x42c/0x670
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810a4590>] ? account_process_tick+0x60/0x180
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810dcb89>] update_process_times+0x39/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810ec085>] tick_sched_handle.isra.16+0x25/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810ec104>] tick_sched_timer+0x44/0x80
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810dd857>] __run_hrtimer+0x77/0x1d0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810ec0c0>] ? tick_sched_handle.isra.16+0x60/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff810ddc37>] hrtimer_interrupt+0xe7/0x220
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8104ab19>] local_apic_timer_interrupt+0x39/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff817ba905>] smp_apic_timer_interrupt+0x45/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff817b893d>] apic_timer_interrupt+0x6d/0x80
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  <EOI>  [<ffffffff8105b966>] ? native_safe_halt+0x6/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8101e329>] ? sched_clock+0x9/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059]  [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
and, if it does, report this bug back to the Ubuntu maintainers of kernel 3.19.0-58.64 alias LTS 14.04.1 as reoccurred and "reopen" the Launchpad ticket (it is presumably present in LTS kernel 3.13.0-85 as well), here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
That is the only way to go, I suppose...  :'(
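
(A quick way to scan for those entries, assuming the default log locations on Ubuntu 14.04; adjust the paths if your logs are rotated elsewhere:)
Code: [Select]
# search the current and the previous syslog for the lockup / stall signatures
grep -E "soft lockup|rcu_sched self-detected stall" /var/log/syslog /var/log/syslog.1
# the same messages usually also appear in the kernel ring buffer
dmesg | grep -i "soft lockup"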

I am really sorry :( and angry, because it shouldn't be the Zentyal users' job to do that!!!  >:(

Update:
But if the line above (or something similar) is not present in /var/log/syslog and the system still crashes, then Zentyal probably has a new issue.
Could it also have something to do with the Samba version used in Zentyal 4.2 in combination with the v3.19 kernel branch?

Update2:
@phaidros
If you are still following this forum thread, regarding your comment:
Quote
This kernel helped me: linux-image-generic-lts-xenial.

Code: [Select]
apt-get install linux-image-generic-lts-xenial

Running 4.4.0.13.7 since ~2 weeks with no crashes.

hth,
.phai

Are you still using that (at the time not yet officially released) Ubuntu xenial kernel branch in version 4.4, i.e. the new 16.04 LTS kernel?
Have you experienced any issues with your Zentyal 4.2 setup on top of it?
If not, which Zentyal components are you using? (Samba Domain Controller, File Server, Email + OpenChange environment etc.)
If it matches my environment setup, I'd consider upgrading to that kernel branch as well... Ubuntu LTS 16.04 is released as of today anyway, and Zentyal will probably switch to the new Ubuntu LTS foundation soon as well.

Many thanks in advance...
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 26, 2016, 11:00:36 am
Guys!

One component update: Domain Controller and File Sharing, from 4.2.2 to 4.2.3...

SHOULD WE TRUST IT?

Will it fix our issues?

Opinions?

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on April 26, 2016, 05:34:57 pm
Guys!

One component update: Domain Controller and File Sharing, from 4.2.2 to 4.2.3...

SHOULD WE TRUST IT?

Will it fix our issues?

Opinions?

L

Zentyal devs, what changes did this update (4.2.2 to 4.2.3) include?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on April 27, 2016, 09:50:08 pm
Ok,

I had an issue this afternoon on the server with the older kernel:

Server  |   Kernel  |   Uptime  |   load average  |    Samba load
A   |   3.19.0-51-generic   |   ??:??:?? up 16 days, 15:06   |   0.??, 0.??, 0.??   |   High

Had to reboot 3 times to get it working again.

I can understand that the Zentyal developers want us to buy their stuff... but this way it is really a mess. If you don't want to keep developing it, at least let people know.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: Andreas Wirth on May 02, 2016, 04:07:51 am
Hey LaM,

affected kernels:
Quote
linux-image-3.19.0-49 (e.g. https://tracker.zentyal.org/issues/4977 , in this forum thread and by my own experience )
linux-image-3.19.0-51 (in this forum thread and by my own experience)
linux-image-3.19.0-56 (in this forum thread and by my own experience)
and Carlos is currently 'long-term' ;) testing linux-image-3.19.0-58, indirectly finding out for us whether it is safe to switch back to the main kernel upstream.
And as Carlos has experienced, linux-image-3.19.0-58 still has issues...

I have been running 3.19.0-47 for 6 weeks with no crash:
Code: [Select]
██ root@dcrc-dcx1:~
██ 10:03:06 ᛤ  uptime
 10:03:12 up 20 days,  3:16,  1 user,  load average: 0.02, 0.06, 0.10
Only 20 days of uptime are shown here because I had to reboot 3 weeks ago for an incoming Samba and OpenChange update, which caused MS Outlook connectivity problems, but the Domain Controller and file server itself was fine.

Cheers,
Andreas
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on May 02, 2016, 10:04:10 am
Hey Andreas,

thx for the info.
Right now I have server C on .56, which has been running for 35 days, and the other two on kernel .58, running for 11 and 4 days. The latter has the Samba module updated.

I'll keep you up to date... let's hope for the best... =)

I'd still like to hear something from the devs... at least a sign of life...

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on May 05, 2016, 03:03:03 pm
Sigh... after 38 days I had to restart C due to proc 9 going to 100%... she is on kernel 3.19.0-56... I'll update her too and we'll see.

The other 2 are on kernel .58 and up since 7 and 14 days.

I'll keep you updated

L
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on May 09, 2016, 04:50:10 pm
OK... even better. I cannot create a new share (or assign permissions) via the interface because the Samba module's state is unknown.
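
(A minimal sketch of how the module state can be checked and the module restarted from the shell, assuming the standard Zentyal init wrapper; adjust to your install:)
Code: [Select]
# show the status of the samba module as Zentyal sees it
sudo /etc/init.d/zentyal samba status
# try restarting just that module
sudo /etc/init.d/zentyal samba restart
# recent module errors usually end up here
sudo tail -n 50 /var/log/zentyal/zentyal.log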

Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on May 12, 2016, 04:15:35 am
Hello, has anyone tried this one?

I just installed it on a server to see how it works.

Code: [Select]
linux-image-3.19.0-59-generic amd64 3.19.0-59.65~14.04.1 [16.8 MB]
 :-\
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: LaM on May 12, 2016, 11:25:28 am
Hi,

No, I think I will try this weekend to back up and update all 3.
Uptimes by now are 14 days, 21 days and 6 days (  :'( the one still on .56 had been up for 38 days, sigh).

I'll keep you updated asap.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: kinetica on May 16, 2016, 05:56:56 pm
Hi Everybody,
I am following this thread because, as you can see from the beginning, I had the same problem. I downgraded to 3.19.0-43-generic and since then no more issues (uptime 40 days). The latest available kernel to download is linux-image-3.19.0-59-generic; I didn't try to upgrade because this is a production server.
I do not understand how it is possible that a bug like this (which looks like a severe one) keeps reproducing itself across so many kernel minor versions and is still not fixed...
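
(For anyone considering the same downgrade approach, a minimal sketch, assuming the older build is still in the archive and that GRUB is then pointed at it as described earlier in this thread; version numbers below are only examples:)
Code: [Select]
# install the older, known-good kernel build alongside the current one
sudo apt-get install linux-image-3.19.0-43-generic linux-image-extra-3.19.0-43-generic
# optionally keep apt from pulling newer lts-vivid kernel images back in automatically
sudo apt-mark hold linux-image-generic-lts-vivid linux-generic-lts-vivid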

Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on June 23, 2016, 07:49:12 pm
3.19.0.61.44 has been released now. Has anyone had a chance to try it yet?
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: dumarjo on November 14, 2016, 03:10:02 pm
Any update on this ?

Regards
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: pcready.cl on November 14, 2016, 05:49:58 pm
Any update on this ?

Regards

Code: [Select]
root@servet:~# uname -a
Linux servet 3.19.0-69-generic #77~14.04.1-Ubuntu SMP Tue Aug 30 01:29:21 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This kernel version has not given me any problems.
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: segelfreak on November 14, 2016, 08:03:40 pm

Code: [Select]
Linux zentyal 4.4.0-45-generic #66~14.04.1-Ubuntu SMP Wed Oct 19 15:05:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Works too!
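
(For reference, the 4.4 HWE kernel can be pulled onto a 14.04-based Zentyal 4.2 install via the lts-xenial meta-package; a minimal sketch, at your own risk on a production box:)
Code: [Select]
sudo apt-get update
# install the 4.4 HWE (lts-xenial) kernel meta-package for trusty
sudo apt-get install linux-generic-lts-xenial
sudo reboot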
Title: Re: Zentyal 4.2 - BUG: soft lockup - CPU #1, after latest update
Post by: dumarjo on November 15, 2016, 04:36:34 pm

Code: [Select]
root@servet:~# uname -a
Linux servet 3.19.0-69-generic #77~14.04.1-Ubuntu SMP Tue Aug 30 01:29:21 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This kernel version has not given me any problems.

adminserver@dc01:~$ uname -a
Linux dc01 3.19.0-69-generic #77~14.04.1-Ubuntu SMP Tue Aug 30 01:29:21 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
adminserver@dc01:~$ samba --version
Version 4.3.4-Zentyal

I have been on this version and I still see the problem from time to time.

Jonathan