Feb 14 14:00:01 zentyal CRON[23986]: (clamav) CMD (/usr/bin/freshclam --quiet)
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [smbd:23984]
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Modules linked in: xt_mark xt_connmark iptable_mangle 8021q garp mrp stp llc quota_v2 quota_tree ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack amdkfd amd_iommu_v2 snd_hda_codec_analog snd_hda_codec_generic radeon hp_wmi snd_hda_intel sparse_keymap ppdev snd_hda_controller snd_hda_codec ttm snd_hwdep drm_kms_helper snd_pcm snd_timer drm kvm edac_core snd i2c_algo_bit soundcore shpchp serio_raw k8temp edac_mce_amd 8250_fintek wmi i2c_piix4 tpm_infineon parport_pc mac_hid lp parport uas usb_storage hid_generic usbhid hid psmouse 3c59x mii floppy tg3 ahci libahci ptp pps_core
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] CPU: 0 PID: 23984 Comm: smbd Not tainted 3.19.0-49-generic #55~14.04.1-Ubuntu
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Hardware name: Hewlett-Packard HP Compaq dc5850 Microtower/3029h, BIOS 786F6 v01.09 04/09/2008
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] task: ffff8800693a93a0 ti: ffff88006bd30000 task.ti: ffff88006bd30000
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RIP: 0010:[<ffffffff817b77f5>] [<ffffffff817b77f5>] _raw_spin_lock+0x35/0x60
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RSP: 0018:ffff88006bd33e20 EFLAGS: 00000206
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RAX: 0000000000003db1 RBX: ffff88000897d0c0 RCX: 00000000000001d2
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RDX: 00000000000001d4 RSI: 00000000000001d2 RDI: ffff8800691b6120
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] RBP: ffff88006bd33e48 R08: 00000000000001d4 R09: ffff88006bd33c14
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] R10: ffff88006bd33ee2 R11: 0000000000000005 R12: ffff88002031a870
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] R13: 00000000000000a2 R14: 0000000400000001 R15: ffff88002031a870
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] FS: 00007fef5ef10780(0000) GS:ffff88006fc00000(0000) knlGS:0000000000000000
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] CR2: 00007f51e9e72000 CR3: 0000000014ac3000 CR4: 00000000000007f0
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Stack:
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] ffffffff817484a0 ffff8800691b4000 ffff8800691b5e00 ffff8800200ff480
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] ffff8800691b4000 ffff88006bd33ea8 ffffffff8174b643 ffff88006bd33e88
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] ffffffff81cda080 ffff88006bd33e78 00000028200ff480 000000000000006e
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Call Trace:
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] [<ffffffff817484a0>] ? unix_state_double_lock+0x60/0x70
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] [<ffffffff8174b643>] unix_dgram_connect+0x93/0x250
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] [<ffffffff8168f367>] SYSC_connect+0xe7/0x120
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] [<ffffffff8169054e>] SyS_connect+0xe/0x10
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] [<ffffffff817b7c0d>] system_call_fastpath+0x16/0x1b
Feb 14 14:00:08 zentyal kernel: [ 7928.088009] Code: f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 89 d1 0f b7 f2 b8 00 80 00 00 eb 0a 0f 1f 00 f3 90 83 e8 01 74 20 0f b7 17 41 89 d0 <41> 31 c8 41 81 e0 fe ff 00 00 75 e7 55 0f b7 f2 48 89 e5 e8 6b
Feb 14 14:00:30 zentyal dhcpd: DHCPDISCOVER from 00:1e:0b:80:82:21 (mint) via eth1
Feb 15 01:59:58 zentyal kernel: [ 8836.084003] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [smbd:6522]
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] Modules linked in: quota_v2 quota_tree ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_nat_ipv4 iptable_filter nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle ip_tables x_tables snd_hda_codec_analog snd_hda_codec_generic amdkfd snd_hda_intel hp_wmi sparse_keymap snd_hda_controller amd_iommu_v2 ppdev radeon snd_hda_codec snd_hwdep snd_pcm snd_timer ttm kvm drm_kms_helper drm serio_raw snd edac_core k8temp soundcore edac_mce_amd i2c_algo_bit parport_pc i2c_piix4 wmi shpchp 8250_fintek tpm_infineon mac_hid lp parport uas usb_storage hid_generic usbhid hid psmouse tg3 3c59x ptp mii pps_core floppy ahci libahci
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] CPU: 0 PID: 6522 Comm: smbd Tainted: G L 3.19.0-49-generic #55~14.04.1-Ubuntu
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] Hardware name: Hewlett-Packard HP Compaq dc5850 Microtower/3029h, BIOS 786F6 v01.09 04/09/2008
Feb 15 01:59:58 zentyal kernel: [ 8836.084005] task: ffff880069554e80 ti: ffff88006b930000 task.t
Feb 17 19:24:18 zentyal rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="450" x-info="http://www.rsyslog.com"] start
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] INFO: rcu_sched self-detected stall on CPU { 1} (t=15000 jiffies g=68817 c=68816 q=0)
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] Task dump for CPU 1:
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] smbd R running task 0 18719 5097 0x00000008
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] ffffffff81c56000 ffff88006fc83d78 ffffffff810a0276 0000000000000001
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] ffffffff81c56000 ffff88006fc83d98 ffffffff810a386d 0000000000000087
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] 0000000000000002 ffff88006fc83dc8 ffffffff810d4100 ffff88006fc94bc0
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] Call Trace:
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] <IRQ> [<ffffffff810a0276>] sched_show_task+0xb6/0x130
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810a386d>] dump_cpu_task+0x3d/0x50
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810d4100>] rcu_dump_cpu_stacks+0x90/0xd0
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810d7fbc>] rcu_check_callbacks+0x42c/0x670
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810a48a1>] ? account_process_tick+0x61/0x180
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810dcef9>] update_process_times+0x39/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810ec405>] tick_sched_handle.isra.16+0x25/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810ec484>] tick_sched_timer+0x44/0x80
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810ddbb7>] __run_hrtimer+0x77/0x1d0
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810ec440>] ? tick_sched_handle.isra.16+0x60/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff810ddf97>] hrtimer_interrupt+0xe7/0x220
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff8104abc9>] local_apic_timer_interrupt+0x39/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff817bac85>] smp_apic_timer_interrupt+0x45/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff817b8cbd>] apic_timer_interrupt+0x6d/0x80
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] <EOI> [<ffffffff817b77ea>] ? _raw_spin_lock+0x2a/0x60
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff817484a0>] ? unix_state_double_lock+0x60/0x70
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff8174b643>] unix_dgram_connect+0x93/0x250
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff8168f367>] SYSC_connect+0xe7/0x120
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff8169054e>] SyS_connect+0xe/0x10
Feb 17 21:59:41 zentyal kernel: [ 9345.516006] [<ffffffff817b7c0d>] system_call_fastpath+0x16/0x1b
Howdy,
just to be clear about this: do you have VMware Tools installed?
Regards
Thomas
Hi everyone.
I experienced the same problem on two different servers. I had 2 lockups on the first one. After some research, I decided to downgrade the kernel from 3.19.0-49-generic to 3.19.0-47-generic. So far, no more lockups.
Today I experienced the same behaviour on another server. I checked the kernel version, and it was 3.19.0-49-generic. I just downgraded this one to 3.19.0-47-generic too.
I'll keep you informed about the results. The first server has not locked up since the downgrade.
Both run Zentyal 4.2.2, up to date.
How to downgrade:
sudo apt-get purge linux-image-3.19.0-49-generic
sudo update-grub
then reboot.
I may just add that one should make sure the previous kernel is still "available". apt's auto-remove function might have deleted it, no?
And finally, you need to put upgrade offers for the new kernel on hold...
Anyway, I'm curious about what would happen if we tried to remove the last kernel...
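For anyone wanting to check that an older kernel is really still installed before purging the newest one, a rough sketch follows. The function name installed_kernels is made up for illustration, and linux-image-generic-lts-vivid is only assumed to be the relevant metapackage (it is the one that shows up in package listings elsewhere in this thread), so adjust it to your system.

```shell
# List kernel image packages that are fully installed ("ii" state),
# reading `dpkg --list` output on stdin. Metapackages and
# linux-image-extra-* packages are deliberately skipped.
installed_kernels() {
    awk '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ { print $2 }'
}

# Usage (not run here):
#   dpkg --list | installed_kernels
#   sudo apt-mark hold linux-image-generic-lts-vivid    # assumed metapackage name
#   sudo apt-mark unhold linux-image-generic-lts-vivid  # go back to normal updates later
```

If the list shows only the kernel you are about to purge, install an older one first.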
samba-common-bin:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-common:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-dsdb-modules:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-libs:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba-vfs-modules:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
samba:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1),
smbclient:amd64 (4.3.1-zentyal2, 4.3.4-zentyal1)
cd /var/lib/mysql/zentyal/
myisamchk -r -v -f samba_access.MYD   # if I remember correctly, this won't work on this one, since my issues were with the index
myisamchk -r -v -f samba_access.MYI
sudo dpkg --configure -a
sudo reboot
dpkg --configure -a
top - 11:34:30 up 17:58, 2 users, load average: 0.82, 0.34, 0.15
Tasks: 444 total, 2 running, 442 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.0 us, 1.3 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16185268 total, 15717188 used, 468080 free, 374772 buffers
KiB Swap: 16544764 total, 0 used, 16544764 free. 13313988 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19904 ebox 20 0 326012 14052 11768 R 99.9 0.1 1:13.49 net
linux-generic Complete Generic Linux kernel and headers 3.13.0.79.85
linux-headers-generic Generic Linux kernel headers 3.13.0.79.85
linux-image-generic Generic Linux kernel image 3.13.0.79.85
linux-image-generic-lts-vivid Generic Linux kernel image 3.19.0.51.36
linux-source Linux kernel source with Ubuntu patches 3.13.0.79.85
linux-source-3.13.0 Linux kernel source for version 3.13.0 with Ubuntu patches 3.13.0-79.123
Hi
Is there any news about this issue? I recently updated my Zentyal and I have the same problems. Today I found this topic and downgraded my kernel to 3.19.0-43; I hope it helps. But has someone reported this bug? Can we watch it and see when it is fixed?
I also have Proxmox, with the Zentyal Samba server running inside it.
3.19.0-47
But with that one there was a big crash on the first day, so I updated again and got 51. Now I have installed the 43 kernel and right now it's fine, but with the 51 kernel it was also fine for two days, and then one core was fully loaded.
OK guys, I have been on Linux 3.19.0-47-generic since Feb 25, 2016, and definitely no problems.
One machine presented the issue again.
I think I'll need to roll back. SIGH
I'll keep you up to date...
I have now been running the GNU/Linux 3.19.0-43-generic x86_64 kernel for 12 days without that issue.
New kernel (generic, 3.13.0.83.89) is out...any infos about it?
You should be on a 3.19 kernel if you are on Zentyal 4.2.
If I were you, I would stay on the Linux 3.19.0-47-generic kernel for a few months if you want stability.
There is the option to go to a 4.2 kernel, the one from Ubuntu 15.10 (wily).
see
https://wiki.ubuntu.com/Kernel/LTSEnablementStack
hope that helps
Linux iklii 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Mar 23 10:59:16 iklii kernel: [155292.064007] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [smbd:7866]
Mar 23 10:59:16 iklii kernel: [155292.064007] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_tcpudp xt_conntrack iptable_nat nf_nat_ipv4 iptable_filter nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp xt_mac xt_mark nf_conntrack_ipv4 nf_defrag_ipv4 xt_connmark nf_conntrack iptable_mangle ip_tables x_tables quota_v2 quota_tree joydev hid_generic usbhid hid ppdev serio_raw pvpanic 8250_fintek parport_pc i2c_piix4 cirrus ttm drm_kms_helper drm syscopyarea sysfillrect sysimgblt mac_hid lp parport floppy psmouse pata_acpi
Mar 23 10:59:16 iklii kernel: [155292.064007] CPU: 1 PID: 7866 Comm: smbd Tainted: G L 3.19.0-56-generic #62~14.04.1-Ubuntu
Mar 23 10:59:16 iklii kernel: [155292.064007] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Mar 23 10:59:16 iklii kernel: [155292.064007] task: ffff8800adaf44b0 ti: ffff880006520000 task.ti: ffff880006520000
Mar 23 10:59:16 iklii kernel: [155292.064007] RIP: 0010:[<ffffffff817b8858>] [<ffffffff817b8858>] _raw_spin_lock+0x28/0x60
Mar 23 10:59:16 iklii kernel: [155292.064007] RSP: 0018:ffff880006523e20 EFLAGS: 00000206
Mar 23 10:59:16 iklii kernel: [155292.064007] RAX: 00000000000066fc RBX: ffff88005c2f8240 RCX: 00000000000016a0
Mar 23 10:59:16 iklii kernel: [155292.064007] RDX: 00000000000016aa RSI: 00000000000016a0 RDI: ffff8800a6a39d60
Mar 23 10:59:16 iklii kernel: [155292.064007] RBP: ffff880006523e48 R08: 000000000000000a R09: ffff880006523c14
Mar 23 10:59:16 iklii kernel: [155292.064007] R10: ffff880006523ee2 R11: 0000000000000004 R12: ffff88011af15c88
Mar 23 10:59:16 iklii kernel: [155292.064007] R13: 0000000000000088 R14: 0000000400000001 R15: ffff88011af15c88
Mar 23 10:59:16 iklii kernel: [155292.064007] FS: 00007f9ef5ac8780(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
Mar 23 10:59:16 iklii kernel: [155292.064007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 23 10:59:16 iklii kernel: [155292.064007] CR2: 000056157be51e78 CR3: 000000001676b000 CR4: 00000000000006e0
Mar 23 10:59:16 iklii kernel: [155292.064007] Stack:
Mar 23 10:59:16 iklii kernel: [155292.064007] ffffffff817491cc ffff8800ad8f1680 ffff880006523ec0 ffff88001244d400
Mar 23 10:59:16 iklii kernel: [155292.064007] ffff8800ad8f1680 ffff880006523ea8 ffffffff8174b713 ffff880006523e88
Mar 23 10:59:16 iklii kernel: [155292.064007] ffffffff81cd9fc0 ffff880006523e78 000000271244d400 000000000000006e
Mar 23 10:59:16 iklii kernel: [155292.064007] Call Trace:
Mar 23 10:59:16 iklii kernel: [155292.064007] [<ffffffff817491cc>] ? unix_state_double_lock+0x2c/0x70
Mar 23 10:59:16 iklii kernel: [155292.064007] [<ffffffff8174b713>] unix_dgram_connect+0x93/0x250
Mar 23 10:59:16 iklii kernel: [155292.064007] [<ffffffff8168fde7>] SYSC_connect+0xe7/0x120
Mar 23 10:59:16 iklii kernel: [155292.064007] [<ffffffff81690fce>] SyS_connect+0xe/0x10
Mar 23 10:59:16 iklii kernel: [155292.064007] [<ffffffff817b8c4d>] system_call_fastpath+0x16/0x1b
Mar 23 10:59:16 iklii kernel: [155292.064007] Code: 00 00 00 0f 1f 44 00 00 b8 00 00 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 01 c3 89 d1 0f b7 f2 b8 00 80 00 00 eb 0a 0f 1f 00 <f3> 90 83 e8 01 74 20 0f b7 17 41 89 d0 41 31 c8 41 81 e0 fe ff
I have created a bug report on the bug tracker:
https://tracker.zentyal.org/issues/4977
Quote:
Ok. Let's assume 3.19.0-79 solves the problem.
Not sure where you got a 3.19.0-79 kernel; I thought they were only up to 3.19.0-51.
Quote:
Will "apt-get upgrade" update the kernel after we did a rollback...? Were kernel updates frozen by the rollback we did...? How to force kernel updates back to normal?
Cannot find anything...
Thanks.
apt-get upgrade will not upgrade kernels.
You need apt-get dist-upgrade to upgrade the kernel.
If you edited /etc/default/grub as in my previous post, it will keep booting the kernel you set there by default.
If you purged the faulty kernel, then yes: after a dist-upgrade it will boot with whatever newer kernel gets installed.
If you hold down the Shift key when booting, you should get the GRUB boot screen and be able to choose between kernels.
Hi LaM,
our server was up to date with kernel 3.19.0-56-generic when we started to notice this bug.
For us it was necessary to downgrade to linux-image-3.19.0-43-generic.
Our procedure was:
-- Check the running kernel = uname -r
-- Check the installed kernels = dpkg --list | grep linux-image
-- Check the files present in the partition = /boot/
-- Install the old kernel = apt-get install linux-image-3.19.0-43-generic
-- Check all installed kernels again = dpkg --list | grep linux-image
-- Modify grub = /etc/default/grub
# GRUB_DEFAULT=0
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.19.0-43-generic"
-- Reboot
-- Check the running kernel = uname -r
Since then Samba and the server have been running smoothly ;)
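The GRUB_DEFAULT line in that procedure follows a "submenu>entry" pattern, so the value for any other kernel version can be built the same way. A small sketch, assuming your GRUB menu titles match the ones in the post (stock Ubuntu 14.04 wording); the function name grub_default_for is just for illustration.

```shell
# Build the GRUB_DEFAULT value for a given kernel version, using the
# "submenu>entry" titles from the post above (assumed to match your grub.cfg).
grub_default_for() {
    printf 'Advanced options for Ubuntu>Ubuntu, with Linux %s\n' "$1"
}

# Usage (not run here):
#   grub_default_for 3.19.0-43-generic
#   # put the printed value, in double quotes, into /etc/default/grub as GRUB_DEFAULT
#   sudo update-grub   # changes to /etc/default/grub only take effect after this
#   sudo reboot
#   uname -r           # should now report the kernel you selected
```

If the titles differ on your machine, check the menuentry lines in /boot/grub/grub.cfg and adapt the string.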
apt-get install linux-image-generic-lts-xenial
FYI
This is a kernel bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785. You can test whether it's fixed by running the command: 'ip rule show' It should just spit out the rules and exit but on any versions with the bug it just loops and never exits. Zentyal must use this command somewhere and after a while it eats up all CPU and memory resources and results in the CPU soft hang.
Quick way to test it instead of waiting a week for Zentyal to crap out.
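Since the symptom described is a command that never exits, the check can be wrapped in a timeout so it cannot hang your shell. This is only a sketch: terminates_within is a made-up helper name and the 5 second limit is an arbitrary assumption.

```shell
# Return success if the given command finishes within SECS seconds.
# `timeout` (GNU coreutils) exits with status 124 when it had to kill the command.
terminates_within() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    [ $? -ne 124 ]
}

# Usage (not run here):
#   if terminates_within 5 ip rule show; then
#       echo "ip rule show exits normally"
#   else
#       echo "ip rule show did not exit within 5s"
#   fi
```

A command that exits normally says nothing about the smbd soft lockup itself; it only tests the looping behaviour described above.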
Not sure about this. I tested this against the known-bad kernel linux-image-3.19.0-49-generic, and it did not produce any problems when running ip rule show.
What kernel are you running now?
@jwilliams1976:
Na' sorry, I don't think that the bug you mentioned, https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785, has got anything to do with it. It might be a bug to keep an eye on; hopefully we don't get affected by it as well. (Don't need another one!)
- Nowhere is it mentioned that a CPU soft lockup is occurring
- It is only mentioned that it messes up the rules table, which of course might be fatal and mess up the system's operational status as well
I do believe this bug is related to samba (smbd) in combination with the kernel. (I bet if you turn off smbd, the bug disappears)
But it obviously occurs in and affects several kernel versions:
E.g. for the kernel 3.13.0-77:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
But also for the kernel in UCS, which I mentioned before, for kernel 4.1.16 in this bug:
https://forge.univention.org/bugzilla/show_bug.cgi?id=40558
So I'd still better stick to 3.19.0-47 in Zentyal for the moment, which seems to do the job for now... until somebody confirms that 3.19.0-58 is working properly for him/her.
Or until there is a proper quick test to confirm that the bug is gone. Like I mentioned before, for me it always took a couple of days, usually 6 on average, until the system crashed.
And running with 3.19.0-47, I notice that the system frees memory from time to time (e.g. overnight), instead of continuously piling more on top until this CPU lockup occurs and the killing of processes starts.
(Sorry, our system is in production, and I can't mess around with it... anymore)
But please keep your experiences up to date here in this thread, if you've got a test system running that reproduces this bug.
Many thanks to everybody in advance...
@Carlos: Is your system still running alright with 3.19.0-58? Please keep us up to date...
[update]
Apparently Fedora 23 with kernel 4.4.3 runs into the same bug, as reported by this user running a Cubietruck system:
http://www.cubieforums.com/index.php?topic=4076.0
But he or she attributes its occurrence to high network I/O in general via 'smb, scp, or rsync over ssh', whereas here the CPU lockup is always logged against an smbd process.
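To check on your own box which process the lockups are logged against, the syslog can be summarised with a few standard tools. A rough sketch, assuming the messages look like the "NMI watchdog: BUG: soft lockup" lines posted in this thread and that the log lives in /var/log/syslog; the function name lockups_by_process is made up.

```shell
# Read syslog lines on stdin and print "<process> <count>" for every
# process named in a "soft lockup - CPU#N stuck for Ns! [name:pid]" message.
lockups_by_process() {
    sed -n 's/.*soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[\([^]:]*\):[0-9]*].*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (not run here):
#   lockups_by_process < /var/log/syslog
```

If everything it prints is smbd, that matches what has been reported here so far.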
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
Manufacturer: HP
Product Name: ProLiant ML150 G6
Version: 1.0
Serial Number: MXS108003W
UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
Wake-up Type: Power Switch
SKU Number: 466132-001
Family: ProLiant Server
root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servet:~# uptime
12:35:34 up 3 days, 20:19, 1 user, load average: 0,03, 0,10, 0,08
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
Manufacturer: HP
Product Name: ProLiant ML110 G5
Version: NA
Serial Number: MX2014011G
UUID: 44F48208-XXXX-5606-XXXX-560649F92209
Wake-up Type: Power Switch
SKU Number: AT040A
Family: 1234567890
root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servpcr-fw:~# uptime
12:40:31 up 5 days, 12:40, 1 user, load average: 0,16, 0,36, 0,31
pcready.cl - do you have virtualized servers? I mainly have these problems when Zentyal is running on a VPS - at least my server is virtualized.
Nice...so 58 looks stable...
But waiting for the issue to come...isn't there a way to force the issue?
L
That's my point. I would like to find a way to reproduce the issue in order to be sure that it is gone from the installed kernel.
Waiting is not the correct option imo. It doesn't give you the assurance that the kernel is bug-free.
E.g. mine ran with .51 and .56 and had been fine for days...more than a week (and then one started to crash...)
Honestly I'm still trying to figure out how to reproduce it. It looks tied to some concurrency in samba's calls...but I'm not sure.
I'll update you all as soon as I have more info...
L
It stems from a bug in the kernel that makes the ip command output the first rule infinitely. You can use this command to see if you're affected:
ip rule ls
Broken Output:
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
<repeats indefinitely - ctrl+c to quit>
In Zentyal, this causes one of the network scripts to hang because it's waiting for that command to end. This prevents loading of other services and resulted in my network being severely broken.
Besides the previously mentioned fix of rolling back the kernel, you can modify the script in question:
/usr/share/zentyal-network/flush-fwmarks
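The broken output above can also be spotted without watching the terminal, by looking at only the first lines of the rule listing and checking for repeats. A sketch under a few assumptions: the helper name has_repeated_lines is made up, 50 lines is an arbitrary cut-off, and a healthy table is assumed to list each rule once (which is the normal case but not guaranteed).

```shell
# Succeed if any line repeats within the first N lines read from stdin
# (N defaults to 50). `head` stops reading after N lines, so an endlessly
# looping producer on the left of the pipe is cut off.
has_repeated_lines() {
    head -n "${1:-50}" | sort | uniq -d | grep -q .
}

# Usage (not run here):
#   ip rule ls | has_repeated_lines 50 && echo "rule listing repeats: kernel looks affected"
```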
My servers are in production, so I cannot just go and change the kernel; I'll leave them on these versions and hope for the best lol.
I will report any errors here in the forum.
The 'ip rule ls' check I mentioned earlier has worked for me to test whether the bug exists in a given kernel. See this post for more info:
https://forum.zentyal.org/index.php/topic,26954.msg99367.html#msg99367
Quote:
.56 Kernel which has no active users samba, only used as Firewall, perhaps why it has not failed.
I suppose, yes, you are right with that suggestion: no active users = no crash.
Quote:
Broken Output:
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
0: from all lookup local
<repeats indefinitely - ctrl+c to quit>
But both kernel versions, 3.19.0-49 and 3.19.0-56, are confirmed in this forum thread to be affected by this CPU soft lockup bug, which is logged in syslog against the smbd process.
Quote:
PBR NOT working on:
3.13.0-69
3.13.0-70
3.16.0-52 <used by Zentyal 4.1>
3.16.0-53 <used by Zentyal 4.1>
3.19.0-37 <probably initially used by Zentyal 4.2>
Yes, there was obviously a problem with ip in the previous Zentyal/Ubuntu release.
Quote:
Not sure about this. I tested this against the known-bad kernel linux-image-3.19.0-49-generic, and it did not produce any problems when running ip rule show.
What kernel are you running now?
I used the command on the kernel linux-image-3.19.0-56-generic and nothing happened, and that is an affected version according to the forums... ???
██ root@dcrc-dcx1:/var/log
██ 13:31:03 ᛤ dpkg --list | grep linux-image
rc linux-image-3.19.0-25-generic 3.19.0-25.26~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc linux-image-3.19.0-39-generic 3.19.0-39.44~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc linux-image-3.19.0-41-generic 3.19.0-41.46~14.04.2 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc linux-image-3.19.0-42-generic 3.19.0-42.48~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-43-generic 3.19.0-43.49~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-47-generic 3.19.0-47.53~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-49-generic 3.19.0-49.55~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-51-generic 3.19.0-51.58~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-56-generic 3.19.0-56.62~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii linux-image-3.19.0-58-generic 3.19.0-58.64~14.04.1 amd64 Linux kernel image for version 3.19.0 on 64 bit x86 SMP
rc linux-image-extra-3.19.0-25-generic 3.19.0-25.26~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc linux-image-extra-3.19.0-39-generic 3.19.0-39.44~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc linux-image-extra-3.19.0-41-generic 3.19.0-41.46~14.04.2 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc linux-image-extra-3.19.0-42-generic 3.19.0-42.48~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
rc linux-image-extra-3.19.0-43-generic 3.19.0-43.49~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii linux-image-extra-3.19.0-47-generic 3.19.0-47.53~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii linux-image-extra-3.19.0-49-generic 3.19.0-49.55~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii linux-image-extra-3.19.0-51-generic 3.19.0-51.58~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii linux-image-extra-3.19.0-56-generic 3.19.0-56.62~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii linux-image-extra-3.19.0-58-generic 3.19.0-58.64~14.04.1 amd64 Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
ii linux-image-generic-lts-vivid 3.19.0.58.41 amd64 Generic Linux kernel image
██ root@dcrc-dcx1:/var/log
██ 13:31:11 ᛤ
Quote:
[..] but on any versions with the bug it just loops and never exits. Zentyal must use this command somewhere and after a while it eats up all CPU and memory resources and results in the CPU soft hang.
Quick way to test it instead of waiting a week for Zentyal to crap out.
and:
Quote:
The 'ip rule ls' command I mentioned earlier has worked for me to test that the bug exists or does not in a given kernel.
Quote:
Na' sorry, I don't think, that your mentioned bug https://bugs.launchpad.net/ubuntu/+source/linux-lts-utopic/+bug/1514785 has got anything to do with it.
Quote:
v3.14: 9d054f57adc981a5f503d5eb9b259aa450b90dc5
v3.12: 9964b4c4ee925b2910723e509abd7241cff1ef84
v3.10: da8db0830a2ce63f628150307a01a315f5081202
ckt/linux-3.13.y: 6505b15f7f7efde1853b5a7641e9ce675c2b1a96
v3.4: -
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
I don't know, if fixed, but irrelevant for us.
Reverting the patch "unix: avoid use-after-free in ep_remove_wait_queue"
in 4.1 fixes my problem (for now). The original patch went into 4.4, but
was back-ported to several stable trees:
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
v4.4: 7d267278a9ece963d77eefec61630223fce08c6c
Rainer Weikusat sent a patch named
[PATCH net] af_unix: Guard against other == sk in unix_dgram_sendmsg
< https://patchwork.ozlabs.org/patch/582017/ >
which fixes the problem.
For our distribution we chose to revert the original patch as
we needed a working kernel as fast as possible, as several of our
customers were hit by that bug.
I tested the patch from Rainer and it also made the bug disappear.
David Miller also picked the patch for stable and we will do the same
when we next build a new kernel for our release.
Philipp
Quote:
For 4.4 it is in review right now for 4.4.4 as announced by greg k-h
yesterday: < https://lkml.org/lkml/2016/3/1/828 >
I didn't find any information on whether this fix has found its way into the 4.4.4 kernel version, but if it is included and Fedora 23 updates from kernel 4.4.3 to 4.4.4, that will make this Cubietruck user very happy again: http://www.cubieforums.com/index.php?topic=4076.0
Quote:
Thanks Philipp!
I just hope to trigger some reaction from the ubuntu maintainers
in order to get a usable kernel more than two weeks after breaking it.
So it basically took the Ubuntu maintainers quite a while (almost 2 months) to officially fix it in their 14.04 LTS kernel in version 3.13.0-xx.
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer
TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"
Server | Kernel | Uptime | Load average | Samba load
A | 3.19.0-51-generic | 06:14:14 up 6 days, 19:52 | 0.00, 0.01, 0.05 | High
B | 3.19.0-51-generic | 06:14:16 up 5 days, 17:28 | 0.04, 0.07, 0.12 | mid/low
C | 3.19.0-56-generic | 06:14:19 up 21 days, 8:10 | 0.09, 0.11, 0.10 | mid/high
@LaM and @BerT666:
I don't think that can or should be the test to find out whether a kernel version is affected by this bug. :-\
- transferring 500 GB up to 1 TB of data...
- or instructing all users to put as much load on it as possible at the same time...
From an IT point of view, that is destructive to one's own reputation: "Oh yeah, please help me crash the server"
I would love to have verified:
- that the bug is gone
- and to find a quick test to verify whether a system is affected by this bug or not
The bug can be reproduced and confirmed, but obviously only as a developer on an affected system (to be honest, I don't know how to do it):
Quote:
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer
TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.18: 72032798034d921ed565e3bf8dfdc3098f6473e2
v4.1: 5c77e26862ce604edea05b3442ed765e9756fe0f
v4.2: bad967fdd8ecbdd171f5f243657be033d2d081a7
v4.3: 58a6a46a036ce81a2a8ecaa6fc1537c894349e3f
2.) but then also identified by and reported to the Ubuntu maintainers in Feb 2016:
v3.14: 9d054f57adc981a5f503d5eb9b259aa450b90dc5
v3.12: 9964b4c4ee925b2910723e509abd7241cff1ef84
v3.10: da8db0830a2ce63f628150307a01a315f5081202
ckt/linux-3.13.y: 6505b15f7f7efde1853b5a7641e9ce675c2b1a96
v3.4: -
v3.2: a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
v3.19 is not mentioned or contained!!!
It's easily reproducible by running the following commands in the Samba master branch:
./configure.developer
TDB_NO_FSYNC=1 make -j test FAIL_IMMEDIATELY=1 SOCKET_WRAPPER_KEEP_PCAP=1 TESTS="samba3.raw.composite"
To me, that does not seem like something you can bring to and execute on a fully configured production system.
Update:
I compared the change logs for both Ubuntu kernels
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_3.13.0-85.129/changelog
That's for kernel v3.13 (Ubuntu 14.04 LTS), which definitely contains the fix assigned to Launchpad ID 1543980 => https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
The changelog contains:
Quote:
* af_unix: Guard against other == sk in unix_dgram_sendmsg
- LP: #1543980, #1557191
And the one for kernel v3.19 (the kernel used in Zentyal 4.2), build 3.19.0-58.64~14.04.1
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-lts-vivid/linux-lts-vivid_3.19.0-58.64~14.04.1/changelog
contains it as well:
Quote:
* af_unix: Guard against other == sk in unix_dgram_sendmsg
- LP: #1556297
So obviously the fix has been merged back by the Ubuntu maintainers to kernel v3.19.
So the kernel version 3.19.0-58 should fix the 'samba deadlock' alias 'soft lockup - CPU #1' bug and should be safe to use!!! :) + ;D + 8)
(3.19.0-56 is not ... because the fix was integrated in Ubuntu's internal build of kernel 3.19.0-57, probably a test build)
Cheers,
Andreas
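A rough way to turn that conclusion into a check on a running system is to compare the ABI number in the kernel release string against the first build believed to carry the fix. The sketch below is only illustrative: the function names are made up for this example, the threshold of 57 is taken from the changelog reading above rather than verified independently, and it only applies to Ubuntu 3.19.0 builds.

```python
import re

# Assumption from the changelog comparison in this thread: the fix was
# integrated in Ubuntu's 3.19.0-57 build and first shipped as 3.19.0-58.
FIRST_FIXED_ABI = 57


def ubuntu_abi(release):
    """Return the ABI number of a 3.19.0 release string, else None."""
    match = re.match(r"3\.19\.0-(\d+)", release)
    return int(match.group(1)) if match else None


def includes_fix(release, first_fixed=FIRST_FIXED_ABI):
    """True/False for 3.19.0 builds, None when the check does not apply."""
    abi = ubuntu_abi(release)
    if abi is None:
        return None
    return abi >= first_fixed
```

Passing the output of `uname -r` (or `platform.release()` in Python) as `release` gives the answer for the running kernel. A True result only means the changelog lists the fix for that build; it does not prove the machine can no longer lock up.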
SERVER 15 USERS SAMBA - ZENTYAL 4.2
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
Manufacturer: HP
Product Name: ProLiant ML150 G6
Version: 1.0
Serial Number: MXS108003W
UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
Wake-up Type: Power Switch
SKU Number: 466132-001
Family: ProLiant Server
root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servet:~# uptime
12:18:20 up 6 days, 20:01, 1 user, load average: 0,18, 0,25, 0,21
FIREWALL NON SAMBA USERS - ZENTYAL 4.2
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
Manufacturer: HP
Product Name: ProLiant ML110 G5
Version: NA
Serial Number: MX2014011G
UUID: 44F48208-XXXX-5606-XXXX-560649F92209
Wake-up Type: Power Switch
SKU Number: AT040A
Family: 1234567890
root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servpcr-fw:~# uptime
12:19:19 up 8 days, 12:18, 1 user, load average: 0,29, 0,36, 1,38
root@servet:~# dmidecode | grep "^System Information" -A8
System Information
Manufacturer: HP
Product Name: ProLiant ML150 G6
Version: 1.0
Serial Number: MXS108003W
UUID: 745FC10B-XXXX-DF11-XXXX-C192EAA48B93
Wake-up Type: Power Switch
SKU Number: 466132-001
Family: ProLiant Server
root@servet:~# uname -a
Linux servet 3.19.0-58-generic #64~14.04.1-Ubuntu SMP Fri Mar 18 19:05:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servet:~# uptime
23:10:45 up 11 days, 6:54, 1 user, load average: 0,00, 0,01, 0,05
root@servpcr-fw:~# dmidecode | grep "^System Information" -A8
System Information
Manufacturer: HP
Product Name: ProLiant ML110 G5
Version: NA
Serial Number: MX2014011G
UUID: 44F48208-XXXX-5606-XXXX-560649F92209
Wake-up Type: Power Switch
SKU Number: AT040A
Family: 1234567890
root@servpcr-fw:~# uname -a
Linux servpcr-fw 3.19.0-56-generic #62~14.04.1-Ubuntu SMP Fri Mar 11 11:03:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servpcr-fw:~# uptime
23:12:13 up 12 days, 23:11, 1 user, load average: 2,69, 2,14, 2,03
That sucks. I just upgraded to that -58 kernel. :'(
* af_unix: Guard against other == sk in unix_dgram_sendmsg
- LP: #1543980, #1557191
visible in the kernel build 3.13.0-85.129 under Launchpad ID 1543980, which refers exactly to bug -> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980 . That bug describes exactly the problem we have with Zentyal 4.2, and it is now even marked with status "Fix released".
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [smbd:18232]
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Modules linked in: xt_mac xt_mark xt_connmark iptable_mangle quota_v2 quota_tree xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev serio_raw i2c_piix4 pvpanic 8250_fintek parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse floppy pata_acpi
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CPU: 1 PID: 18232 Comm: smbd Not tainted 3.19.0-51-generic #58~14.04.1-Ubuntu
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] task: ffff8802153493a0 ti: ffff8801f9208000 task.ti: ffff8801f9208000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RIP: 0010:[<ffffffff8105b966>] [<ffffffff8105b966>] native_safe_halt+0x6/0x10
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RSP: 0018:ffff8801f920bd78 EFLAGS: 00000206
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RAX: 0000000000000037 RBX: 0000000000000085 RCX: 0000000000000001
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RDX: 0000000000000000 RSI: 000000000000011e RDI: ffff88021fff5040
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RBP: ffff8801f920bd78 R08: 0000000001451d64 R09: ffff8801f920bc14
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] R10: ffff8801f920bee2 R11: 0000000000000005 R12: ffffffff811f9b4b
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] R13: ffff8801f920bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] FS: 00007f1004190780(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CR2: 000055f4a9127c50 CR3: 00000001135f0000 CR4: 00000000000406e0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Stack:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] ffff8801f920bdc8 ffffffff8105b46b 000000000000008e 0000011e1385c8b0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] ffff8801f920be48 ffff8801f3369680 ffff8801f920bec0 ffff8800d2ec4000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] 0000000000000028 ffff8800da2a7480 ffff8801f920be48 ffffffff8105a711
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Call Trace:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108032] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [smbd:18232]
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108042] Modules linked in: xt_mac xt_mark xt_connmark iptable_mangle quota_v2 quota_tree xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev serio_raw i2c_piix4 pvpanic 8250_fintek parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse floppy pata_acpi
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CPU: 1 PID: 18232 Comm: smbd Tainted: G L 3.19.0-51-generic #58~14.04.1-Ubuntu
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] task: ffff8802153493a0 ti: ffff8801f9208000 task.ti: ffff8801f9208000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RIP: 0010:[<ffffffff8105b966>] [<ffffffff8105b966>] native_safe_halt+0x6/0x10
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RSP: 0018:ffff8801f920bd78 EFLAGS: 00000206
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RAX: 0000000000000037 RBX: 0000000000000085 RCX: 0000000000000001
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RDX: 0000000000000000 RSI: 000000000000011e RDI: ffff88021fff5040
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RBP: ffff8801f920bd78 R08: 0000000001452860 R09: ffff8801f920bc14
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] R10: ffff8801f920bee2 R11: 0000000000000005 R12: ffffffff811f9b4b
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] R13: ffff8801f920bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] FS: 00007f1004190780(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CR2: 000055f4a9127c50 CR3: 00000001135f0000 CR4: 00000000000406e0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Stack:
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] ffff8801f920bdc8 ffffffff8105b46b 000000000000008e 0000011e1385c8b0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] ffff8801f920be48 ffff8801f3369680 ffff8801f920bec0 ffff8800d2ec4000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] 0000000000000028 ffff8800da2a7480 ffff8801f920be48 ffffffff8105a711
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Call Trace:
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004047] INFO: rcu_sched self-detected stall on CPU { 1} (t=15000 jiffies g=634702 c=634701 q=0)
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] Task dump for CPU 1:
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] smbd R running task 0 18232 18216 0x00000008
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] ffffffff81c56040 ffff88021fd03d78 ffffffff8109ff86 0000000000000001
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] ffffffff81c56040 ffff88021fd03d98 ffffffff810a355d 0000000000000087
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] 0000000000000002 ffff88021fd03dc8 ffffffff810d3dd0 ffff88021fd14bc0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] Call Trace:
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] <IRQ> [<ffffffff8109ff86>] sched_show_task+0xb6/0x130
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810a355d>] dump_cpu_task+0x3d/0x50
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810d3dd0>] rcu_dump_cpu_stacks+0x90/0xd0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810d7c8c>] rcu_check_callbacks+0x42c/0x670
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810a4590>] ? account_process_tick+0x60/0x180
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810dcb89>] update_process_times+0x39/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ec085>] tick_sched_handle.isra.16+0x25/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ec104>] tick_sched_timer+0x44/0x80
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810dd857>] __run_hrtimer+0x77/0x1d0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ec0c0>] ? tick_sched_handle.isra.16+0x60/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ddc37>] hrtimer_interrupt+0xe7/0x220
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8104ab19>] local_apic_timer_interrupt+0x39/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817ba905>] smp_apic_timer_interrupt+0x45/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817b893d>] apic_timer_interrupt+0x6d/0x80
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] <EOI> [<ffffffff8105b966>] ? native_safe_halt+0x6/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8101e329>] ? sched_clock+0x9/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
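For anyone who wants to check their own logs for traces like the ones above, a small scan for the "soft lockup" line can help. This is a minimal sketch, not an official tool: the regular expression is modelled only on the watchdog lines quoted in this thread, and the function name is made up for the example.

```python
import re

# Pattern modelled on the "NMI watchdog" lines quoted in this thread.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def find_soft_lockups(lines):
    """Collect (cpu, seconds, command, pid) for every soft lockup line."""
    hits = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            cpu, seconds, comm, pid = match.groups()
            hits.append((int(cpu), int(seconds), comm, int(pid)))
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed directly (on these Ubuntu-based systems the kernel messages are assumed to end up in /var/log/syslog). An empty result only shows that no lockup was logged so far, not that the kernel is free of the bug.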
and then report this bug back to the Ubuntu maintainers for kernel 3.19.0-58.64 alias LTS 14.04.1 as reoccurred and "reopen" the Launchpad ticket (assumed to be in LTS kernel 3.13.0-85 as well).
This kernel helped me: linux-image-generic-lts-xenial.
apt-get install linux-image-generic-lts-xenial
Running 4.4.0.13.7 for ~2 weeks now with no crashes.
hth,
.phai
Guys!
1 component update: Domain Controller and File Sharing, from 4.2.2 to 4.2.3...
SHOULD WE TRUST?
Would it fix our issues?
Opinions?
L
linux-image-3.19.0-49 (e.g. https://tracker.zentyal.org/issues/4977 , in this forum thread and by my own experience)
linux-image-3.19.0-51 (in this forum thread and by my own experience)
linux-image-3.19.0-56 (in this forum thread and by my own experience)
and Carlos is currently 'long-term' ;) testing linux-image-3.19.0-58, indirectly finding out for us whether it is safe to switch back to the main upstream kernel.
And as experienced by Carlos, linux-image-3.19.0-58 still has issues...
██ root@dcrc-dcx1:~
██ 10:03:06 ᛤ uptime
10:03:12 up 20 days, 3:16, 1 user, load average: 0.02, 0.06, 0.10
Only 20 days of uptime are displayed here because I had to reboot 3 weeks ago for an incoming Samba and OpenChange update, which caused MS Outlook connectivity problems, but the Domain Controller and file server itself was fine.
linux-image-3.19.0-59-generic amd64 3.19.0-59.65~14.04.1 [16.8 MB]
Any update on this ?
Regards
Linux zentyal 4.4.0-45-generic #66~14.04.1-Ubuntu SMP Wed Oct 19 15:05:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@servet:~# uname -a
Linux servet 3.19.0-69-generic #77~14.04.1-Ubuntu SMP Tue Aug 30 01:29:21 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
This kernel version has not given me any problems.