Hi guys,
I am very sorry to hear that.
Especially because I also invested so much of my own time and effort into researching that fu**ing sh*t problem.
I quickly went through both of Ubuntu's kernel changelogs again, for the kernel builds
3.13.0-85.129 and
3.19.0-58.64, and there does not seem to be much difference.
Both kernel builds contain the bug fix:
* af_unix: Guard against other == sk in unix_dgram_sendmsg
- LP: #1543980, #1557191
It is listed in the changelog of kernel build 3.13.0-85.129 with its Launchpad ID 1543980, which refers exactly to this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
That report describes exactly the problem we have with Zentyal 4.2, and it is now even marked with the status "Fix released".
Zentyal doesn't build its own kernel from the available Linux kernel source, unlike Linux distributors such as Debian, Ubuntu, Red Hat/Fedora, SUSE etc.
Instead, they ship the kernel build delivered by Ubuntu directly in their systems.
I am still running kernel 3.19.0-47, but I have already reconfigured the GRUB bootloader to boot the newest kernel image, 3.19.0-58, again on the next reboot.
But I haven't been able to restart the system yet.
So basically, that means the fix doesn't work. (At least not for us!!!)
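To rule out that a box is simply still booted into an older image (like mine is), here is a minimal Python sketch that compares a kernel release string against the two builds named in the changelogs above. The helper names `parse_release` and `carries_fix` are my own, and it assumes the release string has the usual Ubuntu form such as `3.19.0-47-generic` and that later builds in the same branch keep the fix.

```python
import re

# First builds per kernel branch that list the af_unix fix in the Ubuntu
# changelogs quoted above: 3.13.0-85.129 and 3.19.0-58.64.
FIXED_ABI = {(3, 13): 85, (3, 19): 58}

# Matches the leading "3.19.0-47" part of strings like "3.19.0-47-generic".
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release):
    """Split a release string into (major, minor, patch, abi) integers."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unexpected kernel release format: %r" % release)
    return tuple(int(part) for part in match.groups())


def carries_fix(release):
    """Return True/False for the 3.13 and 3.19 branches, None for any other branch."""
    major, minor, _patch, abi = parse_release(release)
    fixed_abi = FIXED_ABI.get((major, minor))
    if fixed_abi is None:
        return None  # branch not covered by the changelogs discussed here
    return abi >= fixed_abi
```

On a live system the string can be taken from `uname -r`, or from `platform.release()` in Python, and passed to `carries_fix`. A result of `True` only means the build should contain the patch according to the changelog, not that the lockups are actually gone.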
Does anybody who has been running kernel 3.19.0-58 have an Ubuntu Launchpad login?
If so, please check whether the /var/log/syslog produced while running kernel 3.19.0-58 contains something similar to this:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [smbd:18232]
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Modules linked in: xt_mac xt_mark xt_connmark iptable_mangle quota_v2 quota_tree xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev serio_raw i2c_piix4 pvpanic 8250_fintek parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse floppy pata_acpi
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CPU: 1 PID: 18232 Comm: smbd Not tainted 3.19.0-51-generic #58~14.04.1-Ubuntu
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] task: ffff8802153493a0 ti: ffff8801f9208000 task.ti: ffff8801f9208000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RIP: 0010:[<ffffffff8105b966>] [<ffffffff8105b966>] native_safe_halt+0x6/0x10
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RSP: 0018:ffff8801f920bd78 EFLAGS: 00000206
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RAX: 0000000000000037 RBX: 0000000000000085 RCX: 0000000000000001
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RDX: 0000000000000000 RSI: 000000000000011e RDI: ffff88021fff5040
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] RBP: ffff8801f920bd78 R08: 0000000001451d64 R09: ffff8801f920bc14
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] R10: ffff8801f920bee2 R11: 0000000000000005 R12: ffffffff811f9b4b
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] R13: ffff8801f920bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] FS: 00007f1004190780(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] CR2: 000055f4a9127c50 CR3: 00000001135f0000 CR4: 00000000000406e0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Stack:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] ffff8801f920bdc8 ffffffff8105b46b 000000000000008e 0000011e1385c8b0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] ffff8801f920be48 ffff8801f3369680 ffff8801f920bec0 ffff8800d2ec4000
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] 0000000000000028 ffff8800da2a7480 ffff8801f920be48 ffffffff8105a711
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Call Trace:
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
Feb 29 13:17:32 dcrc-dcx1 kernel: [65244.108024] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108032] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [smbd:18232]
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108042] Modules linked in: xt_mac xt_mark xt_connmark iptable_mangle quota_v2 quota_tree xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_h323 nf_conntrack_h323 nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev serio_raw i2c_piix4 pvpanic 8250_fintek parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse floppy pata_acpi
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CPU: 1 PID: 18232 Comm: smbd Tainted: G L 3.19.0-51-generic #58~14.04.1-Ubuntu
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] task: ffff8802153493a0 ti: ffff8801f9208000 task.ti: ffff8801f9208000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RIP: 0010:[<ffffffff8105b966>] [<ffffffff8105b966>] native_safe_halt+0x6/0x10
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RSP: 0018:ffff8801f920bd78 EFLAGS: 00000206
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RAX: 0000000000000037 RBX: 0000000000000085 RCX: 0000000000000001
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RDX: 0000000000000000 RSI: 000000000000011e RDI: ffff88021fff5040
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] RBP: ffff8801f920bd78 R08: 0000000001452860 R09: ffff8801f920bc14
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] R10: ffff8801f920bee2 R11: 0000000000000005 R12: ffffffff811f9b4b
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] R13: ffff8801f920bd18 R14: 0000000000000006 R15: 00000000ffffff9c
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] FS: 00007f1004190780(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] CR2: 000055f4a9127c50 CR3: 00000001135f0000 CR4: 00000000000406e0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Stack:
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] ffff8801f920bdc8 ffffffff8105b46b 000000000000008e 0000011e1385c8b0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] ffff8801f920be48 ffff8801f3369680 ffff8801f920bec0 ffff8800d2ec4000
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] 0000000000000028 ffff8800da2a7480 ffff8801f920be48 ffffffff8105a711
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Call Trace:
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
Feb 29 13:18:00 dcrc-dcx1 kernel: [65272.108066] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004047] INFO: rcu_sched self-detected stall on CPU { 1} (t=15000 jiffies g=634702 c=634701 q=0)
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] Task dump for CPU 1:
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] smbd R running task 0 18232 18216 0x00000008
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] ffffffff81c56040 ffff88021fd03d78 ffffffff8109ff86 0000000000000001
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] ffffffff81c56040 ffff88021fd03d98 ffffffff810a355d 0000000000000087
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] 0000000000000002 ffff88021fd03dc8 ffffffff810d3dd0 ffff88021fd14bc0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] Call Trace:
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] <IRQ> [<ffffffff8109ff86>] sched_show_task+0xb6/0x130
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810a355d>] dump_cpu_task+0x3d/0x50
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810d3dd0>] rcu_dump_cpu_stacks+0x90/0xd0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810d7c8c>] rcu_check_callbacks+0x42c/0x670
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810a4590>] ? account_process_tick+0x60/0x180
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810dcb89>] update_process_times+0x39/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ec085>] tick_sched_handle.isra.16+0x25/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ec104>] tick_sched_timer+0x44/0x80
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810dd857>] __run_hrtimer+0x77/0x1d0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ec0c0>] ? tick_sched_handle.isra.16+0x60/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff810ddc37>] hrtimer_interrupt+0xe7/0x220
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8104ab19>] local_apic_timer_interrupt+0x39/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817ba905>] smp_apic_timer_interrupt+0x45/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817b893d>] apic_timer_interrupt+0x6d/0x80
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] <EOI> [<ffffffff8105b966>] ? native_safe_halt+0x6/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8101e329>] ? sched_clock+0x9/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8105b46b>] kvm_lock_spinning+0xbb/0x1b0
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8105a711>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817b74c6>] ? _raw_spin_lock+0x56/0x60
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff81747f7c>] ? unix_state_double_lock+0x2c/0x70
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8174a803>] unix_dgram_connect+0x93/0x250
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8168ecf7>] SYSC_connect+0xe7/0x120
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff8168fede>] SyS_connect+0xe/0x10
Feb 29 13:18:07 dcrc-dcx1 kernel: [65279.004059] [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
If it does, please report the bug back to the Ubuntu maintainers as having reoccurred in kernel 3.19.0-58.64, alias LTS 14.04.1, and "reopen" the Launchpad ticket (I assume it affects LTS kernel 3.13.0-85 as well)
here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
The only way to go ... I suppose...
I am really sorry
and angry, because it shouldn't be the task of the Zentyal users to do that!!!
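For anyone who wants to check their syslog without scrolling through it by hand, here is a minimal Python sketch that looks for soft lockup reports like the one pasted above. The function names and the dictionary keys are my own invention, and the attribution of a call trace to a report is deliberately crude: any later line mentioning `unix_dgram_connect` is counted towards the most recent lockup header.

```python
import re

# Matches headers such as:
# "NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [smbd:18232]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def find_soft_lockups(lines, trace_symbol="unix_dgram_connect"):
    """Return one dict per soft lockup header found in the given syslog lines."""
    events = []
    current = None
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            current = {
                "cpu": int(match.group(1)),
                "seconds": int(match.group(2)),
                "comm": match.group(3),
                "pid": int(match.group(4)),
                "in_symbol": False,  # set once trace_symbol shows up afterwards
            }
            events.append(current)
        elif current is not None and trace_symbol in line:
            current["in_symbol"] = True
    return events


def scan_file(path="/var/log/syslog"):
    """Scan a syslog file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return find_soft_lockups(handle)
```

Calling `scan_file()` as a user who may read /var/log/syslog returns a list of reports. Entries with `comm` equal to `smbd` and `in_symbol` set to `True` would match the pattern from my log and are the ones worth attaching to the Launchpad ticket.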
Update: But if that soft lockup message or something similar is not in /var/log/syslog and the system still crashes, then Zentyal probably has a new issue.
Could it be that it also has something to do with the Samba version used in Zentyal 4.2 in combination with the 3.19 kernel branch?
Update 2: @phaidros, if you are still following this forum thread, regarding your comment:
This kernel helped me: linux-image-generic-lts-xenial.
apt-get install linux-image-generic-lts-xenial
Running 4.4.0.13.7 since ~2 weeks with no crashes.
hth,
.phai
Are you still using the Ubuntu xenial kernel branch in version 4.4, which belongs to the new 16.04 LTS and was not officially released at that time?
Have you experienced any issues with your Zentyal 4.2 setup on top of it?
If not, which Zentyal components are you using? (Samba Domain Controller, File Server, Email + OpenChange environment etc.)
If that matches my environment, I'd consider upgrading to that kernel branch as well... I mean, Ubuntu 16.04 LTS has been released as of today anyway... and Zentyal will probably switch to the new Ubuntu LTS base soon as well.
Many thanks in advance...