John R Pierce wrote:
I have a server that had been running fine for a year or two but has locked up a few times recently, requiring power cycling.
The /var/log/messages after a lockup last night is appended to this message.
Hardware is a pretty typical server: Supermicro X8DTE-F motherboard, dual Xeon X5650, 48GB ECC memory, LSI SAS 2008 for the boot disks, and LSI MegaRAID SAS 9261-8i for the data volume. Lots of 3TB disks in a RAID 60. Primary application is BackupPC v3.3.0 (from EPEL); it also has an NFS export (also used for backup purposes).
Runs CentOS 6.latest (kernel 2.6.32-431.11.2.el6.x86_64). X is not loaded (inittab level 3). SELinux is permissive, iptables is not loaded. This server is on a corporate internal network, with one Intel 82574L NIC configured with a static IP; the second one is not in use.
Any clues what to try? I'm hesitant to enable irqpoll, as I hear that it is a real performance sucker.
<SNIP>
Apr 1 21:34:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
Apr 1 21:34:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 1 21:36:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds.
Apr 1 21:36:58 sg1 kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Apr 1 21:36:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 1 21:36:58 sg1 kernel: crond D 0000000000000008 0 11598 7120 0x00000080
Apr 1 21:36:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000
Apr 1 21:36:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Apr 1 21:36:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8
Apr 1 21:36:58 sg1 kernel: Call Trace:
Apr 1 21:36:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0
Apr 1 21:36:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0
Apr 1 21:36:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180
Apr 1 21:36:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
Apr 1 21:36:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20
Apr 1 21:36:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60
Apr 1 21:36:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20
Apr 1 21:36:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0
Apr 1 21:36:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160
Apr 1 21:36:58 sg1 kernel: [<ffffffff810e2057>] ? audit_syscall_entry+0x1d7/0x200
Apr 1 21:36:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70
Apr 1 21:36:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
Apr 1 21:36:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
(at 10:57pm, I power cycle it)
Apr 1 22:57:43 sg1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Apr 1 22:57:43 sg1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2232" x-info="http://www.rsyslog.com"] start
Apr 1 22:57:43 sg1 kernel: Initializing cgroup subsys cpuset
Apr 1 22:57:43 sg1 kernel: Initializing cgroup subsys cpu
Apr 1 22:57:43 sg1 kernel: Linux version 2.6.32-431.11.2.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Tue Mar 25 19:59:55 UTC 2014
Apr 1 22:57:43 sg1 kernel: Command line: ro root=/dev/mapper/vg_sg1-lv_root rd_NO_LUKS rd_LVM_LV=vg_sg1/lv_root rd_LVM_LV=vg_sg1/lv_swap rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet rd_NO_DM LANG=en_US.UTF-8
......
I see when it last reported, and I see when you restarted. Could you give me one more piece of info: run sar for that day. I'm curious whether the last sample it recorded was at 21:30, or whether it kept recording later. That might tell us whether this is when it crashed, or whether it was something a bit later that left no trail.
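Something along these lines should pull the relevant window out of the sysstat data. It is only a sketch: it assumes sysstat is installed and collecting into the CentOS 6 default location, /var/log/sa/saDD (DD being the day of the month), and the helper names sa_file and show_window are made up for illustration. The 21:00 to 23:00 window is chosen simply to bracket the last hung-task message (21:36) and the power cycle (22:57).

```shell
#!/bin/sh
# Assumption: sysstat writes its daily binary files to /var/log/sa/saDD,
# where DD is the two-digit day of the month (the CentOS 6 default).

# Build the path of the sysstat data file for a given day of the month.
# Pass the day without a leading zero (e.g. 1, not 01).
sa_file() {
    printf '/var/log/sa/sa%02d\n' "$1"
}

# Print the CPU samples recorded between 21:00 and 23:00 on the given day.
# Nothing here runs sar until show_window is actually called.
show_window() {
    sar -u -f "$(sa_file "$1")" -s 21:00:00 -e 23:00:00
}

# Example (April 1):
#   show_window 1
```

If samples stop at 21:30 and do not resume until after the reboot, that points at the hang starting around then; if they carry on past 21:36, whatever finally took the box down happened later.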
mark