[CentOS] random crashes

Wed Apr 2 20:32:30 UTC 2014
m.roth at 5-cent.us <m.roth at 5-cent.us>

John R Pierce wrote:
> I have a server that's been running fine for a year or two, but it has locked up a few
> times recently, requiring power cycling.
>
> The /var/log/messages after a lockup last night is appended to this
> message.
>
> Hardware is a pretty typical server: Supermicro X8DTE-F motherboard,
> dual Xeon X5650, 48GB ECC memory, LSI SAS 2008 for the boot disks, and
> LSI MegaRAID SAS 9261-8i for the data volume. Lots of 3TB disks in a
> RAID 60. The primary application is BackupPC v3.3.0 (from EPEL); it also
> has an NFS export (also used for backup purposes).
>
> It runs CentOS 6.latest (kernel 2.6.32-431.11.2.el6.x86_64). X is not
> loaded (runlevel 3 in inittab). SELinux is permissive, and iptables is not
> loaded. This server is on a corporate internal network, with one Intel 82574L
> NIC configured with a static IP; the second one is not in use.
>
> Any clues what to try? I'm hesitant to enable irqpoll, as I hear that
> it is a real performance sucker.
>
>
<SNIP>
> Apr  1 21:34:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
> Apr  1 21:34:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> Apr  1 21:36:58 sg1 kernel: INFO: task crond:11598 blocked for more than 120 seconds.
> Apr  1 21:36:58 sg1 kernel:      Not tainted 2.6.32-431.11.2.el6.x86_64 #1
> Apr  1 21:36:58 sg1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr  1 21:36:58 sg1 kernel: crond         D 0000000000000008     0 11598   7120 0x00000080
> Apr  1 21:36:58 sg1 kernel: ffff88011257bd38 0000000000000086 0000000000000000 0000000000000000
> Apr  1 21:36:58 sg1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Apr  1 21:36:58 sg1 kernel: ffff88063208dab8 ffff88011257bfd8 000000000000fbc8 ffff88063208dab8
> Apr  1 21:36:58 sg1 kernel: Call Trace:
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81528dd5>] schedule_timeout+0x215/0x2e0
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81330968>] ? extract_entropy+0x108/0x1f0
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81528a53>] wait_for_common+0x123/0x180
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81528b6d>] wait_for_completion+0x1d/0x20
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81097108>] synchronize_sched+0x58/0x60
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81097090>] ? wakeme_after_rcu+0x0/0x20
> Apr  1 21:36:58 sg1 kernel: [<ffffffff812229dc>] install_session_keyring_to_cred+0x6c/0xd0
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81222b73>] join_session_keyring+0x133/0x160
> Apr  1 21:36:58 sg1 kernel: [<ffffffff810e2057>] ? audit_syscall_entry+0x1d7/0x200
> Apr  1 21:36:58 sg1 kernel: [<ffffffff81221778>] keyctl_join_session_keyring+0x38/0x70
> Apr  1 21:36:58 sg1 kernel: [<ffffffff812223a0>] sys_keyctl+0x170/0x190
> Apr  1 21:36:58 sg1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>
> (at 10:57pm, I power cycle it)
> Apr  1 22:57:43 sg1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
> Apr  1 22:57:43 sg1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2232" x-info="http://www.rsyslog.com"] start
> Apr  1 22:57:43 sg1 kernel: Initializing cgroup subsys cpuset
> Apr  1 22:57:43 sg1 kernel: Initializing cgroup subsys cpu
> Apr  1 22:57:43 sg1 kernel: Linux version 2.6.32-431.11.2.el6.x86_64 (mockbuild at c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Tue Mar 25 19:59:55 UTC 2014
> Apr  1 22:57:43 sg1 kernel: Command line: ro root=/dev/mapper/vg_sg1-lv_root rd_NO_LUKS rd_LVM_LV=vg_sg1/lv_root rd_LVM_LV=vg_sg1/lv_swap rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb  KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet rd_NO_DM LANG=en_US.UTF-8
> ......

I can see when it last logged something, and I can see when you restarted
it. Could you give me one more piece of info: run sar for that day (the
daily file is /var/log/sa/sa01 for April 1). I'm curious whether the last
thing it recorded was at 21:30, or whether it kept reporting later. That
would tell us whether this is when it actually crashed, or whether
something a bit later took it down without leaving a trail.
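If it saves some squinting, here's a rough Python 3 sketch of that check.
It won't run on the stock Python 2.6 in CentOS 6, so run it on another box
against the copied-out text. It assumes English-locale `sar` text output
and that sar marks the reboot with a LINUX RESTART row; the function name
and the parsing rules are my own invention, not anything from sysstat.

```python
import re
from datetime import datetime, time
from typing import Optional

# A sar data row starts with a timestamp: "09:30:01 PM ..." in a 12-hour
# locale such as en_US, or "21:30:01 ..." in a 24-hour one.
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})(?:\s+(AM|PM))?\s+(.*)$")


def _is_number(field: str) -> bool:
    """True if the field parses as a float (data rows end in a number)."""
    try:
        float(field)
        return True
    except ValueError:
        return False


def _parse_ts(hms: str, ampm: Optional[str]) -> time:
    """Convert a sar timestamp to a datetime.time, honoring AM/PM if present."""
    if ampm:
        return datetime.strptime(f"{hms} {ampm}", "%I:%M:%S %p").time()
    return datetime.strptime(hms, "%H:%M:%S").time()


def last_sample_before_restart(sar_text: str) -> Optional[time]:
    """Return the time of the last data row before the first
    'LINUX RESTART' record, or of the last data row overall if the
    output contains no restart. Returns None if no data rows are found."""
    last = None
    for raw in sar_text.splitlines():
        m = TS_RE.match(raw.strip())
        if not m:
            continue  # banner line, blank line, or "Average:" row
        hms, ampm, rest = m.groups()
        if "LINUX RESTART" in rest:
            return last  # the box went down somewhere after this sample
        fields = rest.split()
        # Column-header rows repeat the timestamp but end in a label
        # such as "%idle", not a number, so skip them.
        if not fields or not _is_number(fields[-1]):
            continue
        last = _parse_ts(hms, ampm)
    return last
```

Feed it the text from something like `sar -u -f /var/log/sa/sa01`. If it
comes back with the 21:30 sample, collection stopped right around the
hung-task messages; a later time would mean it kept running for a while
after the last thing in /var/log/messages.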

        mark