[CentOS] system hangs

Mon Feb 24 15:52:42 UTC 2014
Connie Sieh <csieh at fnal.gov>

On Mon, 24 Feb 2014, m.roth at 5-cent.us wrote:

> Every so often, one of our servers will go into what I can only describe
> as an undefined state: it pings, but there's zero access - you can't ssh
> in, and if I go plug a keyboard and monitor into the server itself, you
> can see the monitor's live, it's not the "monitor turned off" color, but
> there is zero response to the keyboard. The upshot is that I wind up
> having to power cycle it.
>
> Well, it just happened again on one of our servers Friday evening, as I
> found this morning. Looking at the logs this morning, I see that sar last
> shows
> 10:20:01 PM       all     34.38      0.00      8.29      0.00      0.00     57.33
>
> One of my users dropped me an email at 22:45 that it was "off", and the
> last thing I see in /var/log/messages is one of those annoying
> Feb 21 22:26:23 <server> kernel: INFO: task perl:20596 blocked for more than 120 seconds.
> Feb 21 22:26:23 <server> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> I also see
> Feb 21 22:26:23 <server> kernel: perl          D ffffffff80158250     0 20596  20557
> which, as I just found by googling perl NOTLD, means that this is in a
> kernel uninterruptible state.
> In addition, in the stack trace, there are some nfs messages:
> Feb 21 22:26:23 <server> kernel:  [<ffffffff886b58d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
>
> So, it *appears* to be either an NFS issue, or a NIC issue. The user's
> home directory server is running CentOS 6.5, and the server that hung is
> running 5.10. Running mount on the formerly hung server, after su-ing to
> his account, shows merely nfs, so I'm guessing it's NFSv3. Looking at
> lsmod and /var/log/dmesg, I see it's running the tg3 NIC driver.
>
> Anyone else seeing this, and if so, any thoughts on the matter? Note that
> I've had this on Penguins, which are all Supermicro, and they're using the
> igb NIC driver, but the one this past weekend is a Dell, so it's not just
> one system.
>
>
>        mark

What CPUs do these systems have? AMD or Intel?

What kernel are the server and client running?
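
Something along these lines, run on both the NFS server and the client,
should collect those answers in one go. It is only a rough sketch:
cpu_vendor and nfs_versions are made-up helper names, and it assumes the
usual /proc/cpuinfo and /proc/mounts layouts. Older kernels may report
the NFS version in the mount options as "v3" rather than "vers=3", so
both forms are handled.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct CPU vendor strings
# (e.g. GenuineIntel, AuthenticAMD) from a cpuinfo-style file.
cpu_vendor() {
    # $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo)
    awk -F': *' '/^vendor_id/ { print $2 }' "${1:-/proc/cpuinfo}" 2>/dev/null | sort -u
}

# Hypothetical helper: list NFS mounts and the version shown in their
# mount options, reading a mounts-style file.
nfs_versions() {
    # $1: path to a mounts-style file (defaults to /proc/mounts)
    awk '$3 ~ /^nfs4?$/ {
        ver = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            # Accept "vers=3", "nfsvers=3" and the older bare "v3" form.
            if (opts[i] ~ /^(nfs)?vers=[0-9.]+$/ || opts[i] ~ /^v[0-9]+$/) {
                ver = opts[i]
                sub(/^[a-z]*=?/, "", ver)
            }
        }
        print $2, $3, "vers=" ver
    }' "${1:-/proc/mounts}" 2>/dev/null
}

echo "kernel: $(uname -r)"
echo "cpu vendor: $(cpu_vendor)"
if [ -r /proc/mounts ]; then
    nfs_versions
fi
```

An empty NFS list just means nothing is mounted over NFS at the moment,
which can happen with automounted home directories.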

-Connie Sieh