On Mon, 24 Feb 2014, m.roth at 5-cent.us wrote:
> Every so often, one of our servers will go into what I can only describe
> as an undefined state: it pings, but there's zero access - you can't ssh
> in, and if I go plug a keyboard and monitor into the server itself, you
> can see the monitor's live, it's not the "monitor turned off" color, but
> there is zero response to the keyboard. The upshot is that I wind up
> having to power cycle it.
>
> Well, it just happened again on one of our servers Friday evening, as I
> found this morning. Looking at the logs this morning, I see that sar last
> shows
> 10:20:01 PM all 34.38 0.00 8.29 0.00 0.00 57.33
>
> One of my users dropped me an email at 22:45 that it was "off", and the
> last things I see in /var/log/messages are one of those annoying
> Feb 21 22:26:23 <server> kernel: INFO: task perl:20596 blocked for more than 120 seconds.
> Feb 21 22:26:23 <server> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> I also see
> Feb 21 22:26:23 <server> kernel: perl D ffffffff80158250 0 20596 20557
> which, as I just found by googling perl NOTLD, means that this is in a
> kernel uninterruptible state.
> In addition, in the stack trace, some nfs messages
> Feb 21 22:26:23 <server> kernel: [<ffffffff886b58d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
>
> So, it *appears* to be either an NFS issue, or a NIC issue. The user's
> home directory server is running CentOS 6.5, and the server that hung is
> 5.10. Running mount on the formerly hung server, su'd to his account, shows
> merely nfs, so I'm guessing it's NFS3. Looking at lsmod and /var/log/dmesg,
> I see it's running the tg3 NIC driver.
>
> Anyone else seeing this, and if so, any thoughts on the matter? Note that
> I've had this on Penguins, which are all Supermicro, and they're using the
> igb NIC driver, but the one this past weekend is a Dell, so it's not just
> one system.
>
> mark
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>

What CPUs do these systems have: AMD or Intel? What kernel are the server and the client running?

-Connie Sieh
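The quoted `perl D ffffffff80158250 0 20596 20557` line shows the perl task in state D (uninterruptible sleep), and the `nfs_wait_bit_uninterruptible` frame in its stack trace suggests it was waiting on NFS. Once the box stops responding it is too late to inspect anything, but while a machine is still reachable one could look for tasks piling up in D state. What follows is a minimal sketch, assuming Python 3 and the standard Linux `/proc/<pid>/stat` layout (pid, then the command name in parentheses, then a one-letter state). `parse_stat` and `d_state_tasks` are hypothetical helper names, not an existing tool.

```python
import os
from typing import Iterator, Tuple


def parse_stat(stat: str) -> Tuple[int, str, str]:
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state).

    Hypothetical helper. The command name is wrapped in parentheses and may
    itself contain spaces or ')', so split on the *last* ')' rather than
    on whitespace.
    """
    lparen = stat.index("(")
    rparen = stat.rindex(")")
    pid = int(stat[:lparen].strip())
    comm = stat[lparen + 1:rparen]
    state = stat[rparen + 2]  # the state letter follows ") "
    return pid, comm, state


def d_state_tasks(proc_root: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for every task whose state is 'D'.

    Hypothetical helper; proc_root is a parameter only so the function can
    be pointed at a fake directory tree for testing.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the line was malformed
        if state == "D":
            yield pid, comm
```

Usage would be something like `for pid, comm in d_state_tasks(): print(pid, comm)`, run periodically from cron or a monitoring agent; the scheduling is likewise an assumption. A task that briefly enters D during normal disk or NFS I/O is expected; the interesting signal is the same pid showing up in D across many consecutive samples.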
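To answer the "anyone else seeing this" question across several machines, one could pull every hung-task report out of the syslog files and compare times, hosts and commands. The following is a minimal sketch, assuming Python 3.6+ and the classic syslog format seen in the thread (`Mon DD HH:MM:SS host kernel: INFO: task comm:pid blocked for more than N seconds.`). The regular expression and the `HungTask`/`find_hung_tasks` names are illustrative assumptions; other kernel versions or syslog daemons may word or prefix the message differently.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class HungTask(NamedTuple):
    """One hung-task report (hypothetical record type)."""
    timestamp: str
    host: str
    comm: str
    pid: int
    seconds: int


# Assumed pattern for the syslog line quoted in the thread. The day field
# allows a leading space ("Feb  1") as classic syslog pads single digits.
HUNG_RE = re.compile(
    r"^(?P<ts>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(lines: Iterable[str]) -> Iterator[HungTask]:
    """Yield a HungTask for each line that matches HUNG_RE."""
    for line in lines:
        m = HUNG_RE.match(line)
        if m:
            yield HungTask(m["ts"], m["host"], m["comm"],
                           int(m["pid"]), int(m["secs"]))
```

For example, `with open("/var/log/messages") as f: reports = list(find_hung_tasks(f))`. Matching on the "INFO: task" line rather than the "perl D ..." line is a deliberate choice: the former carries the pid and the timeout in a fixed wording, while the layout of the task-dump line varies between kernel versions.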