[CentOS] Server locking up everyday around 3:30 AM - (INFO: task wget:13608 blocked for more than 120 seconds) need sleep, help.

PJ

pauljerome at gmail.com
Fri Mar 11 17:33:34 UTC 2011


This may or may not be CentOS related, but I am out of ideas at this
point and wanted to bounce this off the list.

I'm running a CentOS 5.5 server with the latest kernel, 2.6.18-194.32.1.el5.

Almost every day around 3:30 AM the server completely locks up and has
to be power cycled before it will come back online.
(This means someone has to wake up and reboot the server. Oh, how I
love being an internet janitor! :)

It smells like a hardware issue to me too, but I went through all of the
Dell diagnostics and updated the firmware, and everything checks out as
okay: RAID, disks, RAM, etc. I spent an hour on the phone with a Dell
tech. No hardware issues, at least none that we were able to find.

There are no cron jobs that run at 3:30, no backups, the server has a
load of 0, nothing is scheduled around that time...
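
To double-check that claim, here is a minimal sketch that scans crontab
lines for entries whose hour field is 3. The helper name jobs_in_hour_3 is
made up for illustration, and the paths in the usage comment are assumed to
be the stock cron locations. It only catches a literal 3 or 03 in the hour
field, so wildcard entries such as "*/5 * * * *" (which also fire at 3:30)
are not listed.

```shell
# Print crontab lines (read from stdin) whose hour field is exactly 3 or 03.
# Comment lines and lines with fewer than six fields (e.g. VAR=value) are skipped.
# jobs_in_hour_3 is a hypothetical helper name, not an existing tool.
jobs_in_hour_3() {
    awk '$1 !~ /^#/ && NF >= 6 && ($2 == "3" || $2 == "03")'
}

# Usage, assuming the usual cron locations (run as root to read user crontabs):
#   cat /etc/crontab /etc/cron.d/* /var/spool/cron/* 2>/dev/null | jobs_in_hour_3
```

An empty result would only confirm that nothing is pinned to the 3 AM hour;
it says nothing about jobs started by other means.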

The only crontab entry at all is:

*/5 * * * * wget -q www.websitedomain.com/cron.php >/dev/null 2>&1

They are running Magento for commerce purposes and this runs every 5 minutes.

Why does the server only lock up around 3:30 AM? Because it knows I
am fast asleep?

I was able to pull this from /var/log/messages. It is logged just
seconds before the server locks up completely:

Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
Mar  8 03:33:19 web1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  8 03:33:19 web1 kernel: wget          D ffff810001004420     0 13608  13607                     (NOTLB)
Mar  8 03:33:19 web1 kernel:  ffff81007bc7bc78 0000000000000086 ffff81007bc7bd88 ffff81000100d3f8
Mar  8 03:33:19 web1 kernel:  ffff81007bc7bbf0 0000000000000007 ffff8100849db0c0 ffffffff80308b60
Mar  8 03:33:19 web1 kernel:  00013a2964cdf439 0000000000003237 ffff8100849db2a8 0000000064c82eae
Mar  8 03:33:19 web1 kernel: Call Trace:
Mar  8 03:33:20 web1 kernel:  [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
Mar  8 03:33:20 web1 kernel:  [<ffffffff80063cb9>] .text.lock.mutex+0xf/0x14
Mar  8 03:33:20 web1 kernel:  [<ffffffff8000cf82>] do_lookup+0x90/0x1e6
Mar  8 03:33:20 web1 kernel:  [<ffffffff8000a29c>] __link_path_walk+0xa01/0xf5b
Mar  8 03:33:20 web1 kernel:  [<ffffffff8000ea4b>] link_path_walk+0x42/0xb2
Mar  8 03:33:20 web1 kernel:  [<ffffffff8000cd72>] do_path_lookup+0x275/0x2f1
Mar  8 03:33:23 web1 kernel:  [<ffffffff80012851>] getname+0x15b/0x1c2
Mar  8 03:33:23 web1 kernel:  [<ffffffff800239d1>] __user_walk_fd+0x37/0x4c
Mar  8 03:33:23 web1 kernel:  [<ffffffff80028905>] vfs_stat_fd+0x1b/0x4a
Mar  8 03:33:23 web1 kernel:  [<ffffffff80023703>] sys_newstat+0x19/0x31
Mar  8 03:33:23 web1 kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
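
To see whether the same report shows up on the other nights, and for which
tasks, here is a minimal sketch that pulls the timestamp, task name and PID
out of every "blocked for more than" line. The helper name hung_tasks is
made up for illustration, and the pattern assumes the syslog layout shown
above (15-character timestamp, hostname, then "kernel:").

```shell
# Print "timestamp task pid" for each hung-task report read from stdin.
# hung_tasks is a hypothetical helper name, not an existing tool.
# Assumes a sed that accepts -E for extended regexes (older GNU sed uses -r).
hung_tasks() {
    sed -n -E 's/^(.{15}) [^ ]+ kernel: INFO: task ([^:]+):([0-9]+) blocked for more.*$/\1 \2 \3/p'
}

# Usage:
#   hung_tasks < /var/log/messages
```

If every lockup is preceded by such a report, the list of task names and
times may help narrow down what is holding the lock.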

If anyone has advice on where to go from here, it would be greatly
appreciated.

Thanks in advance.

--
PJF


