[CentOS] Server locking up everyday around 3:30 AM - (INFO: task wget:13608 blocked for more than 120 seconds) need sleep, help.

Boris Epstein borepstein at gmail.com
Fri Mar 11 17:42:04 UTC 2011


On Fri, Mar 11, 2011 at 12:33 PM, PJ <pauljerome at gmail.com> wrote:
> This may or may not be CentOS related, but am out of ideas at this
> point and wanted to bounce this off the list.
>
> I'm running a CentOS 5.5 server, running the latest kernel 2.6.18-194.32.1.el5.
>
> Almost every day around 3:30 AM the server completely locks up and has
> to be power cycled before it will come back online.
> (This means someone has to wake up and reboot the server. Oh, how I
> love being an internet janitor! :)
>
> Smells like a hardware issue to me too, but I went through all of the
> Dell diagnostics, updated the firmware, everything checks out as being
> okay, RAID, disks, RAM, etc... Spent an hour on the phone with a Dell
> tech. No hardware issues, at least that we were able to find.
>
> There are no cron jobs that run at 3:30, no backups, the server has a
> load of 0, nothing is scheduled around that time...
>
> The only crontab entry at all is:
> "*/5 * * * * wget -q www.websitedomain.com/cron.php >/dev/null 2>&1"
> They are running Magento for commerce purposes and this runs every 5 minutes.
>
> Why does the server only lock up around 3:30 AM? Because it knows I
> am fast asleep?
>
> I was able to pull this from /var/log/messages, this happens just
> seconds before locking up completely...
>
> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
> Mar  8 03:33:19 web1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar  8 03:33:19 web1 kernel: wget          D ffff810001004420     0 13608  13607                     (NOTLB)
> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bc78 0000000000000086 ffff81007bc7bd88 ffff81000100d3f8
> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bbf0 0000000000000007 ffff8100849db0c0 ffffffff80308b60
> Mar  8 03:33:19 web1 kernel:  00013a2964cdf439 0000000000003237 ffff8100849db2a8 0000000064c82eae
> Mar  8 03:33:19 web1 kernel: Call Trace:
> Mar  8 03:33:20 web1 kernel:  [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
> Mar  8 03:33:20 web1 kernel:  [<ffffffff80063cb9>] .text.lock.mutex+0xf/0x14
> Mar  8 03:33:20 web1 kernel:  [<ffffffff8000cf82>] do_lookup+0x90/0x1e6
> Mar  8 03:33:20 web1 kernel:  [<ffffffff8000a29c>] __link_path_walk+0xa01/0xf5b
> Mar  8 03:33:20 web1 kernel:  [<ffffffff8000ea4b>] link_path_walk+0x42/0xb2
> Mar  8 03:33:20 web1 kernel:  [<ffffffff8000cd72>] do_path_lookup+0x275/0x2f1
> Mar  8 03:33:23 web1 kernel:  [<ffffffff80012851>] getname+0x15b/0x1c2
> Mar  8 03:33:23 web1 kernel:  [<ffffffff800239d1>] __user_walk_fd+0x37/0x4c
> Mar  8 03:33:23 web1 kernel:  [<ffffffff80028905>] vfs_stat_fd+0x1b/0x4a
> Mar  8 03:33:23 web1 kernel:  [<ffffffff80023703>] sys_newstat+0x19/0x31
> Mar  8 03:33:23 web1 kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
>
> If anyone has some advice on where to go from here it would be greatly
> appreciated.
>
> Thanks in advance.
>
> --
> PJF

Have you tried disabling the cron job you think is at fault to see if
the lockups go away? Also, have you checked all the users' crontabs?
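
One way to check every user's crontab in one pass is sketched below. It
assumes the stock CentOS layout, where per-user crontabs are stored as
files named after the user under /var/spool/cron, and it needs to run as
root to read them. The function name list_user_crontabs is only
illustrative.

```shell
#!/bin/sh
# Print every active crontab entry, prefixed with the owning user.
# Assumption: per-user crontabs are files under /var/spool/cron (CentOS default).
# An alternative spool directory can be passed as the first argument.
list_user_crontabs() {
    spool_dir="${1:-/var/spool/cron}"
    for f in "$spool_dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue
        user=$(basename "$f")
        # Drop comment lines and blank lines, then tag each entry with the user.
        awk -v u="$user" '!/^[ \t]*(#|$)/ { print u ": " $0 }' "$f"
    done
}
```

Calling list_user_crontabs with no argument reads /var/spool/cron. The
system-wide schedules in /etc/crontab and /etc/cron.d are kept separately
and are worth a look as well.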

Boris.


