[CentOS] Server locking up everyday around 3:30 AM

Fri Mar 11 18:20:09 UTC 2011
PJ <pauljerome at gmail.com>

On Fri, Mar 11, 2011 at 10:06 AM,  <m.roth at 5-cent.us> wrote:
> PJ wrote:
>> This may or may not be CentOS related, but I am out of ideas at this point
>> and wanted to bounce this off the list.
>>
>> I'm running a CentOS 5.5 server, running the latest kernel
>> 2.6.18-194.32.1.el5.
>>
>> Almost every day around 3:30 AM the server completely locks up and has to
>> be power cycled before it will come back online.
>> (this means someone has to wake up and reboot the server, oh how I love
>> being an internet janitor! :)
>
> Please log off the Internet. We are cleaning it. You may log back on later.
>
> <snip>
>> I was able to pull this from /var/log/messages; this happens just
>> seconds before locking up completely...
>>
>> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
>> Mar  8 03:33:19 web1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar  8 03:33:19 web1 kernel: wget          D ffff810001004420     0 13608  13607                     (NOTLB)
>> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bc78 0000000000000086 ffff81007bc7bd88 ffff81000100d3f8
>> Mar  8 03:33:19 web1 kernel:  ffff81007bc7bbf0 0000000000000007 ffff8100849db0c0 ffffffff80308b60
>> Mar  8 03:33:19 web1 kernel:  00013a2964cdf439 0000000000003237 ffff8100849db2a8 0000000064c82eae
>> Mar  8 03:33:19 web1 kernel: Call Trace:
>> Mar  8 03:33:20 web1 kernel:  [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
> <snip>
> Anyone else smell an OOM killer? But it's clearly whatever the wget's
> after that's killing the system.
>
>           mark

What makes no sense to me is that this runs every 5 minutes all day, but
only around 3:30 AM does it lock up.
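
One way to check whether the hangs really cluster around 3:30 AM is to
count the "blocked for more than" reports per hour. This is only a rough
sketch: the function name hung_tasks_by_hour is made up for illustration,
and the regex assumes each syslog entry sits on one line in the same
format as the excerpt quoted above.

```python
import re
from collections import Counter

# Assumed line format, based on the quoted excerpt:
# "Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"kernel: INFO: task (.+?):\d+ blocked for more than"
)


def hung_tasks_by_hour(lines):
    """Count hung-task reports, keyed by (hour of day, task name)."""
    counts = Counter()
    for line in lines:
        match = HUNG_RE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Passing it an open log file, for example
hung_tasks_by_hour(open("/var/log/messages", errors="replace")), should
show whether anything other than wget gets stuck and whether the reports
only ever land in the 03:00 hour.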

There is nothing in the log that suggests the kernel is having to kill
processes because it is out of resources.

No "httpd invoked oom-killer" messages or the like, which I have seen
before in other situations.
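
For anyone who wants to repeat that check, a minimal sketch of the scan
is below. The "invoked oom-killer" string is the one mentioned above; the
"Out of memory" pattern is my assumption about what else the kernel
prints when it kills a process, and find_oom_lines is just an
illustrative name.

```python
import re

# "invoked oom-killer" comes from the message quoted in this thread;
# "Out of memory" is an assumed companion message, not confirmed here.
OOM_RE = re.compile(r"invoked oom-killer|Out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_RE.search(line)]
```

An empty result from find_oom_lines(open("/var/log/messages",
errors="replace")) would be consistent with what I am seeing, though it
cannot rule out a lockup that happens before syslog gets to write
anything.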

http://bugs.centos.org/view.php?id=4515 sounds like what I have going
on, but not with kjournald of course...

Thanks,

--
PJ