[CentOS] Server locking up everyday around 3:30 AM

Fri Mar 11 22:06:11 UTC 2011
PJ <pauljerome at gmail.com>

On Fri, Mar 11, 2011 at 11:05 AM, Steve Thompson <smt at vgersoft.com> wrote:
>
>> PJ wrote:
>>> Mar  8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than 120 seconds.
>
> Check the number of dirty pages:
>
>        grep Dirty /proc/meminfo
>
> relative to the dirty_ratio setting:
>
>        cat /proc/sys/vm/dirty_ratio
>
> to see if the system is going into synchronous flush mode around that time
> (especially if dirty_ratio is large and you have a lot of physical
> memory). This is what I usually see as the cause of the "blocked for more
> than" message. I've also found that it can be several minutes, and up to
> 20 minutes, before the system recovers (but recover it always does).
>
> -Steve

Great replies from everyone, I really appreciate the feedback.
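
In case it helps anyone following along, Steve's Dirty versus dirty_ratio
check can be scripted so the numbers can be logged while the problem is
building up. This is only a rough Python 3 sketch and not something Steve
posted: the function names are made up for illustration, and it compares
Dirty against MemTotal, whereas the kernel's exact base for the dirty_ratio
percentage differs between versions, so treat the result as an indicator
rather than the real threshold.

```python
def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB where given)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_percent(meminfo_text):
    """Dirty pages as a percentage of MemTotal (approximation, see above)."""
    info = parse_meminfo(meminfo_text)
    return 100.0 * info["Dirty"] / info["MemTotal"]


def report(meminfo_path="/proc/meminfo",
           ratio_path="/proc/sys/vm/dirty_ratio"):
    """Build one line comparing the current dirty percentage to dirty_ratio."""
    with open(meminfo_path) as f:
        pct = dirty_percent(f.read())
    with open(ratio_path) as f:
        ratio = int(f.read().strip())
    return "Dirty: %.2f%% of MemTotal, dirty_ratio: %d%%" % (pct, ratio)
```

Calling print(report()) in a loop with a sleep, started before 3 AM and
redirected to a file, would show whether Dirty climbs toward dirty_ratio
before the lockup.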

Interesting entries in /var/log/cron:

-snip-
(this runs 24/7 every 5 minutes as normal...)

Mar 11 02:20:01 web1 crond[12919]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
Mar 11 02:25:01 web1 crond[12950]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
Mar 11 02:30:01 web1 crond[12969]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
Mar 11 02:35:01 web1 crond[12992]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
Mar 11 02:40:01 web1 crond[13014]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
Mar 11 02:45:01 web1 crond[13218]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)

-snip-
(Fast forward to 3 AM: the same cron job starts getting delayed, and by
3:27 the server was unresponsive. I have never seen this before.)

Mar 11 03:01:01 web1 crond[13613]: (root) CMD (run-parts /etc/cron.hourly)
Mar 11 03:07:20 web1 crond[13727]: (webuser) error: Job execution of per-minute job scheduled for 03:05 delayed into subsequent minute 03:07. Skipping job run.
Mar 11 03:07:20 web1 crond[13727]: CRON (webuser) ERROR: cannot set security context
Mar 11 03:13:00 web1 crond[13825]: (webuser) error: Job execution of per-minute job scheduled for 03:10 delayed into subsequent minute 03:13. Skipping job run.
Mar 11 03:13:00 web1 crond[13825]: CRON (webuser) ERROR: cannot set security context
Mar 11 03:19:29 web1 crond[13854]: (webuser) error: Job execution of per-minute job scheduled for 03:15 delayed into subsequent minute 03:19. Skipping job run.
Mar 11 03:20:16 web1 crond[13890]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
Mar 11 03:21:01 web1 crond[13854]: CRON (webuser) ERROR: cannot set security context
Mar 11 03:27:41 web1 crond[13912]: (webuser) error: Job execution of per-minute job scheduled for 03:25 delayed into subsequent minute 03:27. Skipping job run.
Mar 11 03:27:42 web1 crond[13912]: CRON (webuser) ERROR: cannot set security context
Mar 11 03:32:05 web1 crond[13930]: (webuser) error: Job execution of per-minute job scheduled for 03:30 delayed into subsequent minute 03:32. Skipping job run.
Mar 11 03:32:05 web1 crond[13930]: CRON (webuser) ERROR: cannot set security context
Mar 11 03:36:23 web1 crond[13948]: (webuser) error: Job execution of per-minute job scheduled for 03:35 delayed into subsequent minute 03:36. Skipping job run.
Mar 11 03:36:23 web1 crond[13948]: CRON (webuser) ERROR: cannot set security context

(rebooted)
Mar 11 03:41:15 web1 crond[4776]: (CRON) STARTUP (V5.0)

-snip-

I don't think it is a coincidence I'm seeing "CRON (webuser) ERROR:
cannot set security context" around the same time the server stops
responding.
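
To put numbers on how late each job was, the "delayed into subsequent
minute" entries can be reduced to scheduled time versus logged time. A
small Python 3 sketch, assuming each log entry sits on a single line as in
the raw /var/log/cron rather than a wrapped mail copy; the regex and the
function name are my own and not part of cron:

```python
import re

# Matches the crond "delayed" error, capturing the log timestamp
# (HH:MM:SS) and the minute the job was scheduled for (HH:MM).
DELAYED = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s.*"
    r"job scheduled for (\d\d):(\d\d) delayed"
)


def delays_in_seconds(lines):
    """Return (scheduled 'HH:MM', seconds late) for each delayed entry.

    Assumes the logged time and the scheduled time are on the same day.
    """
    results = []
    for line in lines:
        m = DELAYED.search(line)
        if not m:
            continue  # not a "delayed" entry, e.g. a normal CMD line
        h, mi, s, sh, sm = (int(g) for g in m.groups())
        logged = h * 3600 + mi * 60 + s
        scheduled = sh * 3600 + sm * 60
        results.append(("%02d:%02d" % (sh, sm), logged - scheduled))
    return results
```

Fed the six delayed entries above, this gives 140, 180, 269, 161, 125 and
83 seconds for the 03:05 through 03:35 jobs (measured to the moment crond
logged the error).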

I'm not familiar with this message. Has anyone here seen it?

cron.daily fires off at 4:02, after all of this, and there is nothing in
cron.hourly.

Getting warmer, I think, but I still can't figure it out!

--
PJ