On Fri, Mar 11, 2011 at 11:05 AM, Steve Thompson <smt at vgersoft.com> wrote: > >> PJ wrote: >>> Mar 8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than >> 120 seconds. > > Check the number of dirty pages: > > grep Dirty /proc/meminfo > > relative to the dirty_ratio setting: > > cat /proc/sys/vm/dirty_ratio > > to see if the system is going into synhronous flush mode around that time > (especially if dirty_ratio is large and you have a lot of physical > memory). This is what I usually see as the cause of the "blocked for more > than" message. I've also found that it can be several minutes, and up to > 20 minutes, before the system recovers (but recover it always does). > > -Steve > _______________________________________________ > CentOS mailing list > CentOS at centos.org > http://lists.centos.org/mailman/listinfo/centos > Great replies from everyone, I really appreciate the feedback. Interesting entries in /var/log/cron: -snip- (this runs 24/7 every 5 minutes as normal...) Mar 11 02:20:01 web1 crond[12919]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:25:01 web1 crond[12950]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:30:01 web1 crond[12969]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:35:01 web1 crond[12992]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:40:01 web1 crond[13014]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:45:01 web1 crond[13218]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) -snip- (fast forward to 3 AM, the same cron job starts getting delayed.... by 3:27 the server was non responsive. Never seen this before) Mar 11 03:01:01 web1 crond[13613]: (root) CMD (run-parts /etc/cron.hourly) Mar 11 03:07:20 web1 crond[13727]: (webuser) error: Job execution of per-minute job scheduled for 03:05 delayed into subsequent minute 03:07. Skipping job run. Mar 11 03:07:20 web1 crond[13727]: CRON (webuser) ERROR: cannot set security context Mar 11 03:13:00 web1 crond[13825]: (webuser) error: Job execution of per-minute job scheduled for 03:10 delayed into subsequent minute 03:13. Skipping job run. Mar 11 03:13:00 web1 crond[13825]: CRON (webuser) ERROR: cannot set security context Mar 11 03:19:29 web1 crond[13854]: (webuser) error: Job execution of per-minute job scheduled for 03:15 delayed into subsequent minute 03:19. Skipping job run. Mar 11 03:20:16 web1 crond[13890]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 03:21:01 web1 crond[13854]: CRON (webuser) ERROR: cannot set security context Mar 11 03:27:41 web1 crond[13912]: (webuser) error: Job execution of per-minute job scheduled for 03:25 delayed into subsequent minute 03:27. Skipping job run. Mar 11 03:27:42 web1 crond[13912]: CRON (webuser) ERROR: cannot set security context Mar 11 03:32:05 web1 crond[13930]: (webuser) error: Job execution of per-minute job scheduled for 03:30 delayed into subsequent minute 03:32. Skipping job run. Mar 11 03:32:05 web1 crond[13930]: CRON (webuser) ERROR: cannot set security context Mar 11 03:36:23 web1 crond[13948]: (webuser) error: Job execution of per-minute job scheduled for 03:35 delayed into subsequent minute 03:36. Skipping job run. Mar 11 03:36:23 web1 crond[13948]: CRON (webuser) ERROR: cannot set security context (rebooted) Mar 11 03:41:15 web1 crond[4776]: (CRON) STARTUP (V5.0) -snip- I don't think it is a coincidence I'm seeing "CRON (webuser) ERROR: cannot set security context" around the same time the server stops responding. I'm not familiar with this message, anyone here seen it? cron daily fires off at 4:02, after all this stuff... nothing in cron.hourly.. Getting warmer I think, but still cant figure it out! -- PJ