On Fri, Mar 11, 2011 at 11:05 AM, Steve Thompson smt@vgersoft.com wrote:
PJ wrote:
Mar 8 03:33:18 web1 kernel: INFO: task wget:13608 blocked for more than
120 seconds.
Check the number of dirty pages:
grep Dirty /proc/meminfo
relative to the dirty_ratio setting:
cat /proc/sys/vm/dirty_ratio
to see if the system is going into synhronous flush mode around that time (especially if dirty_ratio is large and you have a lot of physical memory). This is what I usually see as the cause of the "blocked for more than" message. I've also found that it can be several minutes, and up to 20 minutes, before the system recovers (but recover it always does).
-Steve _______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Great replies from everyone, I really appreciate the feedback.
Interesting entries in /var/log/cron:
-snip- (this runs 24/7 every 5 minutes as normal...)
Mar 11 02:20:01 web1 crond[12919]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:25:01 web1 crond[12950]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:30:01 web1 crond[12969]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:35:01 web1 crond[12992]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:40:01 web1 crond[13014]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 02:45:01 web1 crond[13218]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
-snip- (fast forward to 3 AM, the same cron job starts getting delayed.... by 3:27 the server was non responsive. Never seen this before)
Mar 11 03:01:01 web1 crond[13613]: (root) CMD (run-parts /etc/cron.hourly) Mar 11 03:07:20 web1 crond[13727]: (webuser) error: Job execution of per-minute job scheduled for 03:05 delayed into subsequent minute 03:07. Skipping job run. Mar 11 03:07:20 web1 crond[13727]: CRON (webuser) ERROR: cannot set security context Mar 11 03:13:00 web1 crond[13825]: (webuser) error: Job execution of per-minute job scheduled for 03:10 delayed into subsequent minute 03:13. Skipping job run. Mar 11 03:13:00 web1 crond[13825]: CRON (webuser) ERROR: cannot set security context Mar 11 03:19:29 web1 crond[13854]: (webuser) error: Job execution of per-minute job scheduled for 03:15 delayed into subsequent minute 03:19. Skipping job run. Mar 11 03:20:16 web1 crond[13890]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1) Mar 11 03:21:01 web1 crond[13854]: CRON (webuser) ERROR: cannot set security context Mar 11 03:27:41 web1 crond[13912]: (webuser) error: Job execution of per-minute job scheduled for 03:25 delayed into subsequent minute 03:27. Skipping job run. Mar 11 03:27:42 web1 crond[13912]: CRON (webuser) ERROR: cannot set security context Mar 11 03:32:05 web1 crond[13930]: (webuser) error: Job execution of per-minute job scheduled for 03:30 delayed into subsequent minute 03:32. Skipping job run. Mar 11 03:32:05 web1 crond[13930]: CRON (webuser) ERROR: cannot set security context Mar 11 03:36:23 web1 crond[13948]: (webuser) error: Job execution of per-minute job scheduled for 03:35 delayed into subsequent minute 03:36. Skipping job run. Mar 11 03:36:23 web1 crond[13948]: CRON (webuser) ERROR: cannot set security context
(rebooted) Mar 11 03:41:15 web1 crond[4776]: (CRON) STARTUP (V5.0)
-snip-
I don't think it is a coincidence I'm seeing "CRON (webuser) ERROR: cannot set security context" around the same time the server stops responding.
I'm not familiar with this message, anyone here seen it?
cron daily fires off at 4:02, after all this stuff...
nothing in cron.hourly..
Getting warmer I think, but still cant figure it out!
-- PJ