On 03/11/2011 04:06 PM, PJ wrote:
<snip>
Interesting entries in /var/log/cron:
-snip- (this runs 24/7 every 5 minutes as normal...)
Mar 11 02:20:01 web1 crond[12919]: (webuser) CMD (wget -q www.domain.com/cron.php >/dev/null 2>&1)
<snip>
(fast forward to 3 AM: the same cron job starts getting delayed, and by 3:27 the server was unresponsive. Never seen this before)
Mar 11 03:01:01 web1 crond[13613]: (root) CMD (run-parts /etc/cron.hourly)
Mar 11 03:07:20 web1 crond[13727]: (webuser) error: Job execution of per-minute job scheduled for 03:05 delayed into subsequent minute 03:07. Skipping job run.
Mar 11 03:07:20 web1 crond[13727]: CRON (webuser) ERROR: cannot set security context
<snip>
I don't think it is a coincidence I'm seeing "CRON (webuser) ERROR: cannot set security context" around the same time the server stops responding.
I'm not familiar with this message, anyone here seen it?
cron daily fires off at 4:02, after all this stuff...
nothing in cron.hourly..
Getting warmer I think, but still can't figure it out!
OK, did the webuser job run at 03:00:01 or does it start at 03:05:01?
The system likely thinks that cron job is still running from the last time it was initiated (be it at 03:00:01 or 02:55:01).
Is there anything that the php file called by the cron (via wget) is supposed to do at 03:00 that is different than other times?
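To pin down the timing question above, the run times can be pulled straight out of the cron log. This is only a rough sketch: job_runs is a made-up helper name, and it assumes syslog-style lines like the ones in your excerpt (month, day, HH:MM:SS in the first three fields). It prints each matching entry with the gap in seconds since the previous match, so a gap well above 300 shows where the job slipped.

```shell
# job_runs LOGFILE PATTERN
# Hypothetical helper: list matching cron log entries with the gap (in
# seconds) since the previous match. Assumes "Mon DD HH:MM:SS ..." lines
# and does not handle entries that cross midnight.
job_runs() {
    grep -F -- "$2" "$1" | awk '
        {
            split($3, t, ":")                       # $3 is HH:MM:SS
            now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
            if (NR > 1)
                printf "%s %s %s gap=%ds\n", $1, $2, $3, now - prev
            else
                printf "%s %s %s first\n", $1, $2, $3
            prev = now
        }'
}
```

Something like `job_runs /var/log/cron '(webuser) CMD'` should then show whether the last clean run was 02:55:01 or 03:00:01.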
You might try something like this:
*/5 0,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 * * * <current_cron_command>
15,30,45 3 * * * <current_cron_command>
(Those entries should run the job normally, except during the 03:00 hour, where the first run is at 03:15 rather than 03:00 or 03:05, and it then runs every 15 minutes until the regular schedule resumes at 04:00.)
If it also fails to start at 03:15 then that would suggest that something is happening to the cron job the last time it is run to make it hang (or make the system think it is hung).
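If it does turn out that runs are piling up on top of each other, one possible guard is to make the job refuse to start while an earlier copy still holds a lock. The sketch below assumes flock(1) from util-linux is installed on the box; run_locked, the lock file path, and the 60-second timeout are all placeholders I picked for illustration, not anything from your setup.

```shell
# run_locked LOCKFILE COMMAND [ARGS...]
# Hypothetical wrapper: run COMMAND only if no other process holds an
# exclusive lock on LOCKFILE. With -n, flock gives up immediately
# (non-zero exit) instead of waiting, so a hung run cannot queue up more.
run_locked() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}

# Possible use from a script called by cron (path and timeout are assumptions):
# run_locked /tmp/cron_php.lock \
#     wget -q --timeout=60 --tries=1 www.domain.com/cron.php >/dev/null 2>&1
```

The --timeout and --tries options cap how long wget waits on DNS, connect, and idle reads, so a stalled cron.php should not hold the lock forever. This only hides the symptom though; it would not explain why the 03:00 run behaves differently.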