I have had a similar problem since updating to 5.3. We noticed that our nightly backups are not completing because the system reboots in the middle of them. There are no error messages in the logs, and any messages on the console are long gone before I get access to it.
I noticed this in the output of the last command:
nenad    pts/1        n.local          Wed Jul 15 10:08 - 15:56  (1+05:47)
reboot   system boot  2.6.18-92.1.22.e Wed Jul 15 10:08          (1+22:46)
nenad    pts/4        n.local          Wed Jul 15 09:33 - down   (00:33)
root     pts/4        screamer.local   Wed Jul 15 09:30 - 09:32  (00:02)
nenad    pts/4        screamer.local   Wed Jul 15 09:23 - 09:24  (00:00)
gary     pts/4        screamer.local   Wed Jul 15 00:58 - 01:46  (00:48)
nenad    pts/3        10.10.11.101     Wed Jul 15 00:38 - down   (09:28)
nenad    pts/2        10.10.11.101     Wed Jul 15 00:30 - down   (09:36)
reboot   system boot  2.6.18-128.1.10. Tue Jul 14 23:47          (10:19)
nenad    pts/1        n.local          Mon Jul 13 14:08 - crash  (1+09:39)
reboot   system boot  2.6.18-128.1.10. Mon Jul 13 14:07          (1+19:59)
nenad    pts/2        n.local          Mon Jul 13 08:48 - crash  (05:19)
reboot   system boot  2.6.18-128.1.16. Mon Jul 13 01:07          (2+08:59)
reboot   system boot  2.6.18-128.1.16. Sun Jul 12 00:40          (3+09:26)
gary     pts/2        screamer.local   Sat Jul 11 07:31 - crash  (17:09)
reboot   system boot  2.6.18-128.1.16. Sat Jul 11 04:10          (4+05:56)
reboot   system boot  2.6.18-128.1.16. Fri Jul 10 04:10          (5+05:56)
reboot   system boot  2.6.18-128.1.16. Thu Jul  9 00:12          (6+09:53)
nenad    pts/2        n.local          Mon Jul  6 15:28 - crash  (2+08:44)
reboot   system boot  2.6.18-128.1.16. Mon Jul  6 15:23          (8+18:43)
On July 13 I switched to an older kernel, but the system still crashed that night. I am now back on the 2.6.18-92.1.22 kernel, but haven't had a chance to run backups yet because our admin system is being upgraded.
Note that our remote backups run over the network via ssh logins. All our logins are over ssh, and I know that nothing special was being done in the user sessions that ended in "crash", so my guess is that the problem is ssh/network related.
This is a dual-processor Xeon machine (i686) running the Xen kernel. I have seen people on other lists complaining about crashes caused by the Intel e1000 network device on 5.3, and my system has one.
Nenad
From: Michael Calizo
Sent: Thursday, July 16, 2009 8:02 PM
To: CentOS mailing list
Subject: Re: [CentOS] My server reboots every hour! Help please!
Based on the last output, I would start by looking at the processes that were invoked by ljubica and vmarko; you might find something there.
Anyway, is your server running any database processes? You might also look at the server history to see when this problem started, and investigate any updates that you or others made before it began. The sar command can help.
Mike --
On Fri, Jul 17, 2009 at 12:45 AM, Ross Walker <rswwalker@gmail.com> wrote:
On Thu, Jul 16, 2009 at 10:27 AM, Marko Vojinovic <vvmarko@gmail.com> wrote:
On Thu, Jul 16, 2009 at 11:06 AM, Michael Calizo <mike.calizo@gmail.com> wrote:
Can you post the output of the last command?
Maybe we can find something, such as which account is logged in when the server reboots.
Here goes (note that it is sorted in most-recent-first fashion):
# last -R | less
vmarko   pts/1        Thu Jul 16 16:09   still logged in
vmarko   pts/1        Thu Jul 16 16:05 - 16:07  (00:02)
vmarko   pts/1        Thu Jul 16 11:37 - 11:37  (00:00)
vmarko   pts/1        Thu Jul 16 02:48 - 02:59  (00:10)
reboot   system boot  Wed Jul 15 18:16          (21:59)
reboot   system boot  Wed Jul 15 15:37          (00:03)
vmarko   pts/1        Wed Jul 15 15:34 - 15:34  (00:00)
vmarko   pts/1        Wed Jul 15 14:42 - 15:16  (00:34)
reboot   system boot  Wed Jul 15 14:37          (01:04)
vmarko   pts/1        Wed Jul 15 13:38 - crash  (00:58)
reboot   system boot  Wed Jul 15 13:36          (02:04)
reboot   system boot  Wed Jul 15 12:36          (03:05)
reboot   system boot  Wed Jul 15 11:35          (04:05)
reboot   system boot  Wed Jul 15 10:38          (05:02)
reboot   system boot  Wed Jul 15 09:34          (06:06)
reboot   system boot  Wed Jul 15 08:34          (07:07)
reboot   system boot  Wed Jul 15 07:33          (08:07)
reboot   system boot  Wed Jul 15 06:33          (09:08)
reboot   system boot  Wed Jul 15 05:32          (10:08)
reboot   system boot  Wed Jul 15 04:32          (11:09)
reboot   system boot  Wed Jul 15 03:31          (12:09)
reboot   system boot  Wed Jul 15 02:31          (13:10)
reboot   system boot  Wed Jul 15 01:30          (14:10)
reboot   system boot  Wed Jul 15 00:30          (15:10)
reboot   system boot  Tue Jul 14 23:30          (16:11)
reboot   system boot  Tue Jul 14 22:29          (17:11)
reboot   system boot  Tue Jul 14 21:29          (18:12)
reboot   system boot  Tue Jul 14 20:28          (19:12)
reboot   system boot  Tue Jul 14 19:28          (20:13)
reboot   system boot  Tue Jul 14 18:27          (21:13)
reboot   system boot  Tue Jul 14 17:27          (22:14)
reboot   system boot  Tue Jul 14 16:26          (23:14)
vmarko   pts/1        Tue Jul 14 15:39 - 15:42  (00:03)
reboot   system boot  Tue Jul 14 15:26          (1+00:15)
vmarko   pts/1        Tue Jul 14 15:11 - crash  (00:14)
ljubica  pts/1        Tue Jul 14 14:26 - 15:11  (00:44)
ljubica  :0           Tue Jul 14 14:26 - 14:54  (00:27)
ljubica  :0           Tue Jul 14 14:26 - 14:26  (00:00)
reboot   system boot  Tue Jul 14 14:25          (1+01:15)
ljubica  pts/2        Tue Jul 14 13:27 - 13:27  (00:00)
ljubica  pts/1        Tue Jul 14 13:27 - crash  (00:58)
ljubica  :0           Tue Jul 14 13:27 - crash  (00:58)
ljubica  :0           Tue Jul 14 13:27 - 13:27  (00:00)
reboot   system boot  Tue Jul 14 13:25          (1+02:16)
ljubica  pts/1        Tue Jul 14 12:48 - crash  (00:36)
ljubica  :0           Tue Jul 14 12:48 - crash  (00:36)
ljubica  :0           Tue Jul 14 12:48 - 12:48  (00:00)
reboot   system boot  Tue Jul 14 12:41          (1+02:59)
ljubica  pts/1        Tue Jul 14 11:45 - crash  (00:55)
From the last log it looks like user "ljubica" did something that caused their session to crash, then did something that caused the server to reboot every hour. I would ask them what was done; it may be operator error on their part.
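The hourly pattern can also be checked mechanically instead of by eye. Below is a minimal sketch, assuming POSIX sh and awk: it reads last output on stdin and, for each reboot entry after the first, prints that boot time and the minutes until the next (more recent) boot. The function name reboot_gaps is hypothetical, and since last does not print the year, the sketch assumes all entries fall within one non-leap year.

```shell
#!/bin/sh
# Hypothetical helper: reads `last` output on stdin and, for each "reboot"
# entry after the first, prints its boot time and the minutes until the
# next (more recent) boot. Assumes most-recent-first order, as last prints
# it, and that all entries fall in one non-leap year (last omits the year).
reboot_gaps() {
    awk '
    BEGIN {
        # Days elapsed before the first of each month in a non-leap year.
        split("0 31 59 90 120 151 181 212 243 273 304 334", cum, " ")
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
        for (m = 1; m <= 12; m++) before[mon[m]] = cum[m]
    }
    $1 == "reboot" {
        # Find the weekday field: $4 with "last -R", $5 when plain "last"
        # adds a kernel version column.
        for (i = 4; i <= NF; i++)
            if ($i ~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/) break
        if (i > NF - 3) next
        split($(i + 3), hm, ":")
        # Minutes since the start of the year for this boot.
        t = ((before[$(i + 1)] + $(i + 2)) * 24 + hm[1]) * 60 + hm[2]
        if (seen)
            printf "%s %s %s  %d min until next boot\n", $(i + 1), $(i + 2), $(i + 3), prev - t
        prev = t
        seen = 1
    }'
}
```

Run it as: last -R | reboot_gaps. On the output posted above, the boots from Jul 14 16:26 through Jul 15 14:37 come out roughly 60 minutes apart, with a couple of gaps a few minutes off. The weekday search lets the same function handle plain last output, where the kernel version column shifts the date fields.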
-Ross
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos