[CentOS] My server reboots every hour! Help please!

Fri Jul 17 16:21:37 UTC 2009
Nenad Vukicevic <nenad at intrepid.com>

I have had a similar problem since updating to 5.3. We noticed that our
nightly backups are not completing because the system reboots in the
middle of them. There are no error messages in the logs, and any messages
on the console are long gone before I get access to it.

I noticed this in the last log:

nenad    pts/1        n.local          Wed Jul 15 10:08 - 15:56 (1+05:47)
reboot   system boot  2.6.18-92.1.22.e Wed Jul 15 10:08         (1+22:46)
nenad    pts/4        n.local          Wed Jul 15 09:33 - down   (00:33)
root     pts/4        screamer.local   Wed Jul 15 09:30 - 09:32  (00:02)
nenad    pts/4        screamer.local   Wed Jul 15 09:23 - 09:24  (00:00)
gary     pts/4        screamer.local   Wed Jul 15 00:58 - 01:46  (00:48)
nenad    pts/3        10.10.11.101     Wed Jul 15 00:38 - down   (09:28)
nenad    pts/2        10.10.11.101     Wed Jul 15 00:30 - down   (09:36)
reboot   system boot  2.6.18-128.1.10. Tue Jul 14 23:47          (10:19)
nenad    pts/1        n.local          Mon Jul 13 14:08 - crash (1+09:39)
reboot   system boot  2.6.18-128.1.10. Mon Jul 13 14:07         (1+19:59)
nenad    pts/2        n.local          Mon Jul 13 08:48 - crash  (05:19)
reboot   system boot  2.6.18-128.1.16. Mon Jul 13 01:07         (2+08:59)
reboot   system boot  2.6.18-128.1.16. Sun Jul 12 00:40         (3+09:26)
gary     pts/2        screamer.local   Sat Jul 11 07:31 - crash  (17:09)
reboot   system boot  2.6.18-128.1.16. Sat Jul 11 04:10         (4+05:56)
reboot   system boot  2.6.18-128.1.16. Fri Jul 10 04:10         (5+05:56)
reboot   system boot  2.6.18-128.1.16. Thu Jul  9 00:12         (6+09:53)
nenad    pts/2        n.local          Mon Jul  6 15:28 - crash (2+08:44)
reboot   system boot  2.6.18-128.1.16. Mon Jul  6 15:23         (8+18:43)

On July 13 I switched to an older version of the kernel, but the system
still crashed that night. I am now back on the 2.6.18-92.1.22 kernel, but
I haven't had a chance to run backups because our admin system is being
upgraded.
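
To make the kernel comparison concrete, here is a minimal sketch that
tallies boots per kernel string and counts sessions that ended in "crash".
It assumes the text comes from plain last output like the listing above;
the sample lines are copied from that listing (kernel strings appear
truncated there), and the helper name summarize is made up for
illustration.

```python
import re
from collections import Counter

# A few lines copied from the `last` listing above (most recent first).
SAMPLE = """\
reboot   system boot  2.6.18-92.1.22.e Wed Jul 15 10:08         (1+22:46)
nenad    pts/4        n.local          Wed Jul 15 09:33 - down   (00:33)
reboot   system boot  2.6.18-128.1.10. Tue Jul 14 23:47          (10:19)
nenad    pts/1        n.local          Mon Jul 13 14:08 - crash (1+09:39)
reboot   system boot  2.6.18-128.1.10. Mon Jul 13 14:07         (1+19:59)
nenad    pts/2        n.local          Mon Jul 13 08:48 - crash  (05:19)
reboot   system boot  2.6.18-128.1.16. Mon Jul 13 01:07         (2+08:59)
"""

# On a "reboot" pseudo-entry, the field after "system boot" is the kernel.
BOOT = re.compile(r"^reboot\s+system boot\s+(\S+)")


def summarize(text):
    """Return (boots per kernel string, number of sessions ending in crash)."""
    boots = Counter()
    crashes = 0
    for line in text.splitlines():
        match = BOOT.match(line)
        if match:
            boots[match.group(1)] += 1
        elif re.search(r"-\s+crash\b", line):
            # A user session whose logout was never recorded cleanly.
            crashes += 1
    return boots, crashes


boots, crashes = summarize(SAMPLE)
for kernel, count in sorted(boots.items()):
    print(kernel, count)
print("sessions ended by crash:", crashes)
```

If the same tally over the full listing shows unclean boots under more
than one kernel build, that would suggest the kernel version alone is not
the cause.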

Note that our remote backups are done over the network with ssh logins.
All our logins are over ssh, and the crashes that I see on some user
sessions are probably related to ssh or networking, as I know that nothing
special was being done at those times. My guess is that it is ssh/network
related.

This is a dual-processor Xeon machine (i686) running the Xen kernel. I
noticed on some other lists that people are complaining about crashes
caused by the Intel e1000 network device on 5.3, which my system has.

Nenad


From: Michael Calizo
Sent: Thursday, July 16, 2009 8:02 PM
To: CentOS mailing list
Subject: Re: [CentOS] My server reboots every hour! Help please!


Based on the last output, I would start by looking at the processes that
were invoked by ljubica and vmarko; you might find something there.


Anyway, is your server running any DB process? You might also look at the
server history to see when this problem started to happen, and investigate
any updates that you or others made prior to the problem. The sar command
can help.


Mike -- 


On Fri, Jul 17, 2009 at 12:45 AM, Ross Walker <rswwalker at gmail.com> wrote:

On Thu, Jul 16, 2009 at 10:27 AM, Marko Vojinovic<vvmarko at gmail.com> wrote:
> On Thu, Jul 16, 2009 at 11:06 AM, Michael Calizo<mike.calizo at gmail.com> wrote:
>> Can you post the output of the last command?
>>
>> Maybe we can find something, like which account is logged in when the
>> server reboots.
>
> Here goes (note that it is sorted in most-recent-first fashion):
>
> # last -R | less
> vmarko   pts/1        Thu Jul 16 16:09   still logged in
> vmarko   pts/1        Thu Jul 16 16:05 - 16:07  (00:02)
> vmarko   pts/1        Thu Jul 16 11:37 - 11:37  (00:00)
> vmarko   pts/1        Thu Jul 16 02:48 - 02:59  (00:10)
> reboot   system boot  Wed Jul 15 18:16          (21:59)
> reboot   system boot  Wed Jul 15 15:37          (00:03)
> vmarko   pts/1        Wed Jul 15 15:34 - 15:34  (00:00)
> vmarko   pts/1        Wed Jul 15 14:42 - 15:16  (00:34)
> reboot   system boot  Wed Jul 15 14:37          (01:04)
> vmarko   pts/1        Wed Jul 15 13:38 - crash  (00:58)
> reboot   system boot  Wed Jul 15 13:36          (02:04)
> reboot   system boot  Wed Jul 15 12:36          (03:05)
> reboot   system boot  Wed Jul 15 11:35          (04:05)
> reboot   system boot  Wed Jul 15 10:38          (05:02)
> reboot   system boot  Wed Jul 15 09:34          (06:06)
> reboot   system boot  Wed Jul 15 08:34          (07:07)
> reboot   system boot  Wed Jul 15 07:33          (08:07)
> reboot   system boot  Wed Jul 15 06:33          (09:08)
> reboot   system boot  Wed Jul 15 05:32          (10:08)
> reboot   system boot  Wed Jul 15 04:32          (11:09)
> reboot   system boot  Wed Jul 15 03:31          (12:09)
> reboot   system boot  Wed Jul 15 02:31          (13:10)
> reboot   system boot  Wed Jul 15 01:30          (14:10)
> reboot   system boot  Wed Jul 15 00:30          (15:10)
> reboot   system boot  Tue Jul 14 23:30          (16:11)
> reboot   system boot  Tue Jul 14 22:29          (17:11)
> reboot   system boot  Tue Jul 14 21:29          (18:12)
> reboot   system boot  Tue Jul 14 20:28          (19:12)
> reboot   system boot  Tue Jul 14 19:28          (20:13)
> reboot   system boot  Tue Jul 14 18:27          (21:13)
> reboot   system boot  Tue Jul 14 17:27          (22:14)
> reboot   system boot  Tue Jul 14 16:26          (23:14)
> vmarko   pts/1        Tue Jul 14 15:39 - 15:42  (00:03)
> reboot   system boot  Tue Jul 14 15:26         (1+00:15)
> vmarko   pts/1        Tue Jul 14 15:11 - crash  (00:14)
> ljubica  pts/1        Tue Jul 14 14:26 - 15:11  (00:44)
> ljubica  :0           Tue Jul 14 14:26 - 14:54  (00:27)
> ljubica  :0           Tue Jul 14 14:26 - 14:26  (00:00)
> reboot   system boot  Tue Jul 14 14:25         (1+01:15)
> ljubica  pts/2        Tue Jul 14 13:27 - 13:27  (00:00)
> ljubica  pts/1        Tue Jul 14 13:27 - crash  (00:58)
> ljubica  :0           Tue Jul 14 13:27 - crash  (00:58)
> ljubica  :0           Tue Jul 14 13:27 - 13:27  (00:00)
> reboot   system boot  Tue Jul 14 13:25         (1+02:16)
> ljubica  pts/1        Tue Jul 14 12:48 - crash  (00:36)
> ljubica  :0           Tue Jul 14 12:48 - crash  (00:36)
> ljubica  :0           Tue Jul 14 12:48 - 12:48  (00:00)
> reboot   system boot  Tue Jul 14 12:41         (1+02:59)
> ljubica  pts/1        Tue Jul 14 11:45 - crash  (00:55)


From the last log it looks like user "ljubica" did something that was
causing his/her session to crash, and then did something that causes the
server to reboot every hour. I would ask him/her what was done; it may
be operator error on his/her part.
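
The hourly pattern can be checked directly from the reboot lines. The
following is a rough sketch only: it assumes input shaped like the
last -R output quoted above with the "> " quoting removed, it assumes the
year 2009 because last does not print one, and the function name
reboot_gaps is made up for illustration.

```python
from datetime import datetime

# Five reboot lines copied from the quoted `last -R` output (quote marks removed).
SAMPLE = """\
reboot   system boot  Wed Jul 15 13:36          (02:04)
reboot   system boot  Wed Jul 15 12:36          (03:05)
reboot   system boot  Wed Jul 15 11:35          (04:05)
reboot   system boot  Wed Jul 15 10:38          (05:02)
reboot   system boot  Wed Jul 15 09:34          (06:06)
"""


def reboot_gaps(text, year=2009):
    """Return the minutes between consecutive reboots, oldest first.

    The year is an assumption supplied by the caller; `last` omits it.
    """
    stamps = []
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] != ["reboot", "system", "boot"]:
            continue  # skip user sessions and anything else
        # fields[3] is the weekday; month, day and HH:MM follow it.
        stamp = " ".join(fields[4:7]) + f" {year}"
        stamps.append(datetime.strptime(stamp, "%b %d %H:%M %Y"))
    stamps.sort()
    return [int((b - a).total_seconds() // 60)
            for a, b in zip(stamps, stamps[1:])]


print(reboot_gaps(SAMPLE))  # -> [64, 57, 61, 60]
```

Gaps that stay close to 60 minutes, with no user logged in between boots,
point toward something periodic (a scheduled job or a watchdog, for
example) rather than whatever a user happens to be running in a session.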

-Ross

_______________________________________________
CentOS mailing list
CentOS at centos.org
http://lists.centos.org/mailman/listinfo/centos




-- 
Mike Calizo
Registered Linux User # 365113

_________________________________________________
Even the longest journey has to start with a small first-step




