[CentOS] My server reboots every hour! Help please!

Wed Jul 15 12:23:06 UTC 2009
Per Qvindesland <per at norhex.com>

Hi

I am really not sure if this is related, but some time ago I had a
server that rebooted very regularly, and the problem turned out to be
a faulty NIC causing a kernel panic. In cases like that there is
really not much to go on from the logs.

For your possible deadlocks, you could add the following to smb.conf:

    strict locking = no

Regards
Per Qvindesland
E-mail: per at norhex.com [1]
http://www.linkedin.com/in/perqvindesland [2]
--- Original message follows ---
SUBJECT: [CentOS] My server reboots every hour! Help please!
FROM:  Marko Vojinovic 
TO: "CentOS mailing list" 
DATE: 15-07-2009 14:16

I have a machine here that resets itself every hour (not
intentionally, of course):

# cat /var/log/messages | grep "sith kernel: Linux version 2.6.18-128.1.16.el5"
Jul 14 22:29:41 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 14 23:30:09 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 00:30:36 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 01:31:04 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 02:31:31 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 03:32:01 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 04:32:30 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 05:32:58 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 06:33:26 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 07:33:56 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 08:34:21 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 09:34:52 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 10:38:48 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 11:35:47 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 12:36:17 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 13:36:46 sith kernel: Linux version 2.6.18-128.1.16.el5
(mockbuild at builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
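
To put a number on "every hour", the gap between consecutive boot
banners can be computed rather than eyeballed. This is only a rough
sketch: the function name boot_intervals is made up for illustration,
it assumes GNU date (for the -d option), and it assumes all entries
fall in the same year, since syslog timestamps carry no year.

```shell
#!/bin/sh
# boot_intervals (hypothetical helper): read syslog-format lines on stdin
# and print the seconds elapsed between consecutive kernel boot banners.
# Assumes GNU date and that all timestamps belong to the same year.
boot_intervals() {
    prev=""
    grep "kernel: Linux version" | while read -r mon day time _rest; do
        # Parse "Jul 14 22:29:41" as UTC so DST changes cannot skew the gap.
        now=$(date -u -d "$mon $day $time" +%s) || continue
        if [ -n "$prev" ]; then
            echo $((now - prev))
        fi
        prev=$now
    done
}
```

Usage would be something like "boot_intervals < /var/log/messages".
For the first two boots listed above (22:29:41 and 23:30:09) it prints
3628, i.e. one hour and 28 seconds. A gap that stays this close to
3600 seconds plus a roughly constant boot time would suggest a timer
of some kind rather than a random failure.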

The machine is supposed to be up 24/7. After each reset the system
boots again and works normally, only to reset again one hour later.
The only unusual thing I am able to recognize in the logs is the
winbind daemon whining about something I don't understand:

# tail -f /var/log/messages
Jul 15 13:47:39 sith winbindd[3353]: [2009/07/15 13:47:39, 0]
nsswitch/idmap.c:idmap_alloc_init(820)
Jul 15 13:47:39 sith winbindd[3353]: ERROR: Initialization failed
for alloc backend, deferred!
Jul 15 13:47:39 sith smbd[3373]: [2009/07/15 13:47:39, 0]
auth/auth_util.c:create_builtin_users(810)
Jul 15 13:47:39 sith smbd[3373]: create_builtin_users: Failed to
create Users
Jul 15 13:47:39 sith winbindd[2927]: [2009/07/15 13:47:39, 0]
nsswitch/winbindd_passdb.c:sid_to_name(126)
Jul 15 13:47:39 sith winbindd[2927]: Possible deadlock: Trying to
lookup SID S-1-22-1-99 with passdb backend
Jul 15 13:47:39 sith winbindd[2927]: [2009/07/15 13:47:39, 0]
nsswitch/winbindd_passdb.c:sid_to_name(126)
Jul 15 13:47:39 sith winbindd[2927]: Possible deadlock: Trying to
lookup SID S-1-1-0 with passdb backend
Jul 15 13:47:39 sith winbindd[2927]: [2009/07/15 13:47:39, 0]
nsswitch/winbindd_passdb.c:sid_to_name(126)
Jul 15 13:47:39 sith winbindd[2927]: Possible deadlock: Trying to
lookup SID S-1-5-2 with passdb backend

Nevertheless, the Samba server seems to be running OK, so I am not
sure this is related to the reboots, and there is nothing else
suspicious in the logs AFAICT.

How do I troubleshoot these restarts? I suspected hardware failure
(power supply, cooling fans, etc.), but the restarts happen far too
periodically for that, so I suspect software as well. CentOS 5.3,
fully updated.

Btw, I don't have physical access to the machine until September, so
diagnosing hardware is pretty limited atm. Only remote ssh available.
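
One check that is possible over ssh alone is whether the resets are
clean shutdowns or hard resets. The sketch below rests on an
assumption: that a clean shutdown leaves a "shutdown" record in wtmp
(as shown by "last -x") while a hard reset or panic leaves only the
following "reboot" record. The function name count_boot_records is
made up for illustration.

```shell
#!/bin/sh
# count_boot_records (hypothetical helper): read "last -x" output on stdin
# and count "reboot" records versus "shutdown" records.
# Assumption: boots without a matching shutdown record were unclean resets.
count_boot_records() {
    awk '$1 == "reboot"   { boots++ }
         $1 == "shutdown" { clean++ }
         END { printf "boots=%d clean_shutdowns=%d\n", boots, clean }'
}
```

Usage would be something like "last -x | count_boot_records". If boots
far outnumber clean shutdowns, the machine is probably being reset
abruptly (hardware, watchdog, or kernel panic) rather than being shut
down by a cron job or some other software calling reboot.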

I would really appreciate any advice on this!

Best, :-)
Marko
_______________________________________________
CentOS mailing list
CentOS at centos.org
http://lists.centos.org/mailman/listinfo/centos

Links:
------
[1] http://webmail.norhex.com/#
[2] http://www.linkedin.com/in/perqvindesland