I have a machine here that resets itself roughly every hour (not by my intention, of course):
# cat /var/log/messages | grep "sith kernel: Linux version 2.6.18-128.1.16.el5"
Jul 14 22:29:41 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 14 23:30:09 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 00:30:36 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 01:31:04 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 02:31:31 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 03:32:01 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 04:32:30 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 05:32:58 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 06:33:26 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 07:33:56 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 08:34:21 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 09:34:52 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 10:38:48 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 11:35:47 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 12:36:17 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
Jul 15 13:36:46 sith kernel: Linux version 2.6.18-128.1.16.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Tue Jun 30 06:10:28 EDT 2009
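To put a number on how regular the resets are, the gaps between those boot timestamps can be computed from the same grep output. This is only a rough sketch: boot_gaps is a helper name I made up, and it assumes the usual "Mon DD HH:MM:SS host ..." syslog prefix, one entry per line, with all entries in the same month.

```shell
# Print the gap in seconds between consecutive syslog lines read from stdin.
# Assumption: every line starts with "Mon DD HH:MM:SS" and all lines fall in
# the same month (the month name and the year are ignored entirely).
boot_gaps() {
    awk '{
        split($3, t, ":")                                   # $3 is HH:MM:SS
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]   # seconds since day 0 of the month
        if (NR > 1) print now - prev                        # gap to the previous entry
        prev = now
    }'
}

# Usage, with the same pattern as above:
# grep "sith kernel: Linux version 2.6.18-128.1.16.el5" /var/log/messages | boot_gaps
```

A nearly constant gap would point more towards something scheduled (a timer, cron job, or watchdog) than towards a random hardware fault, though that is only a hint and not proof.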
The machine is supposed to be up 24/7. After each reset it boots again and works normally, only to reset again about an hour later. The only unusual thing I can find in the logs is the winbind daemon complaining about something I don't understand:
# tail -f /var/log/messages
Jul 15 13:47:39 sith winbindd[3353]: [2009/07/15 13:47:39, 0] nsswitch/idmap.c:idmap_alloc_init(820)
Jul 15 13:47:39 sith winbindd[3353]: ERROR: Initialization failed for alloc backend, deferred!
Jul 15 13:47:39 sith smbd[3373]: [2009/07/15 13:47:39, 0] auth/auth_util.c:create_builtin_users(810)
Jul 15 13:47:39 sith smbd[3373]: create_builtin_users: Failed to create Users
Jul 15 13:47:39 sith winbindd[2927]: [2009/07/15 13:47:39, 0] nsswitch/winbindd_passdb.c:sid_to_name(126)
Jul 15 13:47:39 sith winbindd[2927]: Possible deadlock: Trying to lookup SID S-1-22-1-99 with passdb backend
Jul 15 13:47:39 sith winbindd[2927]: [2009/07/15 13:47:39, 0] nsswitch/winbindd_passdb.c:sid_to_name(126)
Jul 15 13:47:39 sith winbindd[2927]: Possible deadlock: Trying to lookup SID S-1-1-0 with passdb backend
Jul 15 13:47:39 sith winbindd[2927]: [2009/07/15 13:47:39, 0] nsswitch/winbindd_passdb.c:sid_to_name(126)
Jul 15 13:47:39 sith winbindd[2927]: Possible deadlock: Trying to lookup SID S-1-5-2 with passdb backend
Nevertheless, the Samba server seems to be running OK, so I am not sure this is related to the reboots, and there is nothing else suspicious in the logs AFAICT.
How do I troubleshoot these restarts? I first suspected a hardware failure (power supply, cooling fans, etc.), but the restarts happen far too regularly for that, so I now suspect software as well. The system is CentOS 5.3, fully updated.
Btw, I don't have physical access to the machine until September, so my options for diagnosing hardware are pretty limited atm. Only remote ssh access is available.
I would really appreciate any advice on this!
Best, :-) Marko