Good morning!
On a CentOS 6.4 / 64 bit server I have installed the watchdog 5.5 package.
The rpm -qi watchdog states:
The watchdog program can be used as a powerful software watchdog daemon or may be alternately used with a hardware watchdog device such as the IPMI hardware watchdog driver interface to a resident Baseboard Management Controller (BMC). ... This configuration file is also used to set the watchdog to be used as a hardware watchdog instead of its default software watchdog operation.
In the dmesg output of my server ( full text at http://pastebin.com/GbF7dRt7 ) I see such a device:
NMI watchdog enabled, takes one hw-pmu counter.
...
ipmi message handler version 39.2
IPMI System Interface driver.
ipmi_si: Adding default-specified kcs state machine
ipmi_si: Trying default-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Adding default-specified smic state machine
ipmi_si: Trying default-specified smic state machine at i/o address 0xca9, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Adding default-specified bt state machine
ipmi_si: Trying default-specified bt state machine at i/o address 0xe4, slave address 0x0, irq 0
ipmi_si: Interface detection failed
ipmi_si: Unable to find any System Interface(s)
...
iTCO_vendor_support: vendor-support=0
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.07rh
iTCO_wdt: Found a Lynx Point TCO device (Version=2, TCOBASE=0x1860)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
My question is: what should I put into /etc/watchdog.conf to enable the hardware watchdog instead of the default software watchdog mode?
I've also asked this question at http://serverfault.com/questions/539816/how-to-use-watchdog-daemon-with-hard...
Thank you Alex
On 09/18/2013 03:57 AM, Alexander Farber wrote:
Hi Alex,
Look to see if you have a /dev/watchdog device. If you do, basically all you need to do is run "service watchdog start". To test whether it is working, get the PID of the watchdog process and kill it with "kill -9". If your box reboots, the hardware watchdog works.
There are some additional checks you can have the watchdog program perform; see the man page.
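The two checks above can be sketched in shell. This is only a rough sketch: the helper names is_chardev and daemon_pids are made up for illustration, and the filter assumes the daemon's command line ends in /watchdog, as it does for /usr/sbin/watchdog.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the watchdog package.

# is_chardev PATH: succeed if PATH exists and is a character device.
is_chardev() {
    [ -c "$1" ]
}

# daemon_pids: read "PID COMMAND..." lines on stdin and print the PIDs whose
# command path ends in /watchdog. Kernel threads such as [watchdog/0] do not
# match, so they are skipped.
daemon_pids() {
    awk '$2 ~ /\/watchdog$/ { print $1 }'
}
```

Typical use would be "is_chardev /dev/watchdog && echo present" followed by "ps -eo pid,args | daemon_pids" to get the PID to kill.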
Hope this helps, Steve
Hello Steve,
yes, I have that device:
# ll /dev/watchdog
crw-rw---- 1 root root 10, 130 Sep 17 23:21 /dev/watchdog

# ps uawwx|grep w[a]tchdog
root         6  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/0]
root        10  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/1]
root        14  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/2]
root        18  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/3]
root        22  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/4]
root        26  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/5]
root        30  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/6]
root        34  0.0  0.0      0     0 ?        S    Sep17   0:00 [watchdog/7]
root     12175  0.0  0.0   6236  2140 ?        SLs  11:11   0:00 /usr/sbin/watchdog -v

# grep -v ^# /etc/watchdog.conf
ping = 144.76.XXX.XXX
admin = root
logtick = 360
realtime = yes
priority = 1
So you think killing with -9 will indicate if I have hardware watchdog or just software?
Regards Alex
On 09/18/2013 07:20 AM, Alexander Farber wrote:
The entries like
root 6 0.0 0.0 0 0 ? S Sep17 0:00 [watchdog/0]
are, I believe, related to the CPUs (one per CPU), not to the watchdog daemon.
When you run "service watchdog start" you will see a process like the one below. That is the one you want to kill -9:
2094 ? SLs 0:13 /usr/sbin/watchdog
That will prevent it from telling the kernel to reset the watchdog timer, which will then expire and should reboot your system.
If you don't use -9, the watchdog process will stop gracefully and tell the kernel to turn off the watchdog timer, so it won't expire and cause a reboot.
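For what it's worth, the difference between the two kinds of exit can be sketched against the kernel's documented /dev/watchdog interface: any write resets the countdown, and writing the character V just before closing disarms the timer (the "magic close", honoured when nowayout=0, as in the dmesg output earlier in the thread). The function names below are hypothetical, and DEV defaults to a scratch file so that running the sketch is harmless. It illustrates the protocol and is not the daemon's actual code.

```shell
#!/bin/sh
# Hypothetical sketch of the /dev/watchdog protocol.
# DEV is a scratch file by default; pointing it at /dev/watchdog would arm
# the real timer, so only do that on a box you are prepared to see reboot.
DEV="${DEV:-/tmp/fake-watchdog.$$}"

# Opening the device starts the countdown (fd 3 stays open in this shell).
wd_open() { exec 3>"$DEV"; }

# Any write resets the countdown; the byte value does not matter.
wd_pet() { printf '.' >&3; }

# Graceful stop: write 'V', then close, so the kernel disarms the timer.
wd_close_graceful() {
    printf 'V' >&3
    exec 3>&-
}

# What kill -9 amounts to: the fd is closed without 'V', so the timer keeps
# running and eventually fires.
wd_close_abrupt() { exec 3>&-; }
```

A daemon is essentially wd_open, then wd_pet in a loop, then wd_close_graceful on a normal shutdown.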
Anyway, that is how it works on my system.
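One thing that may differ on your box: the /etc/watchdog.conf shown earlier in the thread has no active watchdog-device line. As far as I can tell from the watchdog.conf man page, keep-alive support on the device is disabled when that option is unset, so the daemon would never open /dev/watchdog and the kill -9 test would not reboot anything. A small check, with a made-up helper name, might look like this:

```shell
#!/bin/sh
# Hypothetical helper, not part of the watchdog package.

# has_watchdog_device CONF: succeed if CONF contains an uncommented
# "watchdog-device = ..." line.
has_watchdog_device() {
    grep -Eq '^[[:space:]]*watchdog-device[[:space:]]*=' "$1"
}
```

If "has_watchdog_device /etc/watchdog.conf" fails, adding the line "watchdog-device = /dev/watchdog" and restarting the service is probably what enables the hardware timer, though I have only reasoned this from the man page, not tested it on CentOS 6.4.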
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos