Well, we figure the board's got problems, but I installed OpenIPMI a couple of weeks ago, fired it up as a service, then added a cron job. That all ran well until last evening; we came in to find 5 zillion emails complaining "Unable to open SDR for reading".
I worked my way through logs, and googling, and then trying to run ipmitool by hand, and it complains there's no such device as /dev/ipmi[three versions].
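A quick way to check for the device node by hand is a small helper like the one below. This is only a sketch: the function name first_char_dev is made up for illustration, and the three paths in the usage line are an assumption about which names ipmitool tries, not something taken from the error message itself.

```shell
# Print the first argument that exists as a character device.
# Returns non-zero if none of the candidates is present.
first_char_dev() {
    for dev in "$@"; do
        if [ -c "$dev" ]; then
            printf '%s\n' "$dev"
            return 0
        fi
    done
    return 1
}

# Usage (candidate paths are assumed, adjust to what your ipmitool reports):
#   first_char_dev /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0 || echo "no IPMI device node"
```

If nothing is found, the driver never registered an interface, which matches the "No such device" from the module load.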
More googling, and down to modprobe, then insmod, of ipmi_si. Then trying to give it parameters, and it continually comes back (after 5 min of trying) with "No such device".
One thing I note: after I found the docs for IPMI, I found the list of parms at http://www.mjmwired.net/kernel/Documentation/IPMI.txt, and gave it, among other things, the slave_addrs. Then I look at dmesg and /var/log/messages... and it seems to be utterly ignoring that parm. That is, I say (and get)
$ insmod /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/char/ipmi/ipmi_si.ko ports=0xca2 slave_addrs=0x10
insmod: error inserting '/lib/modules/2.6.18-164.11.1.el5/kernel/drivers/char/ipmi/ipmi_si.ko': -1 No such device
and in the log
Mar 10 11:24:36 south kernel: IPMI System Interface driver.
Mar 10 11:24:36 south kernel: ipmi_si: Trying hardcoded-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
Mar 10 11:26:28 south kernel: ipmi_si: There appears to be no BMC at this location
Mar 10 11:26:28 south kernel: ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
Mar 10 11:28:20 south kernel: ipmi_si: There appears to be no BMC at this location
Mar 10 11:30:12 south kernel: ipmi_si: Unable to find any System Interface(s)
Notice that it's looking at slave_addrs of 0x0 and 0x20, *not* 0x10.
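To see at a glance which addresses the driver actually probed, a sed filter along these lines could pull them out of dmesg or /var/log/messages. It is only a sketch: the function name probed_slave_addrs is made up here, and the pattern assumes log lines shaped like the ones quoted above.

```shell
# For each ipmi_si "Trying ..." line, print the probe source and the
# slave address the driver reports using. Reads files given as
# arguments, or stdin when there are none.
probed_slave_addrs() {
    sed -n 's/.*ipmi_si: Trying \([^ ]*\) .*slave address \(0x[0-9a-fA-F]*\).*/\1 \2/p' "$@"
}

# Usage:
#   dmesg | probed_slave_addrs
#   probed_slave_addrs /var/log/messages
```

Comparing that output against the slave_addrs value given on the insmod line shows directly whether the parameter was picked up.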
Any clues?
mark
M.roth@5-cent.us wrote on Wed, 10 Mar 2010 13:51:49 -0500:
Any clues?
Make sure the BMC didn't die. (Yes, this happens.)
Kai
On Wednesday 10 March 2010, m.roth@5-cent.us wrote: ...
$ insmod /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/char/ipmi/ipmi_si.ko ports=0xca2 slave_addrs=0x10
insmod: error inserting '/lib/modules/2.6.18-164.11.1.el5/kernel/drivers/char/ipmi/ipmi_si.ko': -1 No such device
and in the log
Mar 10 11:24:36 south kernel: IPMI System Interface driver.
Mar 10 11:24:36 south kernel: ipmi_si: Trying hardcoded-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
Mar 10 11:26:28 south kernel: ipmi_si: There appears to be no BMC at this location
Mar 10 11:26:28 south kernel: ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
Mar 10 11:28:20 south kernel: ipmi_si: There appears to be no BMC at this location
Mar 10 11:30:12 south kernel: ipmi_si: Unable to find any System Interface(s)
Seems to me that the IPMI driver can't find the IPMI hardware. What kind of server are you trying this on? Is it known to work with the IPMI-driver in vanilla CentOS-5.4?
/Peter
Peter Kjellstrom wrote:
Seems to me that the IPMI driver can't find the IPMI hardware. What kind of server are you trying this on? Is it known to work with the IPMI-driver in vanilla CentOS-5.4?
<snip> You seem to have missed the beginning of my original post, where I said that it had been running fine for 10 days, then *stopped* working. The server's still up, though I haven't been into the data center to see if the idiot red LED "fault light" is blinking (and that has no blink code; "there's a problem" is all the docs say).
mark
On Thursday 11 March 2010, mark wrote:
<snip> You seem to have missed the beginning of my original post, where I said that it had been running fine for 10 days, then *stopped* working.
Ooops, indeed, sorry for that :-)
That leaves at least two possibilities:
1) the BMC is flaky (power cycle the machine or maybe even replace the BMC)
2) the kernel driver is messed up.
If it's the driver then, assuming you haven't already, try to unload the ipmi stuff. Essentially "lsmod | grep ipmi" and then rmmod those. A "service ipmi restart" should probably automate this for you.
Since you're focusing on the "in system" approach I assume you lack an Ethernet connection to the BMC? If you do have one, try to reach it that way.
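The unload step could be scripted roughly as follows. This is a sketch, not a tested procedure for this box: the function names are made up, it has to run as root, and it makes a few passes simply because rmmod refuses to remove a module that another ipmi module still depends on.

```shell
# List module names starting with "ipmi" from lsmod-style input
# (files given as arguments, or stdin when there are none).
ipmi_modules() {
    awk '$1 ~ /^ipmi/ { print $1 }' "$@"
}

# Try to unload every loaded ipmi module. Several passes are made so
# that modules held by other ipmi modules can go once their users are
# gone. Anything printed at the end is still loaded.
unload_ipmi() {
    for pass in 1 2 3; do
        for m in $(lsmod | ipmi_modules); do
            rmmod "$m" 2>/dev/null
        done
    done
    lsmod | ipmi_modules
}

# Usage (as root):
#   unload_ipmi && service ipmi start
```

If unload_ipmi still prints module names afterwards, something outside the ipmi stack is probably holding them, and the lsmod "Used by" column should say what.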
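If the BMC does have a LAN connection, a check from another machine might look like the sketch below. The function name, the IPMITOOL override variable, and the address and user in the usage line are all placeholders for illustration; "-I lan" assumes the BMC speaks IPMI over LAN v1.5, and "-I lanplus" would be the v2.0 equivalent.

```shell
# Ask a BMC for its chassis status over the network.
# $1 = BMC address, $2 = BMC user name (both supplied by you).
# -a makes ipmitool prompt for the password instead of taking it on
# the command line. IPMITOOL can be set to point at a different binary.
bmc_chassis_status() {
    "${IPMITOOL:-ipmitool}" -I lan -H "$1" -U "$2" -a chassis status
}

# Usage (hypothetical address and user):
#   bmc_chassis_status 192.0.2.10 admin
```

An answer here while the in-system driver still finds nothing would point away from a dead BMC and toward the host-side interface or driver.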
/Peter