[CentOS] ipmi regression in 4.5?

Fri Jun 22 11:26:19 UTC 2007
James Pearson <james-p at moving-picture.com>

Gavin Carr wrote:
> On Thu, Jun 21, 2007 at 10:13:56AM +0100, James Pearson wrote:
> 
>>Gavin Carr wrote:
>>
>>>I've been monitoring CPU temperature on a few Dell SC1435s running CentOS4
>>>via OpenIPMI and 'ipmitool sdr'. It's been working very nicely, but the
>>>upgrade to 4.5 not so long ago seems to have broken something:
>>>
>>> # ipmitool sdr type Temperature
>>> Temp             | 01h | ns  |  3.1 | Disabled
>>> Planar Temp      | 04h | ok  |  7.1 | 30 degrees C
>>> Temp Interface   | 53h | ns  |  7.1 | Disabled
>>>
>>>The disabled sensors above used to work fine, and there have been no config
>>>changes, BIOS upgrades, or anything else. All machines are affected post-4.5.
>>
>>I had a similar problem with Dell boxes when I went from ipmitool v1.8.8 
>>to v1.8.9 - see the thread starting at:
>>
>><http://www.mail-archive.com/ipmitool-devel@lists.sourceforge.net/msg00468.html>
>>
>>It looks like the patch for ipmitool in the CentOS 4.5 OpenIPMI SRPM,
>>i.e. ipmitool-1.8.8-disabled-sensor.patch, is the cause of this issue ...
>>the comment in the change log is:
>>
>>- Added patch to fix sensors problems on Woodcrest (#228679)
>>
>>I guess you could rebuild OpenIPMI without that patch.
> 
> 
> Thanks for the input James.
> 
> That does seem a similar problem, but it's specific to those Intel chipsets,
> by the looks of it. The SC1435s where I'm seeing the problem are AMDs.
> 
> Another interesting datapoint I've discovered is that the versions of 
> OpenIPMI only changed at the release level:
> 
>   CentOS 4.4: 1.4.14-1.4E.13
>   CentOS 4.5: 1.4.14-1.4E.17
> 
> so I'm starting to wonder if it's perhaps a kernel change.
> 
> In addition, I've now verified that the sensors are behaving similarly
> on CentOS 5.
> 

The ipmitool-1.8.8-disabled-sensor.patch may well be intended to fix
Woodcrest-specific issues, but it also removes part of the code that
affects temperature readings on (some?) Dells ...

I'm not an expert on IPMI, but the code that the patch removed looks a
bit hacky (maybe that is why it was removed?). However, one side effect
of removing it is that some temperature readings no longer work on
SC1435s and maybe other Dell hardware. I have no idea if the 'real'
issue is with ipmitool or the Dell hardware.

However, if you rebuild OpenIPMI without that patch, then ipmitool will
work as before when reading temperatures on SC1435s.
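
In case it helps, here is a rough sketch of stripping a single patch out
of a spec file before rebuilding. It assumes the patch is declared as a
"PatchN:" tag and applied with a matching "%patchN" line in %prep; I
have not checked that the OpenIPMI spec is laid out exactly like that,
and the function name drop_patch is made up for illustration.

```python
import re


def drop_patch(spec_text, patch_name):
    """Return spec_text without the PatchN: tag for patch_name and
    without the matching %patchN line.

    Assumes the common 'PatchN: file' / '%patchN ...' layout; a spec
    that applies patches some other way would need a different approach.
    """
    lines = spec_text.splitlines()

    # Find the number N of the 'PatchN:' tag that names this patch file.
    number = None
    for line in lines:
        m = re.match(r"^Patch(\d+):\s*(\S+)\s*$", line)
        if m and m.group(2) == patch_name:
            number = m.group(1)
            break
    if number is None:
        raise ValueError("patch not declared in spec: " + patch_name)

    tag = re.compile(rf"^Patch{number}:")
    apply_line = re.compile(rf"^%patch{number}(\s|$)")

    # Remove the lines outright rather than commenting them out, since
    # rpm can still expand macros that appear inside comments.
    kept = [l for l in lines if not (tag.match(l) or apply_line.match(l))]
    return "\n".join(kept) + "\n"
```

The result would be written back over the spec file before running the
usual rebuild of the package.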

It is not a kernel issue - you get the same problem using ipmitool 
talking over the lanplus interface (which goes nowhere near the kernel).
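
A rough sketch of how that comparison could be scripted, in case anyone
wants to repeat it: build the same sensor query for the local "open"
interface and for lanplus, then list which sensors come back as
"Disabled". The host and user values passed in are placeholders, the
function names are my own, and the parser only assumes the five-column
output shown earlier in this thread.

```python
import subprocess

QUERY = ["sdr", "type", "Temperature"]


def local_command():
    # "open" goes through the OpenIPMI kernel driver on the local machine.
    return ["ipmitool", "-I", "open"] + QUERY


def lanplus_command(host, user):
    # lanplus talks to the BMC over the network, bypassing the local kernel
    # driver. -E takes the password from the IPMI_PASSWORD environment
    # variable.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"] + QUERY


def parse_sdr(text):
    """Split 'name | id | status | entity | reading' lines into dicts."""
    sensors = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        name, sensor_id, status, entity, reading = fields
        sensors.append({"name": name, "id": sensor_id, "status": status,
                        "entity": entity, "reading": reading})
    return sensors


def disabled_sensors(text):
    """Names of sensors whose reading column says 'Disabled'."""
    return [s["name"] for s in parse_sdr(text) if s["reading"] == "Disabled"]


def query(command):
    """Run an ipmitool command and return its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

If disabled_sensors(query(local_command())) and
disabled_sensors(query(lanplus_command(host, user))) give the same list,
the kernel driver is unlikely to be the culprit.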

The simple workaround is to use ipmitool from the CentOS 4.4 RPM (as
the only changes to ipmitool between 4.4 and 4.5 were the Woodcrest fixes).

I have created an updated SRPM which reverses the part of the Woodcrest
fixes that affects these Dells - if you are interested, the SRPM is at:

<ftp://ftp.moving-picture.com/private/OpenIPMI-1.4.14-1.4E.17a.src.rpm>

I haven't used CentOS 5 in anger (yet), but I guess the issue is the same.

James Pearson