[CentOS] Machine check events

Wed Nov 27 18:32:15 UTC 2013
Glenn Eychaner <geychaner at mac.com>

On further, further, further toying, I now have mcelog running on my 32-bit
CentOS 6 systems! I admit to doing it the "dumb" way: I grabbed the source
from the git repository, compiled and installed it, and THEN discovered
that the init.d script supplied with the source was not CentOS-compatible, so
I grabbed the x86-64 RPM, extracted its startup files, and copied them into
place. The RPM was small enough to make this easy.

What I SHOULD have done is to grab the source RPM, replace its source tarball
with the latest source, rebuild it, install the resulting binary RPM, and keep
the rebuilt RPMs for future use.  Maybe I will try that at a later date, but
I don't really have time today.

-G.

On Nov 26, 2013, at 11:11 AM, Glenn Eychaner <geychaner at mac.com> wrote:

> On further, further investigation, it looks like, according to the mcelog install
> guide at http://www.mcelog.org/installation.html, I could "roll my own" for 32-bit
> CentOS 6:
> 
> "For bad page offlining you will need a 2.6.33+ kernel or a 2.6.32 kernel with
> the soft offlining capability backported (like RHEL6 or SLES11-SP1)"
> "The kernel has to have CONFIG_X86_MCE enabled. For 32bit kernels you
> need at least a 2.6.30 kernel."
> 
> The current kernel I am running is 2.6.32-358.23.2, but I can't tell whether it
> has CONFIG_X86_MCE enabled. How can I find this out?
> 
> JD writes:
> 
>> yum info mcelog
>> ...
>> Description : mcelog is a daemon that collects and decodes Machine Check
>>            : Exception data on x86-64 machines.
>> 
>> So not for 32-bit...
> 
> On Nov 26, 2013, at 9:25 AM, Glenn Eychaner <geychaner at mac.com> wrote:
> 
>> Further investigation seems to indicate that these events should be handled
>> by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a
>> "mcelog" or "mced" binary, nor does yum seem to offer any related package
>> (based on "yum whatprovides '*/mcelog'" and similar queries).
>> 
>> Thus, I still don't know what to do with these errors.  Ignore them? I am
>> running 32-bit CentOS 6.4 (legacy software reasons).
>> 
>> On Nov 25, 2013, at 11:05 AM, Glenn Eychaner <geychaner at mac.com> wrote:
>> 
>>> On my new Haswell-based machines, I am occasionally seeing entries like the
>>> following in /var/log/messages:
>>> 	kernel: [Hardware Error]: Machine check events logged
>>> (I would not even have noticed them, except that logwatch flags them.)
>>> These messages always occur alone, and don't seem to have a corresponding
>>> entry in any other log file in /var/log. How can I get more info about these
>>> messages?
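
For the original question, a minimal sketch that needs no extra software, assuming the default CentOS 6 rsyslog line format ("Mon DD HH:MM:SS host tag: message"). It only lists when the kernel printed the "Machine check events logged" line, which at least shows how often the events occur; the decoded details themselves still require a decoder such as mcelog. The names machine_check_times and main are made up for illustration, and the script assumes Python 3 (CentOS 6 itself ships Python 2.6).

```python
"""Sketch: list when the kernel logged machine check events.

Assumes the traditional syslog line format, e.g.
"Nov 25 10:59:01 host kernel: [Hardware Error]: Machine check events logged".
This only finds the summary line; it does not decode the events.
"""
import sys

MARKER = "Machine check events logged"


def machine_check_times(lines):
    """Return the syslog timestamp of every line containing MARKER."""
    times = []
    for line in lines:
        if MARKER in line:
            # The traditional syslog timestamp is the first 15 characters,
            # e.g. "Nov 25 10:59:01".
            times.append(line[:15])
    return times


def main(path="/var/log/messages"):
    """Print a count and the timestamps of machine check lines in path."""
    try:
        with open(path, errors="replace") as log:
            stamps = machine_check_times(log)
    except OSError as exc:
        # /var/log/messages is normally readable only by root.
        print("cannot read %s: %s" % (path, exc), file=sys.stderr)
        return
    print("%d machine check event line(s)" % len(stamps))
    for stamp in stamps:
        print(stamp)


if __name__ == "__main__":
    # An optional first argument names a different (e.g. rotated) log file.
    main(*sys.argv[1:2])
```

Run it as root; passing a path argument scans a rotated log instead of the current one.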

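On the CONFIG_X86_MCE question in the quoted message: CentOS, like RHEL, installs each kernel's build configuration as /boot/config-<release>, so "grep CONFIG_X86_MCE /boot/config-$(uname -r)" answers it directly. The same check as a small Python 3 sketch follows; the helper names config_value and read_running_config are hypothetical, and the /proc/config.gz fallback exists only on kernels built with CONFIG_IKCONFIG_PROC.

```python
"""Sketch: report whether the running kernel was built with CONFIG_X86_MCE."""
import gzip
import platform


def config_value(config_text, option):
    """Return the value of option (e.g. "y" or "m") in kernel config text.

    Disabled options appear only as "# CONFIG_FOO is not set" comments,
    so they (and options not mentioned at all) yield None.
    """
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def read_running_config():
    """Return the running kernel's config text, or None if none is readable."""
    release = platform.release()  # the same string `uname -r` prints
    try:
        with open("/boot/config-" + release) as f:
            return f.read()
    except OSError:
        pass
    try:
        # Present only if the kernel was built with CONFIG_IKCONFIG_PROC.
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found")
    else:
        value = config_value(text, "CONFIG_X86_MCE")
        print("CONFIG_X86_MCE =", value if value is not None else "not set")
```

On a kernel built with machine check support this prints "CONFIG_X86_MCE = y".
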
--
Glenn Eychaner (geychaner at lco.cl)
Telescope Systems Programmer, Las Campanas Observatory