[CentOS] Machine check events

Tue Nov 26 14:11:23 UTC 2013
Glenn Eychaner <geychaner at mac.com>

On further, further investigation: according to the mcelog install guide at
http://www.mcelog.org/installation.html, it looks like I could "roll my own"
for 32-bit CentOS 6:

"For bad page offlining you will need a 2.6.33+ kernel or a 2.6.32 kernel with
the soft offlining capability backported (like RHEL6 or SLES11-SP1)"
"The kernel has to have CONFIG_X86_MCE enabled. For 32bit kernels you
need at least a 2.6.30 kernel."

The current kernel I am running is 2.6.32-358.23.2, but I can't tell whether it
has CONFIG_X86_MCE enabled. How can I find this out?
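
For reference, one common way to check a kernel option is to grep the build
configuration that the kernel package installs. The sketch below assumes the
config file lives at /boot/config-$(uname -r), which is where RHEL/CentOS
kernels normally put it; other systems may differ. The function name
check_kconfig is made up for illustration.

```shell
# Report whether a kernel config option is enabled.
# Usage: check_kconfig OPTION [CONFIG_FILE]
# CONFIG_FILE defaults to /boot/config-$(uname -r); that default path is an
# assumption and may not exist on every distribution.
check_kconfig() {
    opt=$1
    cfg=${2:-/boot/config-$(uname -r)}
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    # "=y" means built into the kernel, "=m" means built as a module.
    # A disabled option appears as "# OPTION is not set" or not at all.
    if grep -Eq "^${opt}=(y|m)$" "$cfg"; then
        echo "$opt is enabled"
    else
        echo "$opt is not enabled"
        return 1
    fi
}
```

Running "check_kconfig CONFIG_X86_MCE" on the machine in question would then
print whether the running kernel was built with that option; a plain
"grep CONFIG_X86_MCE /boot/config-$(uname -r)" shows the same lines directly.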

Thanks,
-G.

JD writes:

> yum info mcelog
> ...
> Description : mcelog is a daemon that collects and decodes Machine Check
>             : Exception data on x86-64 machines.
> 
> So not for 32-bit...

On Nov 26, 2013, at 9:25 AM, Glenn Eychaner <geychaner at mac.com> wrote:

> Further investigation seems to indicate that these events should be handled
> by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a
> "mcelog" or "mced" binary, nor does yum seem to contain anything related
> (based on "yum whatprovides '*/mcelog'" and similar queries).
> 
> Thus, I still don't know what to do with these errors.  Ignore them? I am
> running 32-bit CentOS 6.4 (legacy software reasons).
> 
> On Nov 25, 2013, at 11:05 AM, Glenn Eychaner <geychaner at mac.com> wrote:
> 
>> On my new Haswell-based machines, I am occasionally seeing entries like the
>> following in /var/log/messages:
>> 	kernel: [Hardware Error]: Machine check events logged
>> (I would not have even noticed them, except that they get flagged by logwatch.)
>> These messages always occur alone, and don't seem to have a corresponding
>> entry in any other log file in /var/log. How can I get more info about these
>> messages?
> 

--
Glenn Eychaner (geychaner at lco.cl)
Telescope Systems Programmer, Las Campanas Observatory