Re: [CentOS] Machine check events

27 Nov 2013


All that work was done to get this: the output of a corrected internal parity
error. I get about one of these per workstation every 3 days, more or less; is
this a surprising number? (The workstation under the heaviest load gets
more, while the idle spare gets none at all; no surprise there!)
   MCE 6
   CPU 1 BANK 0
   TIME 1385426237 Mon Nov 25 21:37:17 2013
   MCG status:
   MCi status:
   Corrected error
   Error enabled
   MCA: Internal parity error
   STATUS 90000040000f0005 MCGSTATUS 0
   MCGCAP c09 APICID 2 SOCKETID 0
   CPUID Vendor Intel Family 6 Model 60
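For anyone wondering how the "Corrected error" and "Error enabled" lines relate to the raw STATUS value, here is a rough sketch that pulls a few architectural bits out of it. The function name decode_mci_status is made up for illustration and is not part of mcelog; the bit positions (VAL 63, OVER 62, UC 61, EN 60, PCC 57, MCA error code in bits 15..0) are the ones documented for IA32_MCi_STATUS. It assumes the value is given as exactly 16 hex digits and that the shell does arithmetic on at least 64-bit integers.

```shell
# Hypothetical helper (not part of mcelog): decode a few architectural
# fields of an IA32_MCi_STATUS value given as exactly 16 hex digits.
# The value is split into 32-bit halves so each half stays well inside
# the shell's signed integer range.
decode_mci_status() {
    hi=$(printf '%s' "$1" | cut -c1-8)    # bits 63..32
    lo=$(printf '%s' "$1" | cut -c9-16)   # bits 31..0
    echo "VAL=$(( (0x$hi >> 31) & 1 ))"   # bit 63: register contents valid
    echo "OVER=$(( (0x$hi >> 30) & 1 ))"  # bit 62: an earlier error was overwritten
    echo "UC=$(( (0x$hi >> 29) & 1 ))"    # bit 61: uncorrected error (0 = corrected)
    echo "EN=$(( (0x$hi >> 28) & 1 ))"    # bit 60: error reporting enabled
    echo "PCC=$(( (0x$hi >> 25) & 1 ))"   # bit 57: processor context corrupt
    printf 'MCACOD=0x%04x\n' "$(( 0x$lo & 0xffff ))"  # bits 15..0: MCA error code
}

# The STATUS value from the event above.
decode_mci_status 90000040000f0005
```

UC=0 together with EN=1 corresponds to the "Corrected error" and "Error enabled" lines, and MCA error code 0x0005 is the architectural code for an internal parity error.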
Anyway,
-G.
On Nov 27, 2013, at 3:32 PM, Glenn Eychaner geychaner@mac.com wrote:
...
On further, further, further toying, I now have mcelog running on my 32-bit
CentOS 6 systems! I admit to doing it the "dumb" way: I grabbed the source
from the git repository, compiled and installed it, and THEN discovered
that the init.d file supplied with the source was not CentOS compatible, so
I grabbed the x86-64 RPM, extracted the startup files, and copied them into
place. The RPM was small enough to make this easy.
What I SHOULD have done is to grab the source RPM, replace the source with
the latest source, build and install the source RPM, and then repackage the
RPMs again for future consumption.  Maybe I will try that at a future date, but
I don't really have time today.
-G.
On Nov 26, 2013, at 11:11 AM, Glenn Eychaner geychaner@mac.com wrote:
...
On further, further investigation, according to the mcelog install guide at
http://www.mcelog.org/installation.html, it looks like I could "roll my own"
for 32-bit CentOS 6:
"For bad page offlining you will need a 2.6.33+ kernel or a 2.6.32 kernel with
the soft offlining capability backported (like RHEL6 or SLES11-SP1)"
"The kernel has to have CONFIG_X86_MCE enabled. For 32bit kernels you
need at least a 2.6.30 kernel."
The current kernel I am running is 2.6.32-358.23.2, but I can't tell whether it
has CONFIG_X86_MCE enabled. How can I find this out?
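One common way to check, sketched here under the assumption that the distribution installs the kernel build configuration as /boot/config-<release> (CentOS kernel packages normally do), is to grep that file for the option. The helper name kconfig_enabled is hypothetical, chosen only for this example.

```shell
# Hypothetical helper: succeed if OPTION is set to y or m in a kernel
# config file. Assumes the config lives at /boot/config-$(uname -r)
# unless another path is given as the second argument.
kconfig_enabled() {
    opt=$1
    cfg=${2:-/boot/config-$(uname -r)}
    grep -Eq "^${opt}=(y|m)\$" "$cfg" 2>/dev/null
}

if kconfig_enabled CONFIG_X86_MCE; then
    echo "CONFIG_X86_MCE is enabled"
else
    echo "CONFIG_X86_MCE is not enabled (or the config file was not found)"
fi
```

If the option is disabled, the file contains a "# CONFIG_X86_MCE is not set" comment line instead, which the pattern deliberately does not match.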
JD writes:
...
yum info mcelog
...
Description : mcelog is a daemon that collects and decodes Machine Check
          : Exception data on x86-64 machines.
So not for 32-bit...
On Nov 26, 2013, at 9:25 AM, Glenn Eychaner geychaner@mac.com wrote:
...
Further investigation seems to indicate that these events should be handled
by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a
"mcelog" or "mced" binary, nor does yum seem to contain anything related
(based on "yum whatprovides '*/mcelog'" and similar queries).
Thus, I still don't know what to do with these errors.  Ignore them? I am
running 32-bit CentOS 6.4 (legacy software reasons).
On Nov 25, 2013, at 11:05 AM, Glenn Eychaner geychaner@mac.com wrote:
...
On my new Haswell-based machines, I am occasionally seeing entries like the
following in /var/log/messages:
   kernel: [Hardware Error]: Machine check events logged
(I would not have even noticed them, except that they get flagged by logwatch.)
These messages always occur alone, and don't seem to have a corresponding
entry in any other log file in /var/log. How can I get more info about these
messages?
--
Glenn Eychaner (geychaner@lco.cl)
Telescope Systems Programmer, Las Campanas Observatory
