Re: [CentOS] kernel: Machine check events logged

7 Jul 2010


      Alexander Farber wrote:
...
Hello,
every few hours I get the following message in /var/log/messages:
Jul  5 20:23:28 hXXX kernel: Machine check events logged
<snip>
...
And in the /var/log/mcelog I see:
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 111a60c5584d4 [at 2500 Mhz 1 days 9:25:51 uptime (unreliable)]
MISC c008000001000000 ADDR 1148f5940
  Northbridge NB Array Error
       bit35 = err cpu3
       bit42 = L3 subcache in error bit 0
       bit43 = L3 subcache in error bit 1
       bit46 = corrected ecc error
       bit59 = misc error valid
  memory/cache error 'generic read mem transaction, generic transaction, level generic'
STATUS 9c1f4cf8001c011b MCGSTATUS 0
No DIMM found for 1148f5940 in SMBIOS
My machine (a 64-bit CentOS 5.5 server rented from the German
hoster strato.de) seems to run OK as a LAMP server, though...
What do these messages actually mean?
Is the RAM defective, and how critical is it?
(I have an important event this Friday
and would prefer not to take the machine offline.)
<snip>
First, this is *very* bad - I'm not good enough on this to tell you if
it's the CPU, or the motherboard, but it's one of the two, *not* just
memory. Second, if you're paying for hosting, and it's *their* server, you
need to get on the phone with them *now*, and tell them that they need to
fix it, yesterday would be preferable. They *should* have seen the logs.
Dunno if you have a physical machine hosted there, or a VM; if the latter,
they can move it without you seeing any downtime at all. If the former,
they can just hot swap the drives into another server.
But call them *NOW*. You're paying for the service.
mark
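
For anyone reading the quoted mcelog output later: the bit lines it prints can be checked against the raw STATUS value by hand. The following is only a rough sketch, not how mcelog itself decodes anything. The labels for bits 35, 42, 43, 46 and 59 are copied from the log above; the labels for bits 61 and 63 are an assumption taken from the general x86 machine-check status layout, and the names `LABELS`, `set_bits` and `describe` are made up for illustration.

```python
# STATUS value copied from the quoted mcelog output.
STATUS = 0x9c1f4cf8001c011b

# Bits 35-59: labels exactly as mcelog printed them in the post.
# Bits 61 and 63: assumed from the generic x86 machine-check status
# register layout, not taken from the post.
LABELS = {
    35: "err cpu3",
    42: "L3 subcache in error bit 0",
    43: "L3 subcache in error bit 1",
    46: "corrected ecc error",
    59: "misc error valid",
    61: "uncorrected error (assumed label)",
    63: "status valid (assumed label)",
}


def set_bits(value):
    """Return the positions of all bits that are 1 in a 64-bit value."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def describe(value):
    """Map each set bit that has a known label to that label."""
    return {bit: LABELS[bit] for bit in set_bits(value) if bit in LABELS}


decoded = describe(STATUS)
for bit, label in sorted(decoded.items()):
    print(f"bit{bit} = {label}")
```

Run against the value above, this reports bits 35, 42, 43, 46 and 59 as set, matching the lines mcelog printed, while bit 61 is clear.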
