Re: [CentOS] kernel: Machine check events logged

7 Jul 2010


      On Wednesday 07 July 2010, m.roth@5-cent.us wrote:
...
Alexander Farber wrote:
...
every few hours I get the following message in /var/log/messages:
Jul  5 20:23:28 hXXX kernel: Machine check events logged
...
...
...
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 111a60c5584d4 [at 2500 Mhz 1 days 9:25:51 uptime (unreliable)]
MISC c008000001000000 ADDR 1148f5940
  Northbridge NB Array Error
       bit35 = err cpu3
       bit42 = L3 subcache in error bit 0
       bit43 = L3 subcache in error bit 1
       bit46 = corrected ecc error
       bit59 = misc error valid
  memory/cache error 'generic read mem transaction, generic transaction, level generic'
STATUS 9c1f4cf8001c011b MCGSTATUS 0
No DIMM found for 1148f5940 in SMBIOS
...
...
First, this is *very* bad.
That's a bit harsh. Depending on the actual error that triggers this MCE, it
may be just an annoyance (even though, yes, it is a hardware problem). Also,
the OP did mention that the server runs without any obvious problems.
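Whether it is "just an annoyance" comes down to what the STATUS word says. A minimal Python sketch for decoding it by hand, assuming the architectural MCi_STATUS layout from the Intel and AMD manuals (flag bits 57 to 63, MCA error code in the low 16 bits). Model-specific bits such as bit46 (the AMD "corrected ecc error" flag mcelog printed) are not decoded here, and the function names are hypothetical, not part of mcelog.

```python
from typing import Optional

# Architectural flag bits in the MCi_STATUS register (Intel SDM / AMD APM).
STATUS_BITS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# Sub-fields of a memory-hierarchy compound error code: 0000 0001 RRRR TTLL.
RRRR = ["generic error", "generic read", "generic write", "data read",
        "data write", "instruction fetch", "prefetch", "evict", "snoop"]
TT = ["instruction", "data", "generic"]
LL = ["level 0", "level 1", "level 2", "level generic"]


def decode_status(status: int) -> dict:
    """Split an MCi_STATUS value into its architectural flags and MCA code."""
    flags = {name.split()[0]: bool((status >> bit) & 1)
             for bit, name in STATUS_BITS.items()}
    return {"flags": flags, "mca_code": status & 0xFFFF}


def describe_mem_code(code: int) -> Optional[str]:
    """Describe a memory-hierarchy compound code, or None for other code types."""
    if (code & 0xFF00) != 0x0100:
        return None
    r, t, l = (code >> 4) & 0xF, (code >> 2) & 0x3, code & 0x3
    rname = RRRR[r] if r < len(RRRR) else f"reserved ({r:#x})"
    tname = TT[t] if t < len(TT) else "reserved"
    return f"{rname}, {tname} transaction, {LL[l]}"


if __name__ == "__main__":
    s = decode_status(0x9c1f4cf8001c011b)   # STATUS value from the log above
    print(s["flags"])
    print(describe_mem_code(s["mca_code"]))
```

For the logged value, UC (bit 61) and PCC (bit 57) are clear while VAL, EN, MISCV and ADDRV are set, and the low 16 bits (0x011b) decode to the same "generic read, generic transaction, level generic" text mcelog printed. That agrees with mcelog's own bit46 = corrected ecc error line: the hardware corrected this error rather than failing on it.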
...

I'm not good enough at this to tell you whether it's the CPU or the
motherboard, but it's one of the two, *not* just memory.
What do you base that on? I've seen a lot of different MCE errors being
resolved by finding and replacing flaky DIMMs.
...
Second, if you're paying for hosting, and it's *their* server, you 
need to get on the phone with them *now*, and tell them that they need to
fix it; yesterday would be preferable. They *should* have seen the logs.
Dunno if you have a physical machine hosted there, or a VM;
I'm quite sure you can't get that kind of MCE-dump inside a VM.
/Peter
...
if the latter, 
they can move it without you seeing any downtime at all. If the former,
they can just hot-swap the drives into another server.
But call them *NOW*. You're paying for the service.
    mark
