Re: [CentOS] Cant find out MCE reason (CPU 35 BANK 8)

22 Mar 2011


      Vladimir Budnev wrote:
...
2011/3/21 m.roth@5-cent.us
...
Vladimir Budnev wrote:
...
Hello community.
We are running, Centos 4.8 on SuperMicro SYS-6026T-3RF with 2xIntel
Xeon E5630 and 8xKingston KVR1333D3D4R9S/4G
For some time we have had lots of MCEs in mcelog and we can't find out the
reason.
The only thing that shows there (when it shows, since sometimes it
doesn't seem to) is a hardware error. You *WILL* be replacing hardware,
sometime soon, like yesterday.
<snip>
...
...
Bad news: you have *two* DIMMs failing, one associated with the physical
CPU that has core 53, and another associated with the physical CPU that
has cores 32-35.
<snip>
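To see which CPUs and banks are actually reporting, the mcelog records can be tallied. The following is only a rough sketch: it assumes each record starts with a header shaped like the thread subject ("CPU 35 BANK 8"), which may not match the output of every mcelog version, and the function name `tally_mce` is made up for illustration.

```python
import re
from collections import Counter

# Assumed header shape, taken from the thread subject ("CPU 35 BANK 8").
# Other mcelog versions may print records differently.
MCE_HEADER = re.compile(r"\bCPU\s+(\d+)\s+BANK\s+(\d+)\b")


def tally_mce(lines):
    """Count machine-check records per (cpu, bank) pair.

    `lines` is any iterable of text lines, e.g. an open log file.
    Lines without a matching header are ignored.
    """
    counts = Counter()
    for line in lines:
        match = MCE_HEADER.search(line)
        if match:
            cpu, bank = int(match.group(1)), int(match.group(2))
            counts[(cpu, bank)] += 1
    return counts
```

Passing an open log file to `tally_mce` and printing `counts.most_common()` lists the noisiest (CPU, bank) pairs first. The log location varies by setup, so the path to the file is left to the reader.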
...
Last night we did some testing to find out which RAM modules are faulty.
Note that we have 8 modules of 4G each.
<snip>
...
Finally we've placed the last 2 modules... and no errors. Note that at
that step we have exactly the same module placement as before the
experiment.
It sounds strange, but at first glance it looks like something was wrong
with the module placement. But we can't understand why the problem didn't
show during the first days, even the first month, of the server running.
No one touched the server hardware, so I have no idea what that was.
Now we are just waiting to see whether the errors come back.
I'm sure there will. Reseating the memory may have done something, but
there will, I'll wager.
Here's a question out of left field: who was the manufacturer of the 4G
DIMMs? Not Supermicro, but the DIMMs themselves?
mark
