Peter Kjellstrom wrote:
On Wednesday 07 July 2010, m.roth@5-cent.us wrote:
Alexander Farber wrote:
Every few hours I get the following message in /var/log/messages:
Jul 5 20:23:28 hXXX kernel: Machine check events logged
...
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 111a60c5584d4 [at 2500 Mhz 1 days 9:25:51 uptime (unreliable)]
MISC c008000001000000 ADDR 1148f5940
  Northbridge NB Array Error
       bit35 = err cpu3
       bit42 = L3 subcache in error bit 0
       bit43 = L3 subcache in error bit 1
       bit46 = corrected ecc error
       bit59 = misc error valid
  memory/cache error 'generic read mem transaction, generic transaction, level generic'
STATUS 9c1f4cf8001c011b MCGSTATUS 0
No DIMM found for 1148f5940 in SMBIOS
...
<snip>
I don't know this well enough to tell you whether it's the CPU or the
motherboard, but it's one of the two, *not* just memory.
What do you base that on? I've seen a lot of different MCE errors resolved by finding and replacing flaky DIMMs.
Because it reports an NB Array Error and errors in the L3 subcache. I've seen plenty of memory errors, and none of them showed an NB array & subcache error.
I do just note that there's "No DIMM found for ... in SMBIOS", but I assume that's just a bank that's not filled.
mark
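
For anyone who wants to check the STATUS word from the log by hand, here is a minimal sketch in Python. It is not part of mcelog; the helper names (set_bits, decode_flags) are made up for illustration. It only covers the architectural flag bits in the top byte of an x86 MCi_STATUS register; the vendor-specific bits (35, 42, 43, 46) are the ones mcelog already spelled out in the log above.

```python
# Architectural flag bits of an x86 MCi_STATUS register.
# The helper names below are hypothetical, chosen for this sketch only.
FLAGS = {
    63: "VAL   (register holds a valid error)",
    62: "OVER  (an earlier error was overwritten)",
    61: "UC    (uncorrected error)",
    60: "EN    (error reporting enabled)",
    59: "MISCV (MISC register is valid)",
    58: "ADDRV (ADDR register is valid)",
    57: "PCC   (processor context corrupt)",
}


def set_bits(status):
    """Return the positions of all bits set in a 64-bit status word."""
    return [bit for bit in range(64) if (status >> bit) & 1]


def decode_flags(status):
    """Map each architectural flag bit to True (set) or False (clear)."""
    return {bit: bool((status >> bit) & 1) for bit in FLAGS}


if __name__ == "__main__":
    status = 0x9C1F4CF8001C011B  # STATUS value from the log above
    flags = decode_flags(status)
    for bit in sorted(FLAGS, reverse=True):
        state = "set  " if flags[bit] else "clear"
        print(f"bit{bit} {state} {FLAGS[bit]}")
```

For the value in this thread, VAL, EN, MISCV and ADDRV are set while OVER, UC and PCC are clear, which agrees with the "corrected ecc error" line in the mcelog output. It says nothing about whether the fault sits in a DIMM, the CPU or the motherboard.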