On Wednesday 07 July 2010, m.roth at 5-cent.us wrote:
> Alexander Farber wrote:
> > every few hours I get the following message in /var/log/message:
> > Jul 5 20:23:28 hXXX kernel: Machine check events logged ...
> > MCE 0
> > HARDWARE ERROR. This is *NOT* a software problem!
> > Please contact your hardware vendor
> > CPU 0 4 northbridge TSC 111a60c5584d4 [at 2500 Mhz 1 days 9:25:51
> > uptime (unreliable)]
> > MISC c008000001000000 ADDR 1148f5940
> > Northbridge NB Array Error
> > bit35 = err cpu3
> > bit42 = L3 subcache in error bit 0
> > bit43 = L3 subcache in error bit 1
> > bit46 = corrected ecc error
> > bit59 = misc error valid
> > memory/cache error 'generic read mem transaction, generic
> > transaction, level generic'
> > STATUS 9c1f4cf8001c011b MCGSTATUS 0
> > No DIMM found for 1148f5940 in SMBIOS ...
> First, this is *very* bad

That's a bit harsh. Depending on the actual error that triggers this MCE,
it may be just an annoyance (even though, yes, it is a hardware problem).
Also, the OP did mention that the server runs without any obvious problems.

> - I'm not good enough on this to tell you if
> it's the CPU, or the motherboard, but it's one of the two, *not* just
> memory.

What do you base that on? I've seen a lot of different MCE errors resolved
by finding and replacing flaky DIMMs.

> Second, if you're paying for hosting, and it's *their* server, you
> need to get on the phone with them *now*, and tell them that they need to
> fix it, yesterday would be preferable. They *should* have seen the logs.
>
> Dunno if you have a physical machine hosted there, or a VM'

I'm quite sure you can't get that kind of MCE dump inside a VM.

/Peter

> if the latter,
> they can move it without you seeing any downtime at all. If the former,
> they can just hot swap the drives into another server.
>
> But call them *NOW*. You're paying for the service.
>
> mark
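For anyone who wants to check the severity themselves, here is a rough sketch (not part of mcelog, just an illustration) that picks apart the STATUS value from the quoted log. The flag positions used are the architectural x86 machine-check status bits (VAL, OVER, UC, EN, MISCV, ADDRV, PCC); the names `FLAGS`, `set_bits` and `decode_flags` are made up for this example, and the meaning of bit 46 is taken from the quoted log rather than from a datasheet.

```python
# Sketch: decode the architectural flag bits of an x86 machine-check
# STATUS value. Helper names are hypothetical, chosen for illustration.

STATUS = 0x9C1F4CF8001C011B  # STATUS value from the quoted log

# Bit positions shared by x86 machine-check status registers.
FLAGS = {
    "VAL": 63,    # entry is valid
    "OVER": 62,   # overflow: an earlier error was overwritten
    "UC": 61,     # error was NOT corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context corrupt
}


def set_bits(value):
    """Return the positions of all bits that are set in a 64-bit value."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def decode_flags(value):
    """Map each architectural flag name to True/False for this value."""
    return {name: bool((value >> bit) & 1) for name, bit in FLAGS.items()}


if __name__ == "__main__":
    for name, is_set in decode_flags(STATUS).items():
        print(f"{name:5s} = {int(is_set)}")
    # The log labels bit 46 as "corrected ecc error".
    print("bit 46 set:", 46 in set_bits(STATUS))
```

For this particular value, UC and PCC come out clear and bit 46 is set, which matches the "corrected ecc error" line in the log. That says nothing about which component is at fault, only that this event was corrected by the hardware.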