Alexander Farber wrote:
> Hello,
>
> every few hours I get the following message in /var/log/message:
>
> Jul 5 20:23:28 hXXX kernel: Machine check events logged
<snip>
> And in the /var/log/mcelog I see:
>
> MCE 0
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 0 4 northbridge TSC 111a60c5584d4 [at 2500 Mhz 1 days 9:25:51
> uptime (unreliable)]
> MISC c008000001000000 ADDR 1148f5940
> Northbridge NB Array Error
> bit35 = err cpu3
> bit42 = L3 subcache in error bit 0
> bit43 = L3 subcache in error bit 1
> bit46 = corrected ecc error
> bit59 = misc error valid
> memory/cache error 'generic read mem transaction, generic
> transaction, level generic'
> STATUS 9c1f4cf8001c011b MCGSTATUS 0
> No DIMM found for 1148f5940 in SMBIOS
>
> My machine (a CentOS 5.5/64bit server rented at German
> hoster strato.de) seems to run ok as a LAMP server though...
>
> What do these messages actually mean,
> is RAM defect and how critical is it
> (because I have an important event this Friday
> and would prefer not to take the machine offline)
<snip>

First, this is *very* bad. I don't know enough about this to tell you
whether it's the CPU or the motherboard, but it's one of the two, *not*
just memory.

Second, if you're paying for hosting, and it's *their* server, you need
to get on the phone with them *now* and tell them that they need to fix
it; yesterday would be preferable. They *should* have seen the logs.

Dunno if you have a physical machine hosted there or a VM; if the
latter, they can move it without you seeing any downtime at all. If the
former, they can just hot swap the drives into another server.

But call them *NOW*. You're paying for the service.

        mark
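
For anyone who wants to check the bit list in the quoted mcelog output by hand, here is a rough Python sketch that pulls the set bits out of the STATUS word. The names in STATUS_FLAGS are my own labels for the generic x86 machine-check flag bits (57 to 63), and set_bits and decode_flags are made-up helper names; nothing here comes from mcelog itself, and the vendor-specific lower bits are left undecoded.

```python
# Sketch only: decode the generic flag bits of an x86 MCi_STATUS value.
# The labels below are assumptions for illustration (not mcelog output);
# bits below 57 are vendor- and bank-specific and are only listed by number.
STATUS_FLAGS = {
    63: "VAL (status register holds a valid error)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def set_bits(value):
    """Return the positions of all set bits in a 64-bit value, lowest first."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def decode_flags(status):
    """Return the labels of the generic flag bits set in status, highest first."""
    return [
        label
        for bit, label in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


if __name__ == "__main__":
    # STATUS value copied from the quoted mcelog entry.
    status = 0x9C1F4CF8001C011B
    print("set bits:", set_bits(status))
    for label in decode_flags(status):
        print(label)
```

The set bits it reports include 35, 42, 43, 46 and 59, which are the ones mcelog spelled out in the quoted entry.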