On 06/22/10 12:21 AM, Peter Kjellstrom wrote:
> On Tuesday 22 June 2010, Eric Deis wrote:
>> I have recently upgraded to 2.6.18-194.3.1.el5 and within several days
>> the machine crashed with the following error (repeating in mcelog):
>
> I'm guessing the old kernel just didn't notice.
>
> The below MCEs indicate bad hardware. Since the DIMMs are a lot easier to
> debug I'd suggest you start there (but it could be the systemboard too). Try
> running with half your DIMMs then the other half.

And on Nehalem (Xeon 5500, 5600), the memory controller is in the CPUs, so they are suspect too.

First, however, I'd see if there's a BIOS flash upgrade for the mainboard. These sometimes include microcode fixes for specific Intel CPUs, and may also have updated memory timing parameters.