[CentOS] New kernel causes hardware error?

John R Pierce pierce at hogranch.com
Tue Jun 22 07:27:02 UTC 2010


On 06/22/10 12:21 AM, Peter Kjellstrom wrote:
> On Tuesday 22 June 2010, Eric Deis wrote:
>    
>> I have recently upgraded to 2.6.18-194.3.1.el5 and within several days
>> the machine crashed with the following error (repeating in mcelog):
>>      
> I'm guessing the old kernel just didn't notice.
>
> The below MCEs indicate bad hardware. Since the DIMMs are a lot easier to
> debug, I'd suggest you start there (but it could be the system board too). Try
> running with half your DIMMs, then the other half.
>    

And on Nehalem (Xeon 5500, 5600), the memory controller is integrated into the CPUs,
so they are suspect too.
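Running with half the DIMMs, then the other half, is a bisection: keep halving whichever set still throws MCEs until one module is left. Below is a minimal Python sketch of that search. It assumes a single faulty DIMM that fails reliably under load. `passes(subset)` is a hypothetical stand-in for "install only these modules, run a burn-in, and report whether any MCEs were logged." The slot names are made up. Real boards also have population rules (channels, matched pairs) that limit which halves you can actually boot with.

```python
def find_faulty_dimm(dimms, passes):
    """Bisect a list of DIMMs to isolate a single bad module.

    `passes` is a hypothetical callback: given a subset of DIMMs, it returns
    True if the machine ran clean (no MCEs) with only that subset installed.
    Invariant: `candidates` is always a set known to fail.
    """
    candidates = list(dimms)
    if not candidates or passes(candidates):
        return None  # full set runs clean: nothing to isolate
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        if not passes(first):
            candidates = first
        elif not passes(second):
            candidates = second
        else:
            # Each half is clean on its own: suspect the board, the CPU's
            # memory controller, or the full population rather than one DIMM.
            return None
    return candidates[0]


if __name__ == "__main__":
    slots = ["DIMM_A1", "DIMM_A2", "DIMM_B1", "DIMM_B2"]  # hypothetical slot names
    # Simulate a bad module in slot B2.
    print(find_faulty_dimm(slots, lambda subset: "DIMM_B2" not in subset))
```

With four modules this takes three burn-ins after the initial failing run. The `None` result when both halves pass matters on Nehalem: if the errors only appear with every module installed, the CPU or board is a more likely culprit than any single DIMM.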

First, however, I'd check whether there's a BIOS flash upgrade for the
mainboard. These sometimes include microcode fixes for specific
Intel CPUs, and may also have updated memory timing parameters.
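Before looking for a BIOS update, it helps to record which BIOS version is installed and which microcode revision the CPUs report. The Python sketch below assumes a kernel that exposes DMI data under `/sys/class/dmi/id/` and a `microcode` field in `/proc/cpuinfo`. Older kernels such as 2.6.18 may expose neither. In that case `dmidecode -s bios-version`, run as root, is an alternative for the BIOS side.

```python
def microcode_revisions(cpuinfo_text):
    """Return the distinct 'microcode' values found in /proc/cpuinfo text."""
    revs = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revs.add(value.strip())
    return revs


def read_text(path):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # These sysfs paths are an assumption: they exist on newer kernels only.
    print("BIOS version:", read_text("/sys/class/dmi/id/bios_version") or "unavailable")
    print("BIOS date:   ", read_text("/sys/class/dmi/id/bios_date") or "unavailable")
    cpuinfo = read_text("/proc/cpuinfo") or ""
    print("Microcode:   ", sorted(microcode_revisions(cpuinfo)) or "not reported")
```

Comparing these values before and after a BIOS flash shows whether the update actually changed the microcode the CPUs are running.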

