[CentOS] how to debug hardware lockups?

John R Pierce pierce at hogranch.com
Sat Nov 15 23:14:57 UTC 2008

Rudi Ahlers wrote:
> Well, on a standard CentOS 5.2, /var/log/messages will be the place
> to log problems like this, or where else can I get more info?

Tough to write to the disk when the kernel is crashing.  Ditto the
network.  That leaves the VGA console and serial ports, which can be
written to by self-contained emergency-crash routines...
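
A minimal sketch of the serial-port route, assuming a second machine is
wired to the server with a null-modem cable and the server's kernel line
carries something like console=ttyS0,115200 console=tty0.  The device
path, the baud rate, and the helper names log_console and open_serial
are illustrative assumptions, not anything from this thread.  The point
is only that the log lives on a box that is not the one crashing:

```python
import termios
import time
import tty


def log_console(stream, sink, clock=time.time):
    """Copy console lines from stream to sink, one timestamp per line.

    stream is any binary file-like object (a serial port, or a BytesIO
    in a test); sink is a text file-like object.  Returns the number of
    lines copied once the stream reports end-of-file.
    """
    count = 0
    for raw in iter(stream.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        sink.write("%.3f %s\n" % (clock(), text))
        sink.flush()  # flush every line: the last one is the one you want
        count += 1
    return count


def open_serial(path="/dev/ttyS0", baud=termios.B115200):
    """Open a serial port in raw mode (path and baud are assumptions)."""
    port = open(path, "rb", buffering=0)
    fd = port.fileno()
    tty.setraw(fd)                   # no echo, no line editing
    attrs = termios.tcgetattr(fd)
    attrs[4] = attrs[5] = baud       # input speed, output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return port
```

On the logging machine this would be run as something like
log_console(open_serial(), open("console.log", "a")) and left going
until the server next locks up.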

IIRC, you said this was a Q9something quad core... that's a desktop
processor.  Does this server have ECC memory?  (I ask because few
desktop platforms do, while ECC is fairly standard on servers.)
Without ECC, the system has no way of knowing it read bad data from
RAM.  If the bad data happens to be code, and that code happens to be
in the kernel, ka-RASH: without any detection or warning, execution
leaps off into never-land and you get a kernel fault, almost always
resulting in

    kernel panic
    system halted

with no additional useful information available.  With ECC memory,
single-bit errors get corrected on the fly and logged as ECC error
events, while double-bit errors result in a system halt with a message
saying so.
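
To make the single-bit/double-bit distinction concrete, here is a toy
sketch of the underlying idea (SECDED: single error correction, double
error detection) using an extended Hamming code on 4 data bits.  Real
ECC DIMMs protect 64 data bits with 8 check bits, and the memory
controller does this in hardware; the 8-bit layout and the function
names below are just assumptions chosen to keep the example small:

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits).

    Positions 1, 2 and 4 hold Hamming parity bits, positions 3, 5, 6
    and 7 hold data, and position 0 holds an overall parity bit.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the whole word even parity
    return code


def decode(code):
    """Return (status, nibble); nibble is None if the word is unusable."""
    code = list(code)
    # XOR of the positions of all set bits: 0 for a valid Hamming word,
    # otherwise the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1

    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd parity means one bit flipped; syndrome 0 means it was the
        # overall parity bit itself.  Fix it and carry on.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but a nonzero syndrome: two bits flipped.  The
        # error is detected but cannot be located.
        return "uncorrectable", None

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble
```

Flipping any one bit of encode(n) decodes as "corrected" with the
original value; flipping any two decodes as "uncorrectable", which is
the case where a real system halts rather than run on bad data.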

