[CentOS] how to debug hardware lockups?

Sat Nov 15 23:21:42 UTC 2008
Rudi Ahlers <rudiahlers at gmail.com>

On Sun, Nov 16, 2008 at 1:14 AM, John R Pierce <pierce at hogranch.com> wrote:
> Rudi Ahlers wrote:
>>
>> Well, on a standard CentOS 5.2, /var/log/messages will be the
>> place to log problems like this, or where else can I get more info?
>>
>
> tough to write to the disk when the kernel is crashing.  ditto the network.
>   that leaves VGAs and serial ports, which can be written to by self
> contained emergency-crash routines...
>
> IIRC, you said this was a Q9something quad core... that's a desktop
> processor... does this server have ECC memory?  (I ask, because few desktop
> platforms do, while ECC is fairly standard on servers).  Without ECC, the
> system has no way of knowing it read in bad data from the ram, and if the
> bad data happens to be code and that code happens to be in the kernel,
> ka-RASH, without any detection or warning, it leaps off into never-land, and
> you get a kernel fault, almost always resulting in...
>
>   kernel panic
>   system halted
>
> with no additional useful information available.     with ECC memory, single
> bit errors get corrected on the fly, and log an ECC error event, while
> double bit errors result in a system halt with a message indicating such.
>
>


No, the motherboard doesn't support ECC RAM. The motherboard is an
Intel DG35EC - http://www.intel.com/products/desktop/motherboards/DG35EC/DG35EC-overview.htm
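
To check I follow the "single bit errors get corrected, double bit
errors halt" point above, here is a toy sketch in Python. It is only an
illustration: it uses an extended Hamming code over 4 data bits, which is
my own assumption for the example, whereas real ECC memory does this in
hardware over much wider words. The names encode() and decode() are made
up for this sketch.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword (toy SEC-DED)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 = overall parity bit
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code


def decode(code):
    """Return (nibble, status); nibble is None if the word is unrecoverable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        code[syndrome] ^= 1             # one flipped bit: syndrome is its position
        status = "corrected"
    else:
        return None, "uncorrectable"    # two flipped bits: detect, cannot fix
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    one_flip = list(word)
    one_flip[5] ^= 1
    two_flips = list(one_flip)
    two_flips[2] ^= 1
    print(decode(word))
    print(decode(one_flip))
    print(decode(two_flips))
```

With no check bits at all, as on this board, a flipped bit is simply
handed to the CPU as if it were good data, which matches the silent
crash described above.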



-- 

Kind Regards
Rudi Ahlers