[CentOS] how to debug hardware lockups?

Thu Nov 20 08:38:49 UTC 2008
John R Pierce <pierce at hogranch.com>

Rudi Ahlers wrote:
> This is when I realized that the Q9300 CPU could be too big a
> processor for the fan that I have installed.
>
> The fan that I have is:
> http://www.dynatron-corp.com/products/cpucooler/cpucooler_model.asp?id=165
>
> So, it looks like it's not really made for a Q9300 CPU, although their
> specs say it is.
>   

That fan is rated for CPUs up to 135 W, and I don't think a Q9300 is
anywhere near that. So unless that heatsink fan falls well short of its
specified capability, I don't think that's the problem. Yeah, the Q9300
is 95 W max, and that's with all 4 cores running heavy math...

HOWEVER, Intel desktop boards generally have a passive heatsink on the
northbridge and rely on the downdraft from a conventional CPU fan to cool
it. Your 1U configuration might not be moving enough air past that
northbridge. On my DG33TL in a desktop minitower, the G33 northbridge
runs pretty hot, and I had to arrange some extra airflow past it because
I used a 'tower cooler', which blows the air sideways rather than down.
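If the board's sensor chip happens to expose a chipset or northbridge
temperature, one way to see how hot things run inside the 1U case is to
read the Linux hwmon sysfs files. Below is a minimal Python sketch,
assuming a kernel with the hwmon class at /sys/class/hwmon, where
tempN_input holds millidegrees Celsius and tempN_label is optional. Which
chips and labels show up depends entirely on the board and the loaded
drivers; many boards expose no northbridge sensor at all. The function
names are illustrative only.

```python
import glob
import os


def _read(path):
    """Return the stripped contents of a small sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Map (chip name, sensor label) -> temperature in degrees Celsius.

    tempN_input is reported in millidegrees Celsius; when tempN_label is
    missing, the sensor is labelled "tempN" instead.
    """
    readings = {}
    for chip in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Newer kernels keep the attributes in hwmonN/, older ones in hwmonN/device/.
        for base in (chip, os.path.join(chip, "device")):
            chip_name = _read(os.path.join(base, "name")) or os.path.basename(chip)
            for inp in sorted(glob.glob(os.path.join(base, "temp*_input"))):
                sensor = os.path.basename(inp)[: -len("_input")]  # e.g. "temp1"
                raw = _read(inp)
                if raw is None:
                    continue
                try:
                    millideg = int(raw)
                except ValueError:
                    continue
                label = _read(os.path.join(base, sensor + "_label")) or sensor
                readings[(chip_name, label)] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    for (chip, label), celsius in sorted(read_hwmon_temps().items()):
        print(f"{chip:12} {label:16} {celsius:6.1f} C")
```

Running it once at idle and again under load gives a rough idea of
whether the airflow in the case keeps up.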

I still think running four instances of mprime (from www.mersenne.org),
each bound to a different CPU affinity (-a0, -a1, -a2, -a3), and running
the 'torture test' overnight will tell you a lot. Do this with Xen
disabled, with just the base system running at init 3. Any computational
or memory-timing glitch will show up as a numeric error and be logged by
the program.
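For anyone who wants to script the four-instance setup, here is a minimal
Python sketch. It pins each process to one CPU with the util-linux
`taskset -c N` command and passes the -aN argument through exactly as
given above. The binary path `./mprime` and the helper names are
hypothetical. How mprime interprets -aN and how it is told to start the
torture test vary by version, so check its readme rather than trusting
this command line.

```python
import subprocess

# Hypothetical path; point this at your own mprime binary.
MPRIME = "./mprime"


def build_commands(num_cpus, binary=MPRIME):
    """One command line per CPU: taskset pins the process to that CPU, and
    the -aN argument is passed through unchanged (its meaning depends on
    the mprime version in use)."""
    return [
        ["taskset", "-c", str(cpu), binary, f"-a{cpu}"]
        for cpu in range(num_cpus)
    ]


def run_all(commands):
    """Start every command, then block until all of them exit; return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Four instances for a quad-core Q9300; leave them running overnight.
    commands = build_commands(4)
    for cmd in commands:
        print(" ".join(cmd))
    run_all(commands)
```

Using taskset keeps the pinning outside mprime itself, so the same
script works whether or not the mprime build has its own affinity option.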