[CentOS] Re: memorial day kernel panic

Tue May 27 17:03:51 UTC 2008
sbeam <sbeam at onsetcorps.net>

On Tuesday 27 May 2008 11:39, Scott Silva wrote:
> Running memtest for 24 hours should be enough to test the ram.
> A 3ware 7006 is a fairly old card. Does it have the latest bios available
> from 3ware?
> You could always eliminate the 3ware controller by installing a drive on
> whatever built in controller it has.

this is a production server, so running an extended memtest is not going to 
happen. But I can swap the RAM out and put it in a backup system to do the test. 
It's beginning to look a lot like a RAM issue, as I have now seen a couple of 
segfaults from programs that have always run fine. Every kernel panic message 
is different (it crashed again 1 hour ago). Fans and case temp are nominal.

the 3ware card was just purchased last month, and it has the latest firmware 
and bios installed.

the memory is from PQI - supposed to be an OK brand, right? it has a lifetime 
warranty... heh

next steps... HA and fault-tolerant clustering, per the adjacent thread... 
this is the cautionary tale come to life.

fun fun fun