On Tuesday 27 May 2008 11:39, Scott Silva wrote:
Running memtest for 24 hours should be enough to test the RAM. A 3ware 7006 is a fairly old card. Does it have the latest BIOS available from 3ware? You could always rule out the 3ware controller by installing a drive on whatever built-in controller the board has.
This is a production server, so running an extended memtest on it is not going to happen. But I can swap the RAM out and put it in a backup system to run the test there. It's beginning to look a lot like a RAM issue, as I have now seen a couple of segfaults from programs that have always run fine. Every kernel panic message is different (it crashed again 1 hour ago). Fans and case temp are nominal.
The 3ware card was purchased just last month, and it has the latest firmware and BIOS installed.
The memory is from PQI - supposed to be an OK brand, right? It has a lifetime warranty... heh
next steps... HA and fault-tolerant clustering, per the adjacent thread... this is the cautionary tale come to life.
fun fun fun
Sam