[CentOS] Re: memorial day kernel panic

Tue May 27 17:16:41 UTC 2008
Ross S. W. Walker <rwalker at medallion.com>

sbeam wrote:

> On Tuesday 27 May 2008 11:39, Scott Silva wrote:
> > Running memtest for 24 hours should be enough to test the RAM.
> > A 3ware 7006 is a fairly old card. Does it have the latest BIOS available
> > from 3ware?
> > You could always eliminate the 3ware controller by installing a drive on
> > whatever built-in controller it has.
> This is a production server, so running an extended memtest is not going to
> happen. But I can swap it out and put it in a backup system to do the test.
> It's beginning to look a lot like a RAM issue, as I have now seen a couple of
> segfaults from programs that have always run fine. Every kernel panic message
> is different (it crashed again 1 hour ago). Fans and case temp are nominal.
> The 3ware card was just purchased last month; it has the latest firmware and
> BIOS installed.
> The memory is from PQI, supposed to be an OK brand, right? It has a lifetime
> warranty... heh
> Next steps... HA and fault-tolerant clustering, per the adjacent thread...
> This is the cautionary tale come to life.

It would be great if there were a simple machine that you could plug
a bunch of DIMMs of varying types into, and that would run high-speed
tests on them continuously and flag any that show an error.

Then you could test all memory modules thoroughly before putting them
into production servers (or any server for that matter).
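
For what it's worth, the core of such a tester is just "write a known
pattern, read it back, flag any mismatch". Below is a rough Python sketch
of that loop, only to illustrate the idea. The names (PATTERNS,
find_mismatches, pattern_test) are made up for this example. It runs in
user space on whatever memory the OS hands it, so it cannot target a
particular DIMM and is no substitute for memtest.

```python
# Illustrative only: a user-space write/read-back pattern check.
# All names here are hypothetical; this does not test specific DIMMs.

PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits


def find_mismatches(buf, pattern):
    """Return the offsets in buf whose byte differs from pattern."""
    return [i for i, b in enumerate(buf) if b != pattern]


def pattern_test(size, patterns=PATTERNS):
    """Fill a buffer of `size` bytes with each pattern and read it back.

    Returns a list of (pattern, offset) pairs for every byte that did
    not read back as written. An empty list means no error was seen.
    """
    buf = bytearray(size)
    errors = []
    for p in patterns:
        buf[:] = bytes([p]) * size          # write the pattern everywhere
        errors.extend((p, off) for off in find_mismatches(buf, p))
    return errors


if __name__ == "__main__":
    bad = pattern_test(1024 * 1024)         # check 1 MiB once
    print("errors found:", len(bad))
```

A real tester would loop over this continuously, use address-dependent
patterns as well, and work on physical addresses rather than a buffer
allocated by the OS.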

