[CentOS] Re: memorial day kernel panic {Scanned}

Tue May 27 18:18:35 UTC 2008
Ross S. W. Walker <rwalker at medallion.com>

Scott Silva wrote:
> on 5-27-2008 10:16 AM Ross S. W. Walker spake the following: 
> > sbeam wrote:
> > > On Tuesday 27 May 2008 11:39, Scott Silva wrote:
> > > > 
> > > > Running memtest for 24 hours should be enough to test the ram.
> > > > A 3ware 7006 is a fairly old card. Does it have the latest bios available
> > > > from 3ware?
> > > > 
> > > > You could always eliminate the 3ware controller by installing a drive on
> > > > whatever built in controller it has.
> > > 
> > > this is a production server, so running an extended memtest is not going to 
> > > happen. But I can swap it out and put it in a backup system to do the test. 
> > > It's beginning to look a lot like a RAM issue as I have now seen a couple 
> > > segfaults from programs that have always run fine. Every kernel panic message 
> > > is different (crashed again 1 hour ago). Fans and case temp are nominal.
> > > 
> > > the 3ware card was just purchased last month, it has the latest firmware and 
> > > bios installed.
> > > 
> > > the memory is from PQI - supposed to be an OK brand right? it has a lifetime 
> > > warranty... heh
> > > 
> > > next steps... HA and fault-tolerant clustering, per the adjacent thread... 
> > > this is the cautionary tale come to life.
> > 
> > It would be great if there were a simple machine that you could plug
> > a bunch of dimms of varying types into and it will perform high-speed
> > tests on them continuously and flag ones that show an error.
> > 
> > Then you could test all memory modules thoroughly before putting them
> > into production servers (or any server for that matter).
> 
> That is why a good long burn in test is a worthwhile thing to 
> plan for. That is unless you need to rush a replacement 
> server out quickly.

Yes, but even then, with say 16GB or 32GB of memory it happens
that some errors just fall through the cracks.
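
For anyone who wants to see what a pattern test boils down to, here
is a minimal user-space sketch in Python. It is only an illustration:
the function names are made up for this example, it only touches
whatever pages the OS happens to hand the process, and it is no
substitute for memtest86 or a hardware tester.

```python
def find_mismatches(buf, pattern):
    """Return (offset, expected, actual) for every byte that differs from pattern."""
    return [(i, pattern, b) for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each pattern in turn, read it back, collect mismatches.

    Hypothetical helper for illustration only; a user-space buffer
    cannot cover all physical RAM.
    """
    buf = bytearray(size_bytes)
    errors = []
    for p in patterns:
        expected = bytes([p]) * size_bytes
        buf[:] = expected            # write pass
        if buf != expected:          # fast read-back compare
            errors.extend(find_mismatches(buf, p))  # slow path: locate bad bytes
    return errors


if __name__ == "__main__":
    bad = pattern_test(16 * 1024 * 1024)  # 16 MiB, an arbitrary size
    print("mismatches found:", len(bad))
```

A clean run of something like this proves very little, which is
really the point: with 16GB or 32GB installed, a short pass over a
small buffer leaves most of the memory untested.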

> I usually run memtest86 for 48 hours, and then run a burn in 
> test with some load.
> 
> There are simple machines for testing memory, but they tend 
> to be very expensive and time consuming. Manufacturers can't 
> take the time to do thorough memory tests before they ship, 
> so they usually do some quick go-nogo tests and depend on 
> their warranty dept. to do the hard tests.
> 
> I don't think it would pay for anyone to buy one of these 
> testers, unless you are a very large VAR like Dell or HP. It 
> is easier (and probably cheaper) to just send new ram out and 
> send the returns back to your supplier for them to check.

I actually found a memory testing system for around $4K. Yes,
that's about the cost of a well-equipped server, but if it
works well it should earn its keep pretty quickly.

It's called RAMCHECK. I priced out the DDR/DDR2 unit, but
there are add-ons for SODIMM, SDRAM, and EDO; fully loaded,
I suspect it would be around $5K.

The company is called Innoventions: http://www.memorytesters.com/

They're Government registered and CDW seems to resell it,
so it isn't completely suspect.

-Ross
