Scott Silva wrote:
on 5-27-2008 10:16 AM Ross S. W. Walker spake the following:
sbeam wrote:
On Tuesday 27 May 2008 11:39, Scott Silva wrote:
Running memtest for 24 hours should be enough to test the RAM. A 3ware 7006 is a fairly old card. Does it have the latest BIOS available from 3ware?
You could always rule out the 3ware controller by installing a drive on whatever built-in controller the board has.
This is a production server, so running an extended memtest is not going to happen. But I can swap the RAM out and put it in a backup system to do the test. It's beginning to look a lot like a RAM issue, as I have now seen a couple of segfaults from programs that have always run fine. Every kernel panic message is different (it crashed again 1 hour ago). Fans and case temp are nominal.
The 3ware card was just purchased last month; it has the latest firmware and BIOS installed.
The memory is from PQI - supposed to be an OK brand, right? It has a lifetime warranty... heh
next steps... HA and fault-tolerant clustering, per the adjacent thread... this is the cautionary tale come to life.
It would be great if there were a simple machine that you could plug a bunch of DIMMs of varying types into, and that would run high-speed tests on them continuously and flag any that show an error.
Then you could test all memory modules thoroughly before putting them into production servers (or any server for that matter).
That is why a good long burn-in test is a worthwhile thing to plan for, unless you need to rush a replacement server out quickly.
Yes, but even then, with, say, 16GB or 32GB of memory, some errors just fall through the cracks.
I usually run memtest86 for 48 hours, and then run a burn-in test with some load.
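For the "burn-in with some load" part, a rough idea of what a userspace pattern check could look like is sketched below. This is only an illustration, not a replacement for memtest86: it runs under the OS, so it only touches whatever memory the kernel hands the process, and the function names (pattern_test, burn_in) and the pattern set are made up for this example.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each byte pattern, read it back, and
    return a list of (pattern, offset) pairs that did not match.

    Hypothetical helper for illustration; userspace only.
    """
    buf = bytearray(size_bytes)
    errors = []
    for p in patterns:
        expected = bytes([p]) * size_bytes
        buf[:] = expected              # write the pattern across the buffer
        if buf != expected:            # fast whole-buffer comparison
            # Slow path: locate the individual mismatching offsets.
            errors.extend((p, i) for i in range(size_bytes) if buf[i] != p)
    return errors


def burn_in(rounds, size_bytes):
    """Repeat the pattern test and return the total mismatch count."""
    total = 0
    for _ in range(rounds):
        total += len(pattern_test(size_bytes))
    return total


if __name__ == "__main__":
    # Assumed values: 10 rounds over a 64 MiB buffer.
    bad = burn_in(rounds=10, size_bytes=64 * 1024 * 1024)
    print("mismatches found:", bad)
```

Any nonzero count would be a reason to pull the box and run a proper offline test; a zero count proves very little, since the buffer covers only a small slice of physical RAM.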
There are simple machines for testing memory, but they tend to be very expensive and time-consuming to use. Manufacturers can't take the time to do thorough memory tests before they ship, so they usually do some quick go/no-go tests and depend on their warranty dept. to do the hard tests.
I don't think it would pay for anyone to buy one of these testers unless you are a very large VAR like Dell or HP. It is easier (and probably cheaper) to just send new RAM out and send the returns back to your supplier for them to check.
I actually found a memory testing system for around $4K. Yes, that's about the cost of a well-equipped server, but if it works well it should earn its keep pretty quickly.
It's called RAMCHECK. I priced out the DDR/DDR2 unit, but there are add-ons for SODIMM, SDRAM, and EDO; if you got it fully loaded, I suspect it would be around $5K.
The company's called Innoventions: http://www.memorytesters.com/
They're government-registered and CDW seems to resell it, so it isn't completely suspect.
-Ross