[CentOS] storage servers crashing, hair being pulled out!

Sun Dec 20 05:11:27 UTC 2009
nate <centos at linuxpowered.net>

Gordon McLellan wrote:

> I'm really at a loss on what to do next... Any suggestions?

Run hardware diagnostics? Run a burn-in test? I use this:


For burn-in. In my experience it takes less than 4 hours at
high load with this app to turn up faulty hardware. If it
does crash with this then replace the system or replace
components until the crashing stops, run it for a week, then
you can be pretty certain at least the hardware is stable.

Also noticed you're using pretty poor quality components for
a storage server. Promise RAID? Western Digital "green" disks?
Not exactly server grade.

Suggest if you want stability you go with Western Digital RE3/4
disks and 3ware RAID (with a BBU so you can enable write-back
caching), at least. Seagate have high grade SATA as well. You
don't mention the model you're using but I'd assume they are of
similar quality as the "green" disks, i.e. not made for servers.

Also I assume you have a decent UPS as well on all systems, never
run a computer without a UPS (well, unless it's a laptop).

Did you build the systems yourself or did you buy them
pre-assembled? If you did it yourself I would verify the power
supplies themselves are of decent quality and provide adequate
voltage given the number of disks you're working with. While there
are plenty of good power supplies out there, the only one I will
go out of my way to put money down on is PC Power & Cooling.
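
As a rough way to sanity check a power supply against the number of
disks, something like the sketch below may help. It is only
back-of-the-envelope arithmetic: the per-disk spin-up current, the
allowance for everything else on the 12 V rail, and the 80% headroom
factor are all placeholder assumptions, not measured values, and the
function names are made up for illustration. Substitute the figures
from your drive datasheets and the PSU label.

```python
# Rough 12 V budget check for a multi-disk server.
# Every default below is an assumed placeholder, not a measured value.

def rail_12v_amps_needed(num_disks, spinup_amps_per_disk=2.0, other_12v_amps=10.0):
    """Estimate worst-case 12 V current if every disk spins up at once.

    spinup_amps_per_disk and other_12v_amps (CPU, fans, controller)
    are assumptions; take real numbers from the component datasheets.
    """
    return num_disks * spinup_amps_per_disk + other_12v_amps


def psu_is_adequate(num_disks, psu_12v_amps, headroom=0.8, **kwargs):
    """Return True if the estimated draw fits within `headroom` of the rail rating."""
    return rail_12v_amps_needed(num_disks, **kwargs) <= psu_12v_amps * headroom


if __name__ == "__main__":
    # Compare a few disk counts against a hypothetical 40 A 12 V rail.
    for disks in (4, 8, 16):
        need = rail_12v_amps_needed(disks)
        ok = psu_is_adequate(disks, psu_12v_amps=40)
        print(f"{disks} disks: ~{need:.0f} A on 12 V, fits 40 A rail: {ok}")
```

If the controller supports staggered spin-up, the simultaneous
spin-up case is pessimistic, but it is a reasonable worst case to
size against.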