Warren Young wrote:
On 8/12/2013 11:01, m.roth@5-cent.us wrote:
VERY STRONG RECOMMENDATION: DON'T buy Supermicro. They have a *lot* of trouble with this new, fuzzy concept called "quality control".
We have a *lot* of SuperMicro-based systems in the field, and they aren't failing. In fact, I can't remember the last time we had to fix an actual motherboard issue. It seems like every field hardware failure for years has come down to dying HDDs.
We did once upon a time have a QC problem with SuperMicro, around Y2K, but that was because we chose to use AMD processors, and AMD OEM fan/heat sink combos at the time used little 60mm 6000 RPM pancake fans that would seize up after a few years. This was before processors had overtemp shutdown features, so once the fan seized, the processors would cook themselves.
<snip>
You'll notice that both of these failure modes are due to mechanical wear. I can't say I've *ever* seen a SuperMicro board fail in any of the solid-state components, solder joints, capacitors, etc.
Well, *all* of these are rackmount servers, with no moving-the-server wear. We start seeing userspace compute-intensive processes crashing the system a number of times a day. We have a canned package that we send to Penguin on the disk we put in, which has a generic CentOS install, and running that, the crash is repeatable. They replace the m/b, and it doesn't happen again. (Or at least not with that program - we've got issues with some *other* users, with different software, that seems to be crashing it.) With us, this is seriously important, since the users' jobs run for days, sometimes a week or more, on the cluster....
mark