Warren Young wrote:
> On 8/12/2013 11:01, m.roth at 5-cent.us wrote:
>> VERY STRONG RECOMMENDATION: DON'T buy Supermicro. They have a *lot* of
>> trouble with this new, fuzzy concept called "quality control".
>
> We have a *lot* of SuperMicro based systems in the field, and they
> aren't failing. In fact, I can't remember the last time we had to fix
> an actual motherboard issue. It seems like every field hardware failure
> for years has come down to dying HDDs.
>
> We did once upon a time have a QC problem with SuperMicro, around Y2K,
> but that was because we chose to use AMD processors, and AMD OEM
> fan/heat sink combos at the time used little 60mm 6000 RPM pancake fans
> that would seize up after a few years. This was before processors had
> overtemp shutdown features, so once the fan seized, the processors would
> cook themselves.
<snip>
> You'll notice that both of these failure modes are due to mechanical
> wear. I can't say I've *ever* seen a SuperMicro board fail in any of
> the solid-state components, solder joints, capacitors, etc.

Well, *all* of these are rackmount servers, so there's no wear from the
servers being moved around. What happens is that we start seeing
compute-intensive userspace processes crash the system several times a
day. We have a canned package that we send to Penguin on the disk we put
in, which has a generic CentOS install; running it, the crash is
repeatable. They replace the m/b, and it doesn't happen again. (Or at
least not with that program; we've got issues with some *other* users,
running different software, that seem to be crashing it.) With us, this
is seriously important, since the users' jobs run for days, sometimes a
week or more, on the cluster....

mark