On Tue, Jun 26, 2012 at 03:03:23PM -0400, Steve Thompson wrote:
> On Tue, 26 Jun 2012, m.roth@5-cent.us wrote:
>> We've had a number of servers fail, and it *seems* to be related to the motherboard.
> I too have had bad experiences with SuperMicro motherboards; never had one last more than three years.
The problem with Supermicro is that the end user assembles them. If you use ESD protection, this is fine. If you don't? Go buy a Dell or something.
The big problem is that many of the smaller assembly houses also don't believe ESD is a big deal. If there is carpet on the workshop floor? run. If you see techs working without a wrist strap? walk.
I've assembled hundreds of Supermicro servers with and without ESD protection, and the behavior is fairly reproducible. Yeah, the problems don't always show up right away, but they come.
I remember when I first figured this out; we had been having about 1 in 3 of our Supermicro servers not pass burn-in. Then, in production, we'd lose things like RAID cards and ethernet ports all the time. I'd spend days swapping out parts and RMAing stuff, just to get one server built. I mean, I didn't really believe that the factory was sending me broken shit, and there was noticeable static in the office. (I always 'took the power supply pledge' before touching anything.) Anyhow, I read a study by Adaptec (we were using Adaptec hardware RAID in everything, and the cards were failing like crazy) saying that nearly all customer RMAs, upon inspection, were due to ESD damage.
Well, the boss ended up ordering something like 70 servers (rather than the three every two weeks he was ordering before) - I talked him into letting me blow $200 on ESD protection, just to see if that was the problem, and instead of having 1 out of 3 die as before? all of them passed burn-in on the first try.
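To put a rough number on how unlikely that result would be by luck alone, here's a quick back-of-the-envelope sketch. It assumes each server fails burn-in independently at the old 1-in-3 rate and takes "something like 70" as exactly 70; both are simplifications, and the variable names are just made up for illustration.

```python
# Back-of-the-envelope check; all figures are the rough ones from the post.
old_failure_rate = 1 / 3   # "about 1 in 3" failing burn-in before ESD protection
batch_size = 70            # "something like 70" servers, taken as exactly 70 (assumption)

# Assumption: failures are independent from one server to the next.
p_all_pass = (1 - old_failure_rate) ** batch_size
expected_failures = old_failure_rate * batch_size

print(f"chance all {batch_size} pass at the old rate: {p_all_pass:.1e}")
print(f"failures expected at the old rate: {expected_failures:.1f}")
```

Under those assumptions the old rate would predict roughly 23 failures in the batch, and the chance of a clean sweep is on the order of 10^-13, so a change in handling is a far better explanation than a lucky batch.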
Properly assembled Supermicro kit (both AMD and Intel) is just as good as the Dell stuff. I have one server that's been chugging away for something like ten years now. (I need to get rid of it; dual Socket 604 Xeons. It's a space heater, and it doesn't get me much by way of compute power. I've got all customers off of it, but my own personal VPS? I haven't had time.)
But yeah, you've gotta get someone to assemble it that gives a shit. I mean, me? I know that it's my pager that is going off at 4am if something breaks. It's me that's going to have to fumble around with spares. I give a shit.
As it is, I'd rather assemble my own servers than trust the assembly to someone for whom down hardware is not that big of a deal.
Assembling a SuperServer, if you don't fuck it up, takes about five minutes. Burn-in is trivial when they pass... and when they don't pass, which is extremely rare, I know I screwed something up.
On the other hand... I have a very low opinion of Dell support (granted, I'm pretty hard to please in that department), but from what I've seen? All the big names ship okay stuff from the factory. They have proper ESD precautions in the factory. So yeah; if you aren't willing to go with the table mat, the wrist strap, and the monitor, well, order the server from Dell and don't open it.