On 08/12/13 18:16, Warren Young wrote:
> On 8/12/2013 12:54, m.roth at 5-cent.us wrote:
>>
>> Well, *all* of these are rackmount servers, with no moving-the-server
>> wear.
>
> Our servers are all rack-mounted, too, and pretty much never get moved
> after being installed.
>
> In any case, I was referring to wear in the electromechanical components
> of a server. HDDs and fans, primarily. In olden days, optical disks,
> too. These are expected to fail over time.
>
>> We start seeing userspace compute-intensive processes crashing the
>> system a number of times a day.
>
> Define "crash the system".

The whole system reboots.
<snip>
> I don't suppose you've gathered continuous temp data, say with Cacti?

No, I haven't. It's a thought, though the HVAC's good (too good, he
says, when he needs a long-sleeved shirt, and sometimes a sweater).
ipmitool sel list isn't showing a problem.
>
>> They replace the m/b, and it doesn't happen again.

Oh, except for the one or two that we sent back a *second* time, and
they replaced the m/b again....
>
> Okay, so either this one motherboard product from Supermicro has a QC
> problem, or Penguin has an application or design problem with it. Or,
> your environment is somehow pushing them past their design limits.
> (e.g. insufficient cooling)

That's certainly not the problem.
>
> You're painting with far too broad a brush here to say Supermicro is
> bad, period.

You like them, fine. We really don't, and the only things that we were
buying that had their m/b, etc., were honkin' hot servers.

      mark