[CentOS] Install CentOS 6 x86_64 on Dell PowerEdge 2970 and an SSD (hardware probing issues)

Mon Sep 8 20:46:18 UTC 2014
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Mon, September 8, 2014 2:45 pm, Keith Keller wrote:
> On 2014-09-08, Valeri Galtsev <galtsev at kicp.uchicago.edu> wrote:
>> I gave up on SuperMicro quite a while ago. Not because of BIOS, but
>> because of hardware engineering flaws, which at least manifest
>> themselves with system boards for AMD CPUs. These (AMD) boards work
>> reliably for only 2-4 years, after that they die. Not all of them, but
>> about 50% of SuperMicro AMD server and mostly workstation boards (I
>> have no experience with their low end desktop boards if they exist)
>> are dead after 3-4 years
> Huh.  I have a bunch of SuperMicro boards with AMD CPUs, and have had
> only one die completely, and that was a DOA that I returned before
> putting into production.
> Are you saying dead-dead, like completely unusable, or sorta dead, where
> you get spurious and unexplained errors?

It begins with random occasional errors and ends up totally dead over the
course of a couple of weeks to a couple of months. You pull the CPUs and
RAM from the dead board, stick them into another one (I'm tempted to say
"Tyan this time" ;-), and they work. At that point you can't reflash the
BIOS, at least not in house. My hunch is that this is an engineering flaw:
it looks like the board topology isn't too good around one of the CPU
sockets, so it works only marginally (without much reserve) while the
system board is new, and then, with slight gradual degradation of the
components, it stops working. Maybe the ripple on the leads is below the
tolerable limit but close to it. Or the capacitances and inductances [of
the board leads] involved are such. I can't offer [much] more detail on
what I observed; it's been some time since I banged my head against that.
And you can imagine how happy I was to forget about it after I gave up on
them. Anyway, there are SuperMicro boards in our stalls which are still
kicking. So I'm not saying all of them fail, I just don't care to learn on
my own hide which ones do and which ones don't.

Oh, BTW, all the electrolytic capacitors on these strangely dead
SuperMicro boards are OK. All of us have seen the ones around the CPUs on
dead system boards (mostly manufactured during a certain period of time)
bulging and leaking - not so in the case of these strangely dead boards.
Some capacitors can lose capacitance to some extent without showing any
visible signs, but good engineering usually takes that into account and
uses suitably larger ones, so that the capacitance doesn't even come close
to the margin during the equipment's life.


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247