[CentOS] Install Centos 6 x86_64 on Dell PowerEdge 2970 and a SSD (hardware probing issues)

Sun Sep 7 18:50:41 UTC 2014
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Sun, September 7, 2014 1:04 pm, Keith Keller wrote:
> On 2014-09-07, Valeri Galtsev <galtsev at kicp.uchicago.edu> wrote:
>> It doesn't sound like you are flashing all the 3ware cards you have in
>> production every time a new firmware release is out. Nor does it sound
>> like you had a fatal failure of a production box because of a bug in
>> 3ware firmware. Correct me if I'm wrong; otherwise I see you on the same
>> page with me, i.e. not flashing new firmware as part of the "routine
>> update" of a production machine (together with system/software updates).
> Well, I think we are on the same page now.  I think I (and some other
> folks) interpreted your posts as "if you have to flash the firmware, it
> was a crappy firmware, and you should switch vendors" which (as someone
> else noted) would soon leave you with no vendors.

Great... and my fault; I'm often a bit extreme in my expressions ;-(

> To summarize, I think our page says "update the firmware only when
> necessary on production-level hardware".

Yes. And during the last decade and a half I have had no such occasion.

> FWIW, I did have a different 3ware card eat its array, though I do
> suspect some user (i.e., me) error.  I had a 9650 card which was having
> problems with kernel panics.  I suspected a hardware failure, so I moved
> the array to another 9650 in the same box, which may not have had a BBU.
> Unfortunately that card showed worse problems a few weeks later: not
> only did it kernel panic, but it also trashed the array pretty much
> completely.  (Of course I had backups, and this was a dev box, not
> public-facing, but it was still frustrating.)  At the time the 9650 was
> old enough that the 9750 series was out, and that card has been fairly
> solid.  (Also FWIW, my last 9650 card had the same issue a few weeks
> ago; fortunately it did not eat its array.)

I guess after that I should count myself lucky. None of my more than a
couple of dozen 3ware cards has ever done me any harm. I did once have one
of them fried (my clumsiness, most likely), and it then simply didn't come
up (3ware replaced the card with no questions asked). Could yours be
_slightly_ fried? Suppose it is the card's internal RAM controller chip
that is slightly fried (if you overheat a chip severely, it may no longer
work at as high a frequency, because impurity diffusion in the chip messes
up the impurity profile; I've seen things like that, though not in 3ware
cards). Then the card's internal computer (the one doing the RAID
function) will occasionally produce total garbage, which could cause just
about anything. Kernel panics would then be likely from time to time, as
the card will occasionally talk gibberish back to the kernel. Just a shot
in the dark.


> So to add a page to our book, "always have backups even if you trust
> your hardware!"  :)
> --keith
> --
> kkeller at wombat.san-francisco.ca.us

Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247