On Sun, September 7, 2014 1:04 pm, Keith Keller wrote:
On 2014-09-07, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
It doesn't sound like you are flashing all the 3ware cards you have in production every time a new firmware release is out. It doesn't sound either like you had a fatal failure of a production box because of a bug in 3ware firmware. Correct me if I'm wrong; otherwise I see you on the same page with me, i.e. not flashing new firmware as part of a "routine update" of a production machine (together with system/software updates).
Well, I think we are on the same page now. I think I (and some other folks) interpreted your posts as "if you have to flash the firmware, it was a crappy firmware, and you should switch vendors" which (as someone else noted) would soon leave you with no vendors.
Great... and my fault, I'm often a bit extreme in my expressions ;-(
To summarize, I think our page says "update the firmware only when necessary on production-level hardware".
Yes. Of which, during the last one and a half decades, I had none.
FWIW, I did have a different 3ware card eat its array, though I do suspect some user (i.e., me) error. I had a 9650 card which was having problems with kernel panics. I suspected a hardware failure, so I moved the array to another 9650 in the same box, which may not have had a BBU. Unfortunately that card showed worse problems a few weeks later: not only did it kernel panic, but it also trashed the array pretty much completely. (Of course I had backups, and this was a dev box, not public-facing, but it was still frustrating.) At the time the 9650 was old enough that the 9750 series was out, and that card has been fairly solid. (Also FWIW, my last 9650 card had the same issue a few weeks ago; fortunately it did not eat its array.)
I guess after that I should declare myself lucky. None of my more than a couple of dozen 3ware cards ever did me any harm. I did once have one of them fried (my clumsiness, most likely), which then just didn't come up (3ware replaced the card with no questions asked). Could yours be _slightly_ fried? If it's the internal RAM controller chip that is slightly fried (if you overheat a chip severely, it may no longer run at as high a frequency, because impurity diffusion in the chip messes up its profile; I've seen things like that, though not in 3ware), then the card's internal computer (doing the RAID function) will occasionally produce total garbage, potentially causing anything. Kernel panics with that card would then be likely at times, as it will occasionally talk gibberish back to the kernel. Just a shot in the dark.
Valeri
So to add a page to our book, "always have backups even if you trust your hardware!" :)
--keith
-- kkeller@wombat.san-francisco.ca.us
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++