[CentOS] OT, hardware: HP smart array drive issue

Fri Jul 10 18:18:11 UTC 2015
Thomas Eriksson <thomas.eriksson at slac.stanford.edu>

On 07/10/2015 10:49 AM, m.roth at 5-cent.us wrote:
> Jason Warr wrote:
>> On July 10, 2015 11:47:09 AM CDT, m.roth at 5-cent.us wrote:
>>> Hi. Anyone working with these things? I've got a drive in "predictive
>>> failure" in a RAID5. Now here's the thing: there was an issue
>>> yesterday when I got in, and I wound up power cycling the RAID;
>>> first boot of attached server had issues, and said the controller
>>> had a failure, and a drive had failed, and wouldn't continue
>>> booting; when I gave it the three-finger salute, this time on the
>>> way up, during POST, it noted the controller issue... but the
>>> thing came up, looking like it did a couple of days ago.
>>>
>>> Trying to prevent this from happening again, I've decided to replace
>>> the drive that's in predictive failure. The array has a hot spare.
>>> I tried to remove, using hpacucli, it refuses "operation not
>>> permitted", and there doesn't *seem* to be a "mark as failed"
>>> command. *Do* I just yank the drive?
>>>
>> Yep, just yank it.  It should start auto rebuilding on the spare.
>>
>> If you didn't have a spare you would pull the suspect drive and replace it
>> with one of equal or greater capacity and it would auto rebuild as well.
>>
>> I have a bunch of them at home and have been using them at work for years.
> 
> Thanks for your quick reply, Jason. I'm used to LSI/MegaRAID/PERCs, where
> you have to fail it, first. Oddity: I had the drive out for more than five
> minutes while getting it out of the sled, putting the new one in, oh, and
> dusting out the slot (gotta do that for all of them, next maintenance
> window), but after I put in the replacement, and used hpacucli to check,
> to my surprise it was rebuilding with the replacement, *not* with the
> spare.
> 

HP's RAID controllers appear to have some logic whereby, if the rebuild to
the spare disk has not yet reached 50% when you insert the replacement, the
controller abandons the rebuild to the spare and rebuilds to the
replacement instead.

I don't have any documentation to prove it, but I have observed it
numerous times.
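
As a rough sketch only, the observed behaviour can be written down as a
small decision rule. Nothing here is HP firmware code or an HP API: the
function name, the constant, and the handling of the exact 50% boundary
are assumptions made for illustration, and the 50% figure itself is an
observation, not a documented value.

```python
# Hypothetical model of the behaviour observed on HP Smart Array
# controllers. The cut-off is an observation, not a documented value.
SPARE_REBUILD_CUTOFF = 50.0  # percent of the rebuild to the spare


def rebuild_target(spare_rebuild_percent: float, replacement_inserted: bool) -> str:
    """Return which disk the array ends up rebuilding to.

    spare_rebuild_percent: progress of the rebuild to the hot spare at the
    moment the replacement drive is inserted (0.0 to 100.0).
    replacement_inserted: whether a replacement drive has been inserted.
    """
    if not replacement_inserted:
        # No replacement present: the rebuild to the hot spare continues.
        return "spare"
    if spare_rebuild_percent < SPARE_REBUILD_CUTOFF:
        # Rebuild to the spare has not yet reached the cut-off:
        # it is abandoned in favour of the replacement.
        return "replacement"
    # Rebuild to the spare is already at or past the cut-off.
    return "spare"


if __name__ == "__main__":
    # Replacement inserted a few minutes into the rebuild to the spare.
    print(rebuild_target(5.0, True))
```

Under this model, a drive swapped in after about five minutes would land
well below the cut-off on most arrays, which matches the rebuild going to
the replacement rather than to the spare.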

	Thomas