[CentOS] how to replace a raid drive with mdadm

Sat May 10 23:03:29 UTC 2014
Keith Keller <kkeller at wombat.san-francisco.ca.us>

On 2014-05-10, Dennis Jacobfeuerborn <dennisml at conversis.de> wrote:
>
> This can also be inverted especially if you cannot send data to the
> drive anymore because it dies completely: Create lots of disk i/o with a
> command like "grep -nri test /usr" and all drives except the broken one
> should show activity.

That's certainly a good idea.  If you have multiple arrays you'd need to
send that IO to each array at roughly the same time, but with only one
array it's easier.  I think the most challenging scenario would
be if the array has multiple spares--if the array rebuilds before you
can look at it, then you have to generate IO on the array and on the
drive(s) that are still spares.
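
For the multiple-array case, something like the following might work as
a rough sketch.  It assumes GNU dd (for iflag=direct) and that the arrays
show up in /proc/mdstat as mdN; the function names and the READ_MB
variable are made up for illustration.

```shell
#!/bin/sh
# Print md array names (md0, md1, ...) found in an mdstat-format file.
# Defaults to /proc/mdstat; a different file can be passed for testing.
md_arrays() {
    awk '/^md[0-9]+ *:/ { print $1 }' "${1:-/proc/mdstat}"
}

# Start one reader per array in the background, then wait for all of
# them, so every array sees IO at about the same time.
# READ_MB is a hypothetical knob: how many MiB to read from each array.
read_all_arrays() {
    for md in $(md_arrays); do
        # iflag=direct bypasses the page cache so reads reach the disks.
        dd if="/dev/$md" of=/dev/null bs=1M count="${READ_MB:-1024}" \
           iflag=direct 2>/dev/null &
    done
    wait
}
```

Run read_all_arrays as root and watch the LEDs while it works.  Note
that on RAID1 a single sequential reader may be served mostly from one
mirror, so not every healthy drive is guaranteed to blink evenly.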

If you have no active spares (either you started with none, or you had
one and it's been used to replace the dead drive), one way to generate IO
is to start a check of the md array (e.g.,
echo check > /sys/block/mdN/md/sync_action ).  The drive whose activity
LED doesn't blink is the dead one.
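
If you want that wrapped up a little, here is a minimal sketch.  The
function names and the SYSFS_ROOT override are hypothetical (the override
only exists so you can try it against a scratch directory); writing
"check" and "idle" to sync_action is the standard md sysfs interface.
Whether a check produces sustained IO on an already-degraded array may
depend on the RAID level, so treat this as something to verify on your
own setup.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for experimenting; on a real
# system leave it unset so the functions use /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Start a check on the named array (e.g. md0) and print the current action.
start_md_check() {
    action="$SYSFS_ROOT/block/$1/md/sync_action"
    if [ ! -w "$action" ]; then
        echo "cannot write $action (not root, or no such array?)" >&2
        return 1
    fi
    echo check > "$action"     # same command as in the text above
    cat "$action"              # show what the array is doing now
}

# Cancel a running check once you've identified the drive.
stop_md_check() {
    echo idle > "$SYSFS_ROOT/block/$1/md/sync_action"
}
```

For example, run "start_md_check md0" as root, look at the drive LEDs
(progress also shows up in /proc/mdstat), then "stop_md_check md0" when
you're done.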

> Another way is to write down the serial numbers of the disks, the slots
> you put the disks in and then use hdparm -I /dev/sdX to find which
> device shows which serial number. That way once sdX dies you can check
> the list to find which slot the disk for the failed device was put in.

Physical labelling in this way (or some other way) is still the best
solution, as long as you keep the list up to date (and don't screw up
the list, of course).  But it's definitely good to have multiple methods
in your toolbox--for example, you might try the IO trick, then
cross-check it against your physical labels.  Better to take some extra
time verifying which drive is dead than to pull the wrong one!
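
To build the device-to-serial half of that list, a sketch along these
lines may help.  It assumes hdparm -I prints a "Serial Number:" line for
your drives (it does for typical ATA disks); the function names are made
up, and the slot column still has to come from your own records.

```shell
#!/bin/sh
# Pull the serial number out of "hdparm -I" output read from stdin.
# Leading and trailing whitespace around the value is stripped.
serial_from_hdparm() {
    sed -n 's/^[[:space:]]*Serial Number:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' |
        head -n 1
}

# Print "device<TAB>serial" for each device given on the command line.
# Devices that don't answer (e.g. a dead drive) are listed as "unknown".
list_serials() {
    for dev in "$@"; do
        serial=$(hdparm -I "$dev" 2>/dev/null | serial_from_hdparm)
        printf '%s\t%s\n' "$dev" "${serial:-unknown}"
    done
}
```

Running "list_serials /dev/sd?" as root while all drives are healthy
gives you a table to print out and keep next to the slot labels.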

--keith

-- 
kkeller at wombat.san-francisco.ca.us