On 2014-05-10, CS_DBA <cs_dba@consistentstate.com> wrote:
If we lose a drive in a raid 10 array (mdadm software raid) what are the steps needed to correctly do the following:
- identify which physical drive it is
This is controller dependent. Some controllers support blinking the drive light to identify it; others do not. If yours does not, you need to jury-rig something (e.g., physically label the drive slot or the drive itself, or generate some I/O on the drive to get its activity light to blink).
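For the jury-rigged route, something along these lines might help. It is only a sketch: failed_members and blink_drive are names I made up, and it assumes md flags a failed member with (F) in /proc/mdstat and that your drive bays have per-drive activity lights.

```shell
# Print the name of every md member flagged (F) (failed) in mdstat-style text.
# Reads stdin, so it can be fed /proc/mdstat or a saved copy of it.
failed_members() {
    grep -o '[[:alnum:]]*\[[0-9]*\](F)' | sed 's/\[.*//'
}

# Read from a drive for a while so its activity light stays lit.
# Reading is non-destructive; "mib" (amount to read, in MiB) is a made-up knob.
blink_drive() {
    dev=$1
    mib=${2:-2048}
    dd if="$dev" of=/dev/null bs=1M count="$mib"
}

# Typical use (device names are placeholders):
#   failed_members < /proc/mdstat
#   blink_drive /dev/sdXX
```

If the failed drive no longer answers reads at all, the same trick can be turned around: blink the healthy members one at a time and the dead drive is the one whose light never comes on.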
- replace the drive
The md part is easy. If md hasn't failed the drive already, then you need to do that first:
mdadm /dev/mdN --fail /dev/sdXX
Then remove it from the array:
mdadm /dev/mdN --remove /dev/sdXX
The physical part is, again, hardware dependent.
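If you want the two md steps in one place, a small wrapper like the following could do it. This is a sketch, not a tested procedure: run, retire_member and the DRY_RUN variable are my own inventions, and it defaults to printing the commands rather than executing them so you can check the device names first.

```shell
# Echo the command when DRY_RUN is 1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark a member as failed, then remove it from the array.
# Stops after the first step if that step fails.
retire_member() {
    md=$1
    dev=$2
    run mdadm "$md" --fail "$dev" &&
    run mdadm "$md" --remove "$dev"
}

# Typical use (placeholders as in the commands above):
#   retire_member /dev/mdN /dev/sdXX            # prints the two commands
#   DRY_RUN=0 retire_member /dev/mdN /dev/sdXX  # actually runs them
```

Note that it does not check whether md has already failed the drive; it simply issues both commands in order.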
- add the new drive to the array and force it to re-sync
Again, the physical part is hardware dependent. Once the kernel knows about your new drive, this should work (partition the drive beforehand if needed):
mdadm /dev/mdN --add /dev/sdYY
There may be extra parameters for replacing a failed RAID10 drive, but I suspect that md already knows the needed parameters, so just adding the drive should kick off a rebuild of the failed member.
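To confirm that the rebuild actually started, you can look at /proc/mdstat or at mdadm --detail. The helper below is a rough sketch for pulling out just the percentage; rebuild_progress is a made-up name, and it assumes the progress line in /proc/mdstat uses the word "recovery", which is what I would expect when a replacement member is being rebuilt.

```shell
# Extract the rebuild percentage from mdstat-style text on stdin.
# Assumes the progress line uses the word "recovery"; prints nothing otherwise.
rebuild_progress() {
    sed -n 's/.*recovery *= *\([0-9.]*%\).*/\1/p'
}

# Typical use (device names are placeholders):
#   mdadm /dev/mdN --add /dev/sdYY
#   rebuild_progress < /proc/mdstat
#   mdadm --detail /dev/mdN
```

No output from rebuild_progress means either that the rebuild has finished or that it never started, so check mdadm --detail /dev/mdN to tell the two apart.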