[CentOS] DegradedArray message

Fri Dec 5 00:46:12 UTC 2014
Gordon Messmer <gordon.messmer at gmail.com>

On 12/04/2014 05:45 AM, David McGuffey wrote:
> md0 is made up of two 250G disks on which the OS and a very large /var
> partition reside for a number of virtual machines.
...
> Challenge is that disk 0 of md0 is the problem and it has a 524M /boot
> partition outside of the raid partition.

Assuming that you have an unused drive port, you can fix that pretty easily.

Attach a new replacement disk to the unused port.  Let's say that it 
comes up as /dev/sde.

Copy the partition table to it (unless it's GPT, in which case use parted):
     sfdisk -d /dev/sda | sfdisk /dev/sde
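If the disks are GPT, sgdisk (from the gdisk package) is another way to 
do the same thing; a sketch, noting that the target disk comes first:
     sgdisk -R=/dev/sde /dev/sda
     sgdisk -G /dev/sde
The -G step randomizes the disk and partition GUIDs so the copy doesn't 
collide with the original.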

Unmount /boot and copy that partition (assuming that it is sda1):
     umount /boot
     dd if=/dev/sda1 of=/dev/sde1 bs=1M
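If you want to sanity-check the copy before relying on it, a read-only 
fsck is a quick, optional test:
     fsck -n /dev/sde1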

Install grub on the new drive:
     grub-install /dev/sde
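Note that on CentOS 7, which ships GRUB2, the command is named 
differently:
     grub2-install /dev/sde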

At that point, you should also be able to add the new partition to the 
md array:
     mdadm /dev/md0 --add /dev/sde2
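You can watch the rebuild progress in /proc/mdstat:
     watch cat /proc/mdstat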

Once it rebuilds, shut down.  Remove the bad drive.  Put the new drive 
in its place.  In theory the system will boot and be whole.

In practice, however, there's a good deal of information you didn't 
provide, so some of those steps may be wrong for your setup.

I'm not sure what dm-0, dm-2 and dm-3 are, but they're indicated in your 
mdstat.  I'm guessing that you made partitions, and then made LVM or 
crypto devices, and then did RAID on top of that.  If either of those 
is correct, that's completely the wrong way to build RAID sets.  You 
risk either bad performance from doing crypto more often than is 
required, or possibly corruption as a result of LVM not mapping blocks 
the way you expect.

If you build software RAID, I really strongly recommend that you keep it 
as simple as possible.  That means a) build software RAID sets from raw 
partitions and b) use as few partitions as possible.

Typically, I'll create two partitions on all disks.  The first is a 
small partition for /boot, which may be part of a RAID1 set or may be 
unused.  The second partition covers the rest of the drive and will be 
used in whatever arrangement is suitable for that system, whether it's 
RAID1, RAID5, or RAID10.  All of the drives are consistent, so there's 
always a place to copy /boot, and just one script or process to set up 
new disks regardless of their position in the array.  md0 is used for 
/boot, and md1 is an LVM PV.  All of the filesystems other than /boot 
are LVs.
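As a rough sketch of that scheme (the device names, sizes, and the vg0 
volume group name are just examples; adjust for your hardware):
     # Two partitions per disk: a small one for /boot, the rest for data.
     parted -s /dev/sda mklabel msdos \
         mkpart primary 1MiB 513MiB set 1 raid on \
         mkpart primary 513MiB 100% set 2 raid on
     parted -s /dev/sdb mklabel msdos \
         mkpart primary 1MiB 513MiB set 1 raid on \
         mkpart primary 513MiB 100% set 2 raid on
     # RAID1 across the small partitions for /boot.  Metadata 1.0 keeps
     # the superblock at the end so GRUB can read the filesystem.
     mdadm --create /dev/md0 --level=1 --metadata=1.0 \
         --raid-devices=2 /dev/sda1 /dev/sdb1
     # RAID1 across the large partitions (or RAID5/RAID10 with more disks).
     mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
     # md1 becomes the LVM PV; all filesystems other than /boot are LVs.
     pvcreate /dev/md1
     vgcreate vg0 /dev/md1
     lvcreate -L 20G -n root vg0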

Hopefully btrfs will become the default fs in the near future and all of 
this will be vastly simplified.