[CentOS] Replacing SW RAID-1 with SSD RAID-1

Tue Nov 24 16:36:45 UTC 2020
Valeri Galtsev <galtsev at kicp.uchicago.edu>


On 11/24/20 1:20 AM, Simon Matter wrote:
>> On 23/11/2020 17:16, Ralf Prengel wrote:
>>> Backup!!!!!!!!
>>>
>>> Sent from my iPhone
>>
>> You do have a recent backup available anyway, don't you? That is, even
>> without planning to replace disks. And testing such strategies/sequences
>> using loopback devices is definitely a good idea to get used to the
>> machinery...
>>
>> On a side note: I have had a fair number of drives die on me during a
>> RAID rebuild, so I would avoid (if at all possible) deliberately
>> reducing redundancy just for a drive swap. I have never (yet) had a
>> problem caused by the RAID-1 kernel code itself. And: if you have to
>> change a disk because it already has issues, it may be dangerous to do
>> a backup, especially a file-based backup, because the random access
>> pattern may make things worse. Been there, done that...
> 
> Sure, and for large disks I even go further: don't put the whole disk into
> one RAID device but build multiple segments. For example, create six
> partitions of the same size on each disk and build six RAID-1s out of them.

Oh, boy, what a mess this will create! I have inherited a machine which
was set up by someone with software RAID like that. When you need to
replace one drive, every other RAID in which that drive's partitions
participate is affected too.

Now imagine that at some moment you have several RAIDs, each of which
is no longer redundant, but in each one it is a partition from a
different drive that has been kicked out. Now you are stuck, unable to
remove any of the failed drives: removing any one of them will trash one
or another RAID (which is already not redundant). I guess the guy who
left me with this setup listened to advice like what you just gave.
What a pain it is to deal with any drive failure on this machine!!
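
To make the deadlock concrete, here is a minimal sketch in Python. It is
only a toy model: the drive names, the six-segment layout and the helper
functions are all made up for illustration, and it does not talk to md
or mdadm at all.

```python
# Toy model of a "segmented" RAID-1 layout (hypothetical names throughout).
# Each array maps to the set of drives that still hold an active member.

def build_layout(drives, segments):
    """One RAID-1 array per segment, each mirrored across all drives."""
    return {f"md{i}": set(drives) for i in range(segments)}

def fail_member(layout, array, drive):
    """Mark the partition of `drive` as kicked out of `array`."""
    layout[array].discard(drive)

def arrays_using(layout, drive):
    """Arrays that are touched when `drive` is pulled."""
    return sorted(a for a, members in layout.items() if drive in members)

def safely_removable(layout, drives):
    """Drives that can be pulled while every array keeps >= 1 member."""
    return [d for d in drives
            if all(members - {d} for members in layout.values())]

drives = ["sda", "sdb"]
layout = build_layout(drives, 6)

# Healthy state: pulling one drive touches all six arrays at once.
print(arrays_using(layout, "sda"))
print(safely_removable(layout, drives))

# Two arrays lose a member each, but from different drives.
fail_member(layout, "md0", "sda")
fail_member(layout, "md3", "sdb")

# Now neither drive can be pulled without destroying an array.
print(safely_removable(layout, drives))
```

With one RAID-1 over the whole disk, the same two failures would simply
mean one failed drive to pull; in the segmented layout the list of
safely removable drives comes back empty.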

It has been known forever: the most robust setup is the simplest one.

> So, if there is an
> issue on one disk in one segment, you don't lose redundancy of the whole
> big disk. You can even keep spare segments on separate disks to help in
> case you cannot quickly replace a broken disk. The whole handling
> is still very easy with LVM on top.
> 

One can do a lot of fancy things, splitting things on one layer, then
joining them back on another (by introducing LVM)... But I want to
repeat:

The most robust setup is the simplest one.

Valeri

> Regards,
> Simon
> 

-- 
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++