On 11/24/20 11:05 AM, Simon Matter wrote:
On 11/24/20 1:20 AM, Simon Matter wrote:
On 23/11/2020 17:16, Ralf Prengel wrote:
Backup!!!!!!!!
Sent from my iPhone
You do have a recent backup available anyway, don't you? That is: even without planning to replace disks. And testing such strategies/sequences using loopback devices is definitely a good idea to get used to the machinery...
On a side note: I have had a fair number of drives die on me during a RAID rebuild, so I would try to avoid (if at all possible) deliberately reducing redundancy just for a drive swap. I have never had a problem (yet) caused by the RAID-1 kernel code itself. And: if you have to change a disk because it already has issues, it may be dangerous to do a backup first - especially a file-based backup, because the random access pattern may make things worse. Been there, done that...
Sure, and for large disks I even go further: don't put the whole disk into one RAID device, but build multiple segments - for example, create 6 partitions of the same size on each disk and build six RAID1 arrays out of them.
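Just to illustrate, a rough sketch of what I mean (the device names /dev/sda, /dev/sdb and /dev/md1../dev/md6 are only examples, and it assumes the six equal partitions already exist on both disks):

  #!/usr/bin/env python3
  # Sketch only: build six RAID1 segments out of matching partitions on
  # two disks. Assumes /dev/sda1..6 and /dev/sdb1..6 exist and that
  # /dev/md1..6 are unused. Double-check the device names; run as root.
  import subprocess

  for i in range(1, 7):
      subprocess.run(
          ["mdadm", "--create", f"/dev/md{i}",
           "--level=1", "--raid-devices=2",
           f"/dev/sda{i}", f"/dev/sdb{i}"],
          check=True)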
Oh, boy, what a mess this will create! I have inherited a machine which was set up by someone with software RAID like that. When you need to replace one drive, every other RAID in which that drive's other partitions participate is affected too.
Now imagine that at some moment you have several RAIDs, none of them redundant, but in each of them it is a partition from a different drive that has been kicked out. Now you are stuck, unable to remove any of the failing drives: removing any one of them will trash one RAID or another (which are already not redundant). I guess the guy who left me with this setup listened to advice like the one you just gave. What a pain it is to deal with any drive failure on this machine!
It has been known since forever: the most robust setup is the simplest one.
I understand that; I also like keeping things simple (KISS).
Now, in my own experience with these multi-terabyte drives today, in 95% of the cases where you get a problem, it is a single block which cannot be read correctly. A single write to that sector makes the drive remap it, and the problem is solved. That's where a simple resync of the affected RAID segment is the fix. If a drive happens to produce such a condition once a year, there is absolutely no reason to replace it; just trigger the remapping of the bad sector and the drive will remember it in its internal bad-sector map. This happens all the time without an error ever reaching the OS level, as long as the drive can still read and reconstruct the correct data.
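If you want to trigger that resync by hand for a single segment, the md sysfs interface is all you need - roughly like this (md3 is just an example name; "repair" rewrites bad or mismatched blocks from the good mirror, "check" only reads):

  #!/usr/bin/env python3
  # Sketch: start a repair pass on one RAID1 segment. Needs root; the
  # array name is an example. Progress can be watched in /proc/mdstat.
  md = "md3"
  with open(f"/sys/block/{md}/md/sync_action", "w") as f:
      f.write("repair\n")  # write "check" instead to only read and count mismatches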
In the 5% of cases where a drive really fails completely and needs replacement, you have to resync the 10 RAID segments, yes. I usually do it with a small script and it doesn't take more than a few minutes.
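The script is nothing fancy, roughly along these lines (the disk, partition and array names - and the count of six segments from the earlier example - are placeholders; it also assumes the replacement disk has already been partitioned like its mirror partner, e.g. with sgdisk -R):

  #!/usr/bin/env python3
  # Sketch: after physically replacing the failed disk and copying the
  # partition layout from the surviving mirror (e.g.
  # sgdisk -R /dev/sdb /dev/sda && sgdisk -G /dev/sdb), add each new
  # partition back into its RAID1 segment.
  import subprocess

  NEW_DISK = "sdb"  # example: the freshly replaced disk

  for i in range(1, 7):
      md, part = f"/dev/md{i}", f"/dev/{NEW_DISK}{i}"
      # drop the remembered failed member first (harmless if already gone)
      subprocess.run(["mdadm", md, "--remove", "failed"], check=False)
      subprocess.run(["mdadm", md, "--add", part], check=True)

md is clever enough to resync arrays that share the same physical disks one after the other, so you can kick all of them off at once and just watch /proc/mdstat.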
It is one story if you administer one home server. It is quite different if you administer a couple of hundred of them, like I do - then you are entitled to say what I said. Just 2-3 machines set up in the disastrous manner I just described each suck 10-20 times more of my time than any other machine, i.e. the ones I chose the hardware for and set up myself.
Your assumptions about my work environment are quite wrong.
Hence the attitude.
Keep things simple, so they do not suck up your time - if you do it for a living.
But if it is a hobby of yours - one that takes all your time and gives you pleasure just from fiddling with it - then it's your time and your pleasure; do it in whatever way gets you more of it ;-)
It was a hobby 35 years ago, coding in assembler and designing PCBs for computer extensions.
Simon