Hi,
I want to replace my hard-drive-based SW RAID-1 with SSDs.
What would be the recommended procedure? Can I just remove one drive, replace it with an SSD and let the array rebuild, then repeat with the other drive?
Thanks Frank
I suggest you "mdadm --fail" one drive, then "mdadm --remove" it. After replacing the drive you can "mdadm --add" the new one.
If you boot from these drives you also have to take care of the boot loader. I guess the details depend on how exactly the system is configured.
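As a sketch, the sequence might look like this. The array and partition names (/dev/md0, /dev/sdb1) are examples, not taken from the thread; RUN=echo, the default here, makes the block print the commands instead of executing them, so it can be reviewed without root:

```shell
#!/bin/sh
# Dry-run sketch of the fail/remove/add swap; names are examples.
# Leave RUN=echo to print commands; set RUN= (empty) to run them as root.
RUN="${RUN:-echo}"
ARRAY=/dev/md0
PART=/dev/sdb1

$RUN mdadm "$ARRAY" --fail "$PART"     # mark the old drive as failed
$RUN mdadm "$ARRAY" --remove "$PART"   # detach it from the array
# ...power down, swap the HDD for the SSD, recreate an identical partition...
$RUN mdadm "$ARRAY" --add "$PART"      # add the SSD; the resync starts
```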
Regards, Simon
On 11/23/20 10:46 AM, Simon Matter wrote:
Thanks, that's what I had in mind. Of course, I will reinstall grub2 after each iteration.
Thanks Frank
On 23/11/2020 15:49, Frank Bures wrote:
You could also grow the array to add the new devices before removing the old HDDs, ensuring you retain at least 2 devices in the array at any one time. For example, with an existing RAID of sda1 and sdb1, add sdc1 before removing sda1 and add sdd1 before removing sdb1, finally shrinking the array back to 2 devices:
mdadm --grow /dev/md127 --level=1 --raid-devices=3 --add /dev/sdc1
mdadm /dev/md127 --fail /dev/sda1
mdadm /dev/md127 --remove /dev/sda1
mdadm /dev/md127 --add /dev/sdd1
mdadm /dev/md127 --fail /dev/sdb1
mdadm /dev/md127 --remove /dev/sdb1
mdadm --grow /dev/md127 --raid-devices=2
then reinstall grub to sdc and sdd once everything has fully sync'd:
blockdev --flushbufs /dev/sdc1
blockdev --flushbufs /dev/sdd1
grub2-install --recheck /dev/sdc
grub2-install --recheck /dev/sdd
On 23/11/2020 16:49, Frank Bures wrote:
If you can attach the new disks while the original 2 disks are still available, then grow, add, wait, fail, remove, shrink. That way you will never lose redundancy...
# grow and add new disk
mdadm --grow -n 3 /dev/mdX -a /dev/...
# wait for rebuild of the array
mdadm --wait /dev/mdX
# fail old disk
mdadm /dev/mdX --fail /dev/sdY
# remove old disk
mdadm /dev/mdX --remove /dev/sdY
# add second disk
mdadm /dev/mdX --add /dev/...
# wait
mdadm --wait /dev/mdX
# fail and remove old disk
mdadm /dev/mdX --fail /dev/sdZ
mdadm /dev/mdX --remove /dev/sdZ
# shrink
mdadm --grow -n 2 /dev/mdX
peter
Backup!!!!!!!!
Sent from my iPhone
On 23.11.2020 at 17:10, centos@niob.at wrote:
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
On 23/11/2020 17:16, Ralf Prengel wrote:
You do have a recent backup available anyway, don't you? That is: even without planning to replace disks. And testing such strategies/sequences on loopback devices is definitely a good idea to get used to the machinery...
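For example, a throwaway rehearsal array can be built from loop devices like this (a sketch with example names; the dd/losetup/mdadm steps need root, so the block skips itself when run unprivileged or without mdadm installed):

```shell
#!/bin/sh
# Rehearse the swap on loop devices instead of real disks.
# Skips itself unless run as root with mdadm available.
if [ "$(id -u)" = "0" ] && command -v mdadm >/dev/null 2>&1; then
    dd if=/dev/zero of=/tmp/raid-test-0.img bs=1M count=64 2>/dev/null
    dd if=/dev/zero of=/tmp/raid-test-1.img bs=1M count=64 2>/dev/null
    DEV0=$(losetup -f --show /tmp/raid-test-0.img)
    DEV1=$(losetup -f --show /tmp/raid-test-1.img)
    # --run suppresses the "Continue creating array?" prompt
    mdadm --create /dev/md99 --run --level=1 --raid-devices=2 "$DEV0" "$DEV1"
    # ...practice mdadm --fail / --remove / --add against /dev/md99 here...
    mdadm --stop /dev/md99
    losetup -d "$DEV0"
    losetup -d "$DEV1"
    rm -f /tmp/raid-test-0.img /tmp/raid-test-1.img
    STATUS=rehearsed
else
    STATUS=skipped
fi
echo "rehearsal: $STATUS"
```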
On a side note: I have had a fair number of drives die on me during a RAID rebuild, so I would try to avoid (if at all possible) deliberately reducing redundancy just for a drive swap. I have never had a problem (yet) caused by the RAID-1 kernel code itself. And: if you have to change a disk because it already has issues, it may be dangerous to take a backup first - especially a file-based backup - because the random access pattern may make things worse. Been there, done that...
peter
On 23/11/2020 17:16, Ralf Prengel wrote:
On a side note: I have had a fair number of drives die on me during RAID-rebuild so I would try to avoid (if at all possible) to deliberately reduce redundancy just for a drive swap. I have never had a problem (yet) due to a problem with the RAID-1 kernel code itself. And: If you have to change a disk because it already has issues it may be dangerous to do a backup - especially if you do a file based backups - because the random access pattern may make things worse. Been there, done that...
Sure, and for large disks I go even further: don't put the whole disk into one RAID device, but build multiple segments - e.g. create 6 partitions of the same size on each disk and build six RAID-1s out of them. Then, if there is an issue on one disk in one segment, you don't lose redundancy for the whole big disk. You can even keep spare segments on separate disks to help in cases where you cannot quickly replace a broken disk. The whole thing is still very easy to handle with LVM on top.
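A sketch of that segmented layout, with example device and volume-group names (six partitions per disk mirrored pairwise, then glued back into one volume group with LVM); RUN=echo, the default, makes the block only print the commands:

```shell
#!/bin/sh
# Dry-run sketch: six RAID-1 segments across two disks, joined by LVM.
# Device and VG names are examples. RUN=echo prints; RUN= (empty) executes.
RUN="${RUN:-echo}"
PVS=""
for i in 1 2 3 4 5 6; do
    # segment i mirrors partition i of both disks
    $RUN mdadm --create "/dev/md$i" --run --level=1 --raid-devices=2 \
        "/dev/sda$i" "/dev/sdb$i"
    $RUN pvcreate "/dev/md$i"
    PVS="$PVS /dev/md$i"
done
# one volume group spanning all six segments
$RUN vgcreate bigvg $PVS
```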
Regards, Simon
On 11/24/20 1:20 AM, Simon Matter wrote:
Sure, and for large disks I even go further: don't put the whole disk into one RAID device but build multiple segments, like create 6 partitions of same size on each disk and build six RAID1s out of it.
Oh boy, what a mess this will create! I inherited a machine that someone set up with software RAID like that. When you need to replace one drive, the other RAIDs that drive's partitions participate in are affected too.
Now imagine that at some moment you have several RAIDs, each of them no longer redundant, but in each it is a partition from a different drive that has been kicked out. Now you are stuck, unable to remove any of the failed drives: removing each one will trash one or another RAID (which is already not redundant). I guess the guy who left me with this setup listened to advice like the one you just gave. What a pain it is to deal with any drive failure on this machine!
It is known since forever: The most robust setup is the simplest one.
So, if there is an issue on one disk in one segment, you don't lose redundancy of the whole big disk. You can even keep spare segments on separate disks to help in case where you can not quickly replace a broken disk. The whole handling is still very easy with LVM on top.
One can do a lot of fancy things, splitting things on one layer, then joining them back on another (by introducing LVM)... But I want to repeat it again:
The most robust setup is the simplest one.
Valeri
On 11/24/20 1:20 AM, Simon Matter wrote:
It is known since forever: The most robust setup is the simplest one.
I understand that, I also like keeping things simple (KISS).
Now, in my own experience with today's multi-terabyte drives, in 95% of the cases where you get a problem it is a single block which cannot be read. A single write to the sector makes the drive remap it, and the problem is solved. That's where a simple resync of the affected RAID segment is the fix. If a drive produces such a condition once a year, there is absolutely no reason to replace it: just trigger the remapping of the bad sector and the drive will remember it in its internal bad-sector map. This happens all the time without any error reaching the OS level, as long as the drive can still read and reconstruct the correct data.
In the 5% of cases where a drive really fails completely and needs replacement, you have to resync the RAID segments one by one, yes. I usually do it with a small script and it doesn't take more than a few minutes.
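A hypothetical version of such a small script (the disk name, segment count and md numbering are examples, not Simon's actual script): after the broken disk has been replaced and repartitioned, re-add the matching partition to every segment. RUN=echo, the default, only prints the commands:

```shell
#!/bin/sh
# Dry-run sketch: re-add each partition of the replacement disk to its
# RAID-1 segment. /dev/sdb and the md numbering are example names.
RUN="${RUN:-echo}"
DISK=/dev/sdb
for i in 1 2 3 4 5 6; do
    CMD="mdadm /dev/md$i --add $DISK$i"
    $RUN $CMD
done
```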
One can do a lot of fancy things, splitting things on one layer, then joining them back on another (by introducing LVM)... But I want to repeat it again:
The most robust setup is the simplest one.
The good thing is that LVM has been so stable for so many years that I don't think twice about this one extra layer. Why is a layered approach worse than an all-in-one solution like ZFS? The tools differ but some complexity always remains.
That's how I see it, Simon
On 11/24/20 11:05 AM, Simon Matter wrote:
In the 5% of cases where a drive really fails completely and needs replacement, you have to resync the 10 RAID segments, yes. I usually do it with a small script and it doesn't take more than some minutes.
It is one story if you administer one home server. It is quite different if you administer a couple of hundred of them, like I do. And when just 2-3 machines set up in the disastrous manner I just described each suck up 10-20 times more of my time than any other machine - the ones I configured hardware for myself and set up myself - then I am entitled to say what I said.
Hence the attitude.
Keep things simple, so they do not suck up your time - if you do it for a living.
But if it is a hobby of yours - one that takes all your time and gives you pleasure just from fiddling with it - then it's your time and your pleasure; do it the way that gets you more of it ;-)
Valeri
zpool create newpool mirror sdb sdc mirror sdd sde mirror sdf sdg mirror sdh sdi spare sdj sdk
zfs create -o mountpoint=/var/lib/pgsql-11 newpool/postgres11
and done.
On 24/11/2020 18:32, John Pierce wrote:
This *might* be a valid answer if ZFS were supported on plain CentOS... (and if the question hadn't involved an existing RAID ;-) ). Or did I miss something?
peter
On Nov 24, 2020, at 10:43 AM, centos@niob.at wrote:
This *might* be a valid answer if zfs was supported on plain CentOS...
Since we’re talking about CentOS, “support” here must mean community support, as opposed to commercial support, so:
https://openzfs.github.io/openzfs-docs/Getting%20Started/RHEL%20and%20CentOS...
On 11/24/20 11:05 AM, Simon Matter wrote:
It is one story if you administer one home server. It is quite different is you administer a couple of hundreds of them, like I do. And just 2-3 machines set up in such a disastrous manner as I just described suck 10-20 times more of my time each compared to any other machine - the ones I configured hardware for myself, and set up myself, then you are entitled to say what I said.
Your assumptions about my work environment are quite wrong.
But if it is a hobby of yours - the one that takes all your time, and gives you a pleasure just to fiddle with it, then it's your time, and your pleasure, do it the way to get more of it ;-)
It was a hobby 35 years ago, coding in assembler and designing PCBs for computer extensions.
Simon
On 11/24/20 12:44 PM, Simon Matter wrote:
Your assumptions about my work environment are quite wrong.
Great, then you are much mightier than I am at quickly managing something set up in a very sophisticated way. It is amazing to manage sophisticated things as fast as simple, straightforward ones ;-)
I also noticed one more sophistication: you always strip off the name of the poster you reply to. ;-)
It was a hobby 35 years ago coding in assembler and designing PCBs for computer extensions.
Oh, great, we are of the same kind. I designed electronics and made PCBs both as a hobby and for a living, and I still do it as a hobby. I also did programming both as a hobby and for a living. The funniest part: for a single-board Z-80 based computer I wrote an assembler, a disassembler, and an emulator (which emulated what the Z-80 would do when running some program). I did it on a Wang 2200 (actually a replica of one), and I programmed it, believe it or not, in BASIC. That was the only language available to us on that machine - an ugly, simple interpreted language with all variables global...
But now I'm a sysadmin. And - for me at least - the simplest possible setup is the most robust one. It is also the easiest and fastest to maintain (for me, or for someone who steps in to do it instead of me).
Valeri
On Nov 24, 2020, at 10:05 AM, Simon Matter simon.matter@invoca.ch wrote:
Why is a layered approach worse than a fully included solution like ZFS?
Just one reason is that you lose visibility of lower-level elements from the top level.
You gave the example of a bad block in a RAID. What current RHEL-type systems can't tell you when that happens is which file is affected.
ZFS not only can tell you that: deleting or replacing the file will fix the array. That's the bottom-most layer (the disk surface) telling the top-most layer (userspace) there's a problem, and userspace fixing it by telling the bottom-most layer to check again.
Because ZFS is copy-on-write, this doesn't force the drive to rewrite that sector: a new set of sectors is brought into use and the old ones are released. The bad sector isn't touched again until the filesystem reassigns it.
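For instance, on a ZFS system the affected files can be listed directly ("newpool" reuses the example pool name from John's earlier post; the block skips itself when no ZFS tools are installed):

```shell
#!/bin/sh
# zpool status -v lists files hit by permanent errors; a scrub re-checks
# the whole pool. Skips itself when ZFS is not installed.
if command -v zpool >/dev/null 2>&1; then
    zpool status -v newpool || true   # "Permanent errors ..." names the files
    zpool scrub newpool || true       # re-read and repair from redundancy
    ZFS=checked
else
    ZFS=skipped
fi
echo "zfs: $ZFS"
```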
Red Hat is attempting to fix all this with Stratis, but it’s looking to take years and years for them to get there. ZFS is ready today.
The tools differ but some complexity always remains.
In my experience, ZFS hides a lot of complexity, and it is exceedingly rare to need to take a peek behind the curtains.
(And if you do, there’s the zdb command.)
On Tue, Nov 24, 2020 at 12:18:57PM -0700, Warren Young wrote:
ZFS is ready today.
I disagree.
It is ready today only if you are willing to abandon Linux entirely and switch to BSD, or run a Linux distro like Ubuntu that is possibly violating a license. Third-party repositories that use DKMS can be dangerous for a storage service, and I'd prefer to keep compilers off my servers.
I'm not willing to move away from CentOS, and I am ethically bound not to violate the GPL. I would say that unless the ZFS project can fix their license, it isn't ready for Linux.
At least with Stratis there's an attempt to work within the Linux world. I'm excited to see Fedora make btrfs the default root filesystem, too.
On Tue, 24 Nov 2020 at 02:20, Simon Matter simon.matter@invoca.ch wrote:
Sure, and for large disks I even go further: don't put the whole disk into one RAID device but build multiple segments, like create 6 partitions of same size on each disk and build six RAID1s out of it. So, if there is an issue on one disk in one segment, you don't lose redundancy of the whole big disk. You can even keep spare segments on separate disks to help in case where you can not quickly replace a broken disk. The whole handling is still very easy with LVM on top.
I used to do something like this (though there isn't enough detail above for me to be sure we are talking about the same thing). On older disks, having RAID split over 4 disks with / /var /usr /home allowed for longer redundancy: drive 1 could have a 'failed' /usr while drives 0, 2 and 3 were OK, and the rest all worked in full mode because /, /var and /home were all good. This worked because most of the data for /usr would sit in a straight run on each disk.
The problem is that a lot of modern disks do not guarantee that the data for any partition will really be next to each other on the disk. Even before SSDs did this for wear leveling, many disks did it because it was easier to let the full OS running on the ARM chip in the drive handle all the 'map this sector the user wants to this sector on the disk' logic in whatever way makes sense for the magnetic media inside. There is also a lot of silent rewriting going on: the real capacity of a drive can be 10-20% bigger, with those spare sectors slowly used as failures happen in other areas. By the time you start seeing errors, the drive has no safe sectors left and has probably scattered /usr all over the disk trying to keep going as long as it could... the rest of the partitions will start failing very quickly afterwards.
Not all disks do this but a good many of them do from commercial SAS to commodity SATA.. and a lot of the 'Red' and 'Black' NAS drives are doing this also..
While I still use partition segments to spread things out, I do not do so for failure handling anymore. And if what I was doing isn't what the original poster was meaning I look forward to learning it.
Regards, Simon
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
On Tue, 24 Nov 2020 at 02:20, Simon Matter simon.matter@invoca.ch wrote:
On 23/11/2020 17:16, Ralf Prengel wrote:
Backup!!!!!!!!
Sent from my iPhone
You do have a recent backup available anyway, haven't you? That is: even without planning to replace disks. And testing such strategies/sequences using loopback devices is definitely a good idea to get used to the machinery...
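Rehearsing such a replacement sequence on loopback devices, as suggested above, could be sketched like this (the file paths, sizes and md number are made up for illustration; all of it needs root):

```shell
# Create two small backing files and attach them as loop devices.
truncate -s 512M /tmp/disk0.img /tmp/disk1.img
LOOP0=$(losetup --find --show /tmp/disk0.img)
LOOP1=$(losetup --find --show /tmp/disk1.img)

# Build a throwaway RAID1 and walk through the fail/remove/add cycle.
mdadm --create /dev/md99 --run --level=1 --raid-devices=2 "$LOOP0" "$LOOP1"
mdadm /dev/md99 --fail "$LOOP1"
mdadm /dev/md99 --remove "$LOOP1"
mdadm /dev/md99 --add "$LOOP1"
cat /proc/mdstat        # watch the rebuild

# Clean up afterwards.
mdadm --stop /dev/md99
losetup -d "$LOOP0" "$LOOP1"
rm /tmp/disk0.img /tmp/disk1.img
```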
On a side note: I have had a fair number of drives die on me during a RAID rebuild, so I would try to avoid, if at all possible, deliberately reducing redundancy just for a drive swap. I have never had a problem (yet) caused by the RAID-1 kernel code itself. And: if you have to change a disk because it already has issues, it may be dangerous to take a backup first - especially a file-based backup - because the random access pattern may make things worse. Been there, done that...
Sure, and for large disks I even go further: don't put the whole disk into one RAID device but build multiple segments, like create 6 partitions of same size on each disk and build six RAID1s out of it. So, if there is an issue on one disk in one segment, you don't lose redundancy of the whole big disk. You can even keep spare segments on separate disks to help in case where you can not quickly replace a broken disk. The whole handling is still very easy with LVM on top.
I used to do something like this (though there isn't enough detail above for me to be sure we are talking about the same thing). On older disks, having RAID split over 4 disks with /, /var, /usr and /home allowed for longer redundancy: drive 1 could have a 'failed' /usr while the other drives were ok, and everything else kept working in full mirror mode because /, /var and /home were all good. This worked because most of the data for /usr would be in one straight run on each disk.

The problem is that many modern disks do not guarantee that the data for any partition is really adjacent on the media. Even before SSDs did this for wear leveling, many disks did it because it was easier to let the firmware running on the drive's ARM chip handle the 'map the sector the user wants to a sector on the media' logic in whatever way makes sense for the magnetic media inside. There is also a lot of silent rewriting going on: the real capacity of a drive can be 10-20% bigger than advertised, with those spare sectors slowly used up as failures happen in other areas. By the time you start seeing errors, the drive has run out of safe sectors and has probably scattered /usr all over the disk trying to keep going as long as it could; the remaining partitions will start failing very quickly afterwards.
Not all disks do this, but a good many do, from commercial SAS to commodity SATA, and a lot of the 'Red' and 'Black' NAS drives are doing this too.
While I still use partition segments to spread things out, I no longer do so for failure handling. And if what I was doing isn't what the original poster meant, I look forward to learning about it.
I don't do it the same way on every system. But on large multi-TB systems with 4+ drives, doing segmented RAID has helped very often. There is one more thing: I always try to keep spare segments. Then, when a problem shows up, the first thing is to pvmove the broken RAID's data to wherever there is free space. One command and some minutes later, the system is fully redundant again. LVM is really nice for such things, as you can move filesystems around as long as they share the same VG. I also use LVM to optimize storage by moving things to faster or slower disks after adding or replacing storage.
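The pvmove step described above might look like the following, assuming the degraded segment is /dev/md3 and the VG still has free extents elsewhere (the md number and VG name are placeholders for illustration):

```shell
# Check where free space is available in the volume group.
vgs
pvs -o pv_name,vg_name,pv_free

# Move all allocated extents off the degraded RAID segment; LVM keeps
# the filesystems online while the data migrates.
pvmove /dev/md3

# Once empty, the segment can be dropped from the VG and rebuilt later.
vgreduce vg_data /dev/md3
```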
Regards, Simon
On 11/24/20 8:20 AM, Simon Matter wrote:
Sure, and for large disks I even go further: don't put the whole disk into one RAID device. Instead, build multiple segments: for example, create 6 partitions of the same size on each disk and build six RAID1s out of them. That way, if there is an issue in one segment of one disk, you don't lose redundancy for the whole big disk. You can even keep spare segments on separate disks to help in cases where you cannot quickly replace a broken disk. The whole setup is still very easy to handle with LVM on top.
Same setup I've been using for at least 15 years. Just pick a standard partition size and keep using it (or multiples of it, e.g. 256GiB, then 512GiB, then 1024GiB), so as to keep the number of segments down.
Best regards.
On 11/24/20 8:20 AM, Simon Matter wrote:
Sure, and for large disks I even go further: don't put the whole disk into one RAID device. Instead, build multiple segments: for example, create 6 partitions of the same size on each disk and build six RAID1s out of them. That way, if there is an issue in one segment of one disk, you don't lose redundancy for the whole big disk. You can even keep spare segments on separate disks to help in cases where you cannot quickly replace a broken disk. The whole setup is still very easy to handle with LVM on top.
Same setup I've been using for at least 15 years. Just pick a standard partition size and keep using it (or multiples of it, e.g. 256GiB, then 512GiB, then 1024GiB), so as to keep the number of segments down.
Thanks for sharing! Interesting to hear that some people did the same or similar things as I did without knowing of each other.
IIRC, I initially started doing this when I got a server with different disk sizes and different paths to the disks. Think of some 18G disks, some 36G, some 73G and some 146G. Now, if you have to make the storage redundant against disk failures and also against single path failures, you get creative about how to cut the larger disks into slices and spread the mirror pairs over the paths.
It proved to be quite flexible in the end and still allowed extending the storage without any downtime. Needless to say, the expensive hardware RAID controllers were removed from the box and replaced by simple SCSI controllers - the hardware just couldn't do what was required here.
Regards, Simon
--On Monday, November 23, 2020 4:46 PM +0100 Simon Matter simon.matter@invoca.ch wrote:
I suggest to "mdadm --fail" one drive, then "mdadm --remove" it. After replacing the drive you can "mdadm --add" it.
Does it make sense to dd or ddrescue from the removed drive to the replacement? My md RAID set is on primary partitions, not raw drives, so I'm assuming the replacement drive needs at least the boot sector from the old drive to carry over the partition table.
--On Monday, November 23, 2020 4:46 PM +0100 Simon Matter simon.matter@invoca.ch wrote:
I suggest to "mdadm --fail" one drive, then "mdadm --remove" it. After replacing the drive you can "mdadm --add" it.
Does it make sense to dd or ddrescue from the removed drive to the replacement? My md RAID set is on primary partitions, not raw drives, so I'm assuming the replacement drive needs at least the boot sector from the old drive to carry over the partition table.
I usually dd the first MB to the new disk if it's used for booting, yes.
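Copying the boot code and partition table that way might look like this, with sdOLD/sdNEW as placeholder device names (double-check the target device first, dd is unforgiving):

```shell
# Copy the first MB: MBR boot code, MBR partition table and post-MBR gap.
dd if=/dev/sdOLD of=/dev/sdNEW bs=1M count=1

# Alternatively, copy just the partition table; note that on GPT disks
# you'd use sgdisk instead, since 1 MB does not cover the backup GPT.
sfdisk -d /dev/sdOLD | sfdisk /dev/sdNEW

# Make the kernel re-read the new partition table.
partprobe /dev/sdNEW
```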
Simon