On 11/24/20 11:05 AM, Simon Matter wrote:
>>
>>
>> On 11/24/20 1:20 AM, Simon Matter wrote:
>>>> On 23/11/2020 17:16, Ralf Prengel wrote:
>>>>> Backup!!!!!!!!
>>>>>
>>>>> Sent from my iPhone
>>>>
>>>> You do have a recent backup available anyway, don't you? That is:
>>>> even without planning to replace disks. And testing such
>>>> strategies/sequences using loopback devices is definitely a good
>>>> idea to get used to the machinery...
>>>>
>>>> On a side note: I have had a fair number of drives die on me during
>>>> a RAID rebuild, so I would try to avoid (if at all possible)
>>>> deliberately reducing redundancy just for a drive swap. I have
>>>> never (yet) had a problem caused by the RAID-1 kernel code itself.
>>>> And: if you have to change a disk because it already has issues, it
>>>> may be dangerous to do a backup - especially a file-based backup -
>>>> because the random access pattern may make things worse. Been
>>>> there, done that...
>>>
>>> Sure, and for large disks I even go further: don't put the whole
>>> disk into one RAID device but build multiple segments, e.g. create 6
>>> partitions of the same size on each disk and build six RAID1s out of
>>> them.
>>
>> Oh boy, what a mess this will create! I have inherited a machine
>> which was set up by someone with software RAID like that. When you
>> need to replace one drive, the other RAIDs in which that drive's
>> other partitions participate are affected too.
>>
>> Now imagine that at some moment you have several RAIDs, none of them
>> redundant, but in each one it is a partition from a different drive
>> that has been kicked out. Now you are stuck, unable to remove any of
>> the failed drives: removing any one of them will trash one RAID or
>> another (which are already not redundant). I guess the guy who left
>> me with this setup listened to advice like what you just gave. What a
>> pain it is to deal with any drive failure on this machine!!
>>
>> It has been known forever: the most robust setup is the simplest one.
>
> I understand that; I also like keeping things simple (KISS).
>
> Now, in my own experience with these multi-terabyte drives today, in
> 95% of the cases where you get a problem it is with a single block
> which cannot be read correctly. A single write to the sector makes the
> drive remap it and the problem is solved. That's where a simple resync
> of the affected RAID segment is the fix. If a drive happens to produce
> such a condition once a year, there is absolutely no reason to replace
> the drive; just trigger the remapping of the bad sector and the drive
> will remember it in its internal bad sector map. This happens all the
> time without giving an error to the OS level, as long as the drive can
> still read and reconstruct the correct data.
>
> In the 5% of cases where a drive really fails completely and needs
> replacement, you have to resync the 10 RAID segments, yes. I usually
> do it with a small script and it doesn't take more than a few minutes.
>
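For anyone curious, a rough sketch of what such a per-segment re-add
script might look like - it assumes six md RAID1 segments (md0 through
md5) and a replacement disk showing up as /dev/sdb; all of those names
are made up for illustration and have to be adjusted to the real layout:

  #!/bin/bash
  # Rough sketch only: re-add a replacement disk's partitions to the md
  # RAID1 segments they belong to. Device names and the segment count
  # are invented - verify them against the real machine first.
  set -e

  NEWDISK=/dev/sdb

  # Copy the partition table from the surviving disk first, e.g.:
  #   sfdisk -d /dev/sda | sfdisk "$NEWDISK"

  for i in 0 1 2 3 4 5; do
      part="${NEWDISK}$((i + 1))"      # /dev/sdb1 pairs with /dev/md0, etc.
      mdadm --manage "/dev/md${i}" --add "$part"
  done

  # md rebuilds the segments (one after another when arrays share the
  # same disks); watch the progress here:
  cat /proc/mdstat
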
It is one story if you administer one home server. It is quite different
if you administer a couple of hundred of them, like I do. Just 2-3
machines set up in the disastrous manner I just described each suck up
10-20 times more of my time than any other machine - the ones I chose
the hardware for and set up myself. Administer a few of those for a
while and you are entitled to say what I said. Hence the attitude: keep
things simple, so they do not suck up your time - if you do it for a
living.

But if it is a hobby of yours - one that takes all your time and gives
you pleasure just from fiddling with it - then it's your time and your
pleasure; do it the way that gets you more of both ;-)

Valeri

>>
>>> So, if there is an issue on one disk in one segment, you don't lose
>>> redundancy of the whole big disk. You can even keep spare segments
>>> on separate disks to help in cases where you cannot quickly replace
>>> a broken disk. The whole handling is still very easy with LVM on
>>> top.
>>>
>>
>> One can do a lot of fancy things, splitting things on one layer, then
>> joining them back on another (by introducing LVM)... But I want to
>> repeat it again:
>>
>> The most robust setup is the simplest one.
>
> The good thing is that LVM has been so stable for so many years that I
> don't think twice about this one extra layer. Why is a layered
> approach worse than a fully integrated solution like ZFS? The tools
> differ but some complexity always remains.
>
> That's how I see it,
> Simon
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>

-- 
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
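P.S. The layout being debated above, spelled out as a rough sketch:
several same-sized partitions per disk, mirrored pairwise with md, then
glued back into one pool with LVM. Every disk, md and LVM name here
(sda/sdb, md0-md5, vg_data, lv_data) is invented for illustration:

  #!/bin/bash
  # Rough sketch of the segmented layout: six RAID1 pairs plus LVM on top.
  set -e

  # Assumes sda1..sda6 and sdb1..sdb6 already exist and are equal in size.
  for i in 0 1 2 3 4 5; do
      mdadm --create "/dev/md${i}" --run --level=1 --raid-devices=2 \
            "/dev/sda$((i + 1))" "/dev/sdb$((i + 1))"
  done

  # LVM stitches the six segments back into a single pool of space.
  pvcreate /dev/md{0,1,2,3,4,5}
  vgcreate vg_data /dev/md{0,1,2,3,4,5}
  lvcreate -n lv_data -l 100%FREE vg_data
  mkfs.xfs /dev/vg_data/lv_data
  # (In real life you would also persist the array config, e.g. with
  # mdadm --detail --scan.)

The spare segments mentioned above would simply be extra partitions on
a third disk added to each array with mdadm --add; they sit idle as hot
spares until a member fails.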