[CentOS] C7, mdadm issues

Wed Jan 30 20:23:48 UTC 2019
mark <m.roth at 5-cent.us>

Alessandro Baggi wrote:
> Il 30/01/19 16:49, Simon Matter ha scritto:
>>> On 01/30/19 03:45, Alessandro Baggi wrote:
>> I also don't have much experience with spare handling as I also don't
>> do it in my scenarios.
>> However in general, I think the problem today is this:
>> We have very large disks these days. Defects on a disk often go
>> undetected for a long time. Even with raid-check, I think errors that
>> only surface while writing are missed, since the check only reads.
>> So now, if one disk fails, things are still okay. Then, when a spare is
>> in place or the defective disk was replaced, the resync starts. Now, if
>> there is any error on one of the old disks while the resync happens,
>> boom, the array fails and is in a bad state now.
>> One more hint for those interested:
>> Even with RAID1, I don't use the whole disk as one big RAID1. Instead, I
>> slice it into equally sized parts - not physically :-) - and create
>> multiple smaller RAID1 arrays on it. If a disk is 8TB, I create 8
>> partitions of 1TB and then create 8 RAID1 arrays on them. Then I add all
>> 8 arrays to the same VG. Now, if there is a small error in, say, disk 3,
>> only a 1TB slice of the whole 8TB is degraded. In large arrays you can
>> even keep some spare slices on a spare disk to temporarily move broken
>> slices to. You get the idea, right?
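A sketch of that slicing scheme, assuming two 8TB disks (/dev/sdb and /dev/sdc, names illustrative) already partitioned into eight equal ~1TB slices each:

```shell
# Pair up matching slices from the two disks into eight small RAID1 arrays.
# Device names and array count are assumptions for illustration.
for i in $(seq 1 8); do
    mdadm --create /dev/md$i --level=1 --raid-devices=2 \
        /dev/sdb$i /dev/sdc$i
done

# Pool all eight arrays into a single LVM volume group,
# so the slicing is invisible to the filesystems on top:
pvcreate /dev/md{1..8}
vgcreate vg_data /dev/md{1..8}
```

A bad sector then degrades only the one /dev/mdN holding it; the other seven arrays stay clean, and a resync touches 1TB instead of 8TB.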
> About this type of configuration: if you have 2 disks and create 8 RAID1
> arrays on those two disks, won't you lose performance? As you said, if a
> single partition gets a bad error you save the other data, but if one
> disk fails totally you have the same problem, plus you need to recreate 8
> partitions and resync 8 RAID1 arrays. This could take longer to recover
> and invites more human error.
Not anything I can do. We have users with terabytes of data. We *need*
large RAIDs. RAID 1 for root, sure, but nothing else. This specific RAID
was unusual for this user. Normally, for the last five or six years, we
do RAID 6.

Should I mention the RAID 6 we have that's 153TB, at 27% full?