[CentOS] raid 5 install

Fri Jun 28 23:53:07 UTC 2019
Warren Young <warren at etr-usa.com>

On Jun 28, 2019, at 8:46 AM, Blake Hudson <blake at ispn.net> wrote:
> Linux software RAID…has only decreased availability for me. This has been due to a combination of hardware and software issues that are generally handled well by HW RAID controllers, but are often handled poorly or unpredictably by desktop oriented hardware and Linux software.

Would you care to be more specific?  I have little experience with software RAID, other than ZFS, so I don’t know what these “issues” might be.

I do have a lot of experience with hardware RAID, and the grass isn’t very green on that side of the fence, either.  Some of this will repeat others’ points, but it’s worth repeating, since it means they’re not alone in their pain:

0. Hardware RAID is a product of the time it was produced.  My old parallel IDE and SCSI RAID cards are useless because you can’t get disks with that port type any more; my oldest SATA and SAS RAID cards can’t talk to disks bigger than 2 TB; and of those older hardware RAID cards that still do work, they won’t accept a RAID created by a controller of another type, even if it’s from the same company.  (Try attaching a 3ware 8000-series RAID to a 3ware 9000-series card, for example.)

Typical software RAID never drops backwards compatibility.  You can always attach an old array to new hardware.  Or even new arrays to old hardware, within the limitations of the hardware, and those limitations aren’t the software RAID’s fault.

1. Hardware RAID requires hardware-specific utilities.  Many hardware RAID systems don’t work under Linux at all, and of those that do, not all provide sufficiently useful Linux-side utilities.  If you have to reboot into the RAID BIOS to fix anything, that’s bad for availability.

2. The number of hardware RAID options is going down over time.  Adaptec’s almost out of the game, 3ware was bought by LSI and then had their products all but discontinued, and most of the other options you list are rebadged LSI or Adaptec.  Eventually it’s going to be LSI or software RAID, and then LSI will probably get out of the game, too.  This market segment is dying because software RAID no longer has any practical limitations that hardware can fix.

3. When you do get good-enough Linux-side utilities, they’re often not well-designed.  I don’t know anyone who likes the megaraid or megacli64 utilities.  I have more experience with 3ware’s tw_cli, and I never developed facility with it beyond pidgin, so that to do anything even slightly uncommon, I have to go back to the manual to piece the command together, else risk roaching the still-working disks.

By contrast, I find the zfs and zpool commands well-designed and easy to use.  There’s no mystery why that should be so: hardware RAID companies have their expertise in hardware, not software.  Also, “man zpool” doesn’t suck. :)

That coin does have an obverse face, which is that young software RAID systems go through a phase where they have to re-learn just how false, untrustworthy, unreliable, duplicitous, and mendacious the underlying hardware can be.  But that expertise builds up over time, so that a mature software RAID system copes quite well with the underlying hardware’s failings.

The inverse expertise in software design doesn’t build up on the hardware RAID side.  I assume this is because they fire the software teams once they’ve produced a minimum viable product, then re-hire a new team when their old utilities and monitoring software gets so creaky that it has to be rebuilt from scratch.  Then you get a *new* bag of ugliness in the world.

Software RAID systems, by contrast, evolve continuously, and so usually tend towards perfection.

The same problem *can* come up in the software RAID world: witness how much wheel reinvention is going on in the Stratis project!  The same amount of effort put into ZFS would have been a better use of everyone’s time.

That option doesn’t even exist on the hardware RAID side, though.  Every hardware RAID provider must develop their command line utilities and monitoring software de novo, because even if the Other Company open-sourced its software, that other software can’t work with their proprietary hardware.

4. Because hardware RAID is abstracted below the OS layer, the OS and filesystem have no way to interact intelligently with it.

ZFS is the pinnacle of this technology, but CentOS is finally starting to get this through Stratis and the extensions Stratis has required to XFS and LVM.  I assume btrfs also provides some of these benefits, though that’s on track to becoming off-topic here.

ZFS can tell you which file is affected by a block that’s bad across enough disks that redundancy can’t fix it.  This gives you a new, efficient, recovery option: restore that file from backup or delete it, allowing the underlying filesystem to rewrite the bad block on all disks.  With hardware RAID, fixing this requires picking one disk as the “real” copy and telling the RAID card to blindly rewrite all the other copies.

Another example is resilvering: because a hardware RAID has no knowledge of the filesystem, a resilver during disk replacement requires rewriting the entire disk, which takes 8-12 hours these days.  If the volume has a lot of free space, a filesystem-aware software RAID resilver can copy only the blocks containing user data, greatly reducing recovery time.
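To make that concrete, here’s a minimal Python sketch of the difference.  The allocation bitmap and block counts are made up for illustration; a real filesystem-aware resilver walks its own metadata rather than a list like this:

```python
# Sketch: a filesystem-aware resilver copies only allocated blocks,
# while a block-level (hardware RAID) rebuild must copy every block.
# The "allocated" bitmap is a stand-in for real filesystem metadata.

def resilver_cost(total_blocks, allocated):
    """Return blocks copied by each strategy: (blockwise, fs_aware)."""
    blockwise = total_blocks                    # whole-disk rebuild
    fs_aware = sum(1 for used in allocated if used)
    return blockwise, fs_aware

allocated = [i < 250 for i in range(1000)]      # filesystem 25% full
full, aware = resilver_cost(1000, allocated)
print(full, aware)                              # 1000 vs 250 blocks
```

With a mostly-empty pool, that ratio is the difference between a rebuild measured in hours and one measured in minutes.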

Anecdotally, I can tell you that the ECCs involved in NAS-grade SATA hardware aren’t good enough on their own.  We had a ZFS server that would detect about 4-10 kB of bad data on one disk in the pool during every weekend scrub.  We never figured out whether the problem was in the disk, its drive cage slot, or its cabling, but it was utterly repeatable.  But also utterly unimportant to diagnose, because ZFS kept fixing the problem for us, automatically!

The thing is, we’d have never known about this underlying hardware fault if ZFS’s 128-bit checksums weren’t able to reduce the chances of undetected error to practically-impossible levels.  Since ZFS knows, by those same 128-bit hashes, which copy of the data is uncorrupted, it fixed it automatically for us each time for years on end.  I doubt any hardware RAID system you favor would have fared as well.
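The mechanism is simple enough to sketch.  This Python toy uses SHA-256 in place of ZFS’s actual checksum algorithms, and a two-element list in place of a mirror vdev; the key point it shows is that the checksum lives *outside* the data copies, so it can arbitrate between them:

```python
# Sketch of checksum-directed self-healing on a two-way mirror.
# hashlib.sha256 stands in for the checksum ZFS keeps in the block
# pointer, separate from the data blocks themselves.
import hashlib

def read_self_healing(copies, expected_sum):
    """Return verified data; rewrite any mirror copy that fails its checksum."""
    good = next(c for c in copies
                if hashlib.sha256(c).digest() == expected_sum)
    for i, c in enumerate(copies):
        if hashlib.sha256(c).digest() != expected_sum:
            copies[i] = good          # repair the corrupt copy in place
    return good

block = b"user data"
checksum = hashlib.sha256(block).digest()
mirror = [b"user d\x00ta", block]     # copy 0 silently corrupted
data = read_self_healing(mirror, checksum)
assert data == block and mirror[0] == block   # detected and repaired
```

A hardware RAID mirror has no such arbiter: both copies “look” valid to the controller, so it can only pick one and hope.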

*That’s* uptime. :)

5. Hardware RAID made sense back when a PC motherboard rarely had more than 2 hard disk controller ports, and those shared a single IDE lane.  In those days, CPUs were slow enough that calculating parity was really costly, and hard drives were small enough that 8+ disk arrays were often required just to get enough space.

Now that you can get 10+ SATA ports on a mobo, parity calculation costs only a tiny slice of a single core in your multicore CPU, and a mirrored pair of multi-terabyte disks is often plenty of space, hardware RAID is increasingly being pushed to the margins of the server world.
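To see why parity is cheap now, here’s the entire computation in toy form, in Python.  Real RAID-5 rotates parity across disks and works in large strips, but the arithmetic is just this XOR pass per stripe:

```python
# Sketch of RAID-5-style parity: the parity strip is the XOR of the
# data strips, so any single lost strip can be rebuilt by XORing the
# survivors with the parity.

def xor_strips(strips):
    """XOR equal-length byte strings together."""
    out = bytearray(len(strips[0]))
    for s in strips:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]       # three data disks
parity = xor_strips(data)                # the parity disk

# Disk 1 dies: rebuild its strip from the survivors plus parity.
rebuilt = xor_strips([data[0], data[2], parity])
assert rebuilt == data[1]
```

Modern CPUs do this with vector instructions at many gigabytes per second, which is why offloading it to a dedicated controller no longer buys you anything.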

Software RAID doesn’t have port count limits at all.  With hardware RAID, I don’t buy a 4-port card when a 2-port card will do, because that costs me $100-200 more.  With software RAID, I can usually find another place to plug in a drive temporarily, and that port was “free” because it came with the PC.

This matters when I have to replace a disk in my hardware RAID mirror, because now I’m out of ports.  I have to choose one of the disks to drop out of the array, losing all redundancy before the recovery even starts, because I need to free up one of the two hardware connectors for the new disk.

That’s fine when the disk I’m replacing is dead, dead, dead, but that isn’t usually the case in my experience.  Instead, the disk I’m replacing is merely *dying*, and I’m hoping to get it replaced before it finally dies.

What that means in practice is that with software RAID, I can have an internal mirror, then temporarily connect a replacement drive in a USB or Thunderbolt disk enclosure.  Now the resilver operation proceeds with both original disks available, so that if we find that the “good” disk in the original mirror has a bad sector, too, the software RAID system might find that it can pull a good copy from the “bad” disk, saving the whole operation.

Only once the resilver is complete do I have to choose which disk to drop out of the array in a software RAID system.  If I choose incorrectly, the software RAID stops working and lets me choose again.

With hardware RAID, if I choose incorrectly, it’s on the front end of the operation instead, so I’ll end up spending 8-12 hours to create a redundant copy of “Wrong!”

Bottom line: I will not shed a tear when my last hardware RAID goes away.