On July 1, 2019 8:56:35 AM CDT, Blake Hudson blake@ispn.net wrote:
> Warren Young wrote on 6/28/2019 6:53 PM:
>> On Jun 28, 2019, at 8:46 AM, Blake Hudson blake@ispn.net wrote:
>>> Linux software RAID…has only decreased availability for me. This has
>>> been due to a combination of hardware and software issues that are
>>> generally handled well by HW RAID controllers, but are often handled
>>> poorly or unpredictably by desktop-oriented hardware and Linux software.
>> Would you care to be more specific? I have little experience with
>> software RAID, other than ZFS, so I don’t know what these “issues” might be.
> I've never used ZFS, as its Linux support has historically been poor. My
> comments are limited to mdadm. I've experienced three faults when using
> Linux software RAID (mdadm) on RH/RHEL/CentOS, and I believe all of them
> resulted in more downtime than would have been experienced without RAID:
>
> 1) A single drive failure in a RAID 4 or 5 array (desktop IDE) caused the
> entire system to stop responding. The result was an array that was both
> degraded (from the dead drive) and dirty (from the crash) and could not be
> rebuilt. Either condition alone would have been fine, but not both, due to
> buggy Linux software.
> 2) A single drive failure in a RAID 1 array (Supermicro SCSI) caused the
> system to be unbootable. We had to update the BIOS to boot from the working
> drive, and, as I recall (it's been a long time), GRUB may also have had to
> be repaired or reinstalled.
> 3) A single drive failure in a RAID 4 or 5 array (desktop IDE) was not
> clearly identified and required a bit of troubleshooting to pinpoint which
> drive had failed.
>
> Unfortunately, I've never had an experience where a drive simply failed
> cleanly, was marked bad by Linux software RAID, and could then be replaced
> without fanfare. This is in contrast to my HW RAID experiences, where a
> single drive failure is almost always handled reliably and predictably,
> with zero downtime. Your points about having to use a clunky BIOS setup or
> CLI tools may be true for some controllers, as are your points about
> needing to keep a spare RAID controller, ongoing driver support, etc. I've
> found that LSI-brand cards have good Linux driver support, CLI tools, and
> an easy-to-navigate BIOS, and are backwards compatible with RAID sets taken
> from older cards, so I have no problem recommending them. By default, LSI
> cards also regularly test all drives to predict failures (avoiding rebuild
> errors or double failures).
+1 in favor of hardware RAID.
My usual argument is this. With hardware RAID, a dedicated piece of hardware runs a single task, the RAID function, which boils down to a simple, short program that is easy to debug well. With software RAID there is no dedicated hardware, and if the kernel (big and buggy code) panics, the current RAID operation will never be finished, which leaves a mess. One does not need a computer science degree to follow this simple logic.
Valeri
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++