[CentOS] ssacli start rebuild?

Sun Nov 15 04:23:21 UTC 2020
hw <hw at gc-24.de>

On Sat, 2020-11-14 at 14:37 -0700, Warren Young wrote:
> On Nov 14, 2020, at 5:56 AM, hw <hw at gc-24.de> wrote:
> > On Wed, 2020-11-11 at 16:38 -0700, Warren Young wrote:
> > > On Nov 11, 2020, at 2:01 PM, hw <hw at gc-24.de> wrote:
> > > > I have yet to see software RAID that doesn't kill the performance.
> > > 
> > > When was the last time you tried it?
> > 
> > I'm currently using it, and the performance sucks.
> 
> Be specific.  Give chip part numbers, drivers used, whether this is on-board software RAID or something entirely different like LVM or MD RAID, etc.  For that matter, I don’t even see that you’ve identified whether this is CentOS 6, 7 or 8.  (I hope it isn't older!)

I don't need to be specific because I have seen the difference in
practical usage over the last 20 years.  I'm not setting up
scientific testing environments that would cost tremendous amounts
of money, and I'm using available, cost-efficient hardware and software.

> > Perhaps it's
> > not the software itself or the CPU but the on-board controllers
> > or other components being incapable of handling multiple disks in a
> > software raid.  That's something I can't verify.
> 
> Sure you can.  Benchmark RAID-0 vs RAID-1 in 2, 4, and 8 disk arrays.

No, I can't.  I don't have tons of different CPUs, mainboards, controller
cards and electronic diagnostic equipment around to do that, and what
would you even benchmark?  Whether users tell you that the software they
are using in a VM, stored on an NFS server and run by another server
connected to it, now runs faster or slower?  SQL queries that create
rarely needed reports and take a while to run?  And what is even relevant?

I am seeing that a particular application running in a VM now runs no
slower, and maybe even faster, than before the failed disk was replaced.
That means the hardware RAID 1+0 over 8 disks, on otherwise the same
hardware, is not faster and is even slower than the software RAID built
from the two disks (each exposed by the controller as its own RAID 0).
The CPU load on the storage server is also higher with the software RAID,
which in this case does not matter.  I'm happy with the result so far,
and that is what matters.
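
If anyone wants a rough in-place check without a lab, sysstat's iostat can
show per-device throughput and CPU while the normal workload runs; a
minimal sketch (the device names below are only examples, not my actual
setup):

  # per-device throughput, utilization and CPU, sampled every 5 seconds
  iostat -xm 5 /dev/md0 /dev/sda /dev/sdb

  # state of all md arrays (members, sync status)
  cat /proc/mdstat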

If the disks were connected to the mainboard instead, the software might
be running slower.  I can't benchmark that, either, because I can't connect
the disks to the SATA ports on the board.  If there were 8 disks in a
RAID 1+0, all connected to the board, it might be a lot slower.  I can't
benchmark that either, because the board doesn't have that many SATA
connectors.

I only have two new disks and no additional or different hardware.  Telling
me to specify particular chips and such is totally pointless.  Benchmarking
is neither feasible nor meaningful.

Sure you can do some kind of benchmarking in a lab if you can afford it, but
how does that correlate to the results you'll be getting in practice?  Even if
you involve users, those users will be different from the users I'm dealing with.

> In a 2-disk array, a proper software RAID system should give 2x a single disk’s performance for both read and write in RAID-0, but single-disk write performance for RAID-1.
>
> Such values should scale reasonably as you add disks: RAID-0 over 8 disks gives 8x performance, RAID-1 over 8 disks gives 4x write but 8x read, etc.
> 
> These are rough numbers, but what you’re looking for are failure cases where it’s 1x a single disk for read or write.  That tells you there’s a bottleneck or serialization condition, such that you aren’t getting the parallel I/O you should be expecting.

And?

> > > Why would you expect that a modern 8-core Intel CPU would impede I/O
> > 
> > It doesn't matter what I expect.
> 
> It *does* matter if you know what the hardware’s capable of.

I can expect the hardware to do something as much as I want; it will always
only do whatever it does regardless.

> TLS is a much harder problem than XOR checksumming for traditional RAID, yet it imposes [approximately zero][1] performance penalty on modern server hardware, so if your CPU can fill a 10GE pipe with TLS, then it should have no problem dealing with the simpler calculations needed by the ~2 Gbit/sec flat-out max data rate of a typical RAID-grade 4 TB spinning HDD.
> 
> Even with 8 in parallel in the best case where they’re all reading linearly, you’re still within a small multiple of the Ethernet case, so we should still expect the software RAID stack not to become CPU-bound.
> 
> And realize that HDDs don’t fall into this max data rate case often outside of benchmarking.  Once you start throwing ~5 ms seek times into the mix, the CPU’s job becomes even easier.
> 
> [1]: https://stackoverflow.com/a/548042/142454

This may all be nice and good in theory.  In practice, I'm seeing up to 30% CPU
during an mdraid resync of a single 2-disk array.  How much performance impact
does that indicate for "normal" operations?
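
If the resync itself is what hurts, md at least lets you throttle the
rebuild rate; a sketch, assuming md0 is the array in question (the values
are in KB/s):

  # show resync progress and the current rebuild speed
  cat /proc/mdstat

  # cap the system-wide resync rate at roughly 50 MB/s
  sysctl -w dev.raid.speed_limit_max=50000

  # or cap it for this one array only
  echo 50000 > /sys/block/md0/md/sync_speed_max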

> > > > And where
> > > > do you get cost-efficient cards that can do JBOD?
> > > 
> > > $69, 8 SATA/SAS ports: https://www.newegg.com/p/0ZK-08UH-0GWZ1
> > 
> > That says it's for HP.  So will you still get firmware updates once
> > the warranty is expired?  Does it exclusively work with HP hardware?
> > 
> > And are these good?
> 
> You asked for “cost-efficient,” which I took to be a euphemism for “cheapest thing that could possibly work.”

Buying crap tends not to be cost-efficient.

> If you’re willing to spend money, then I fully expect you can find JBOD cards you’ll be happy with.

Like $500+ cards?  That's not cost-efficient for my backup server, which I
run only about once a month to put backups on it.  If I can get one good
16-port card or two 8-port cards for at most $100, I'll consider it.
Otherwise, I can keep using the P410s, turn each disk into its own RAID 0
and use btrfs.
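
That P410 route would look roughly like this with ssacli: expose each
physical disk as its own single-drive RAID 0 logical drive and let btrfs
handle the redundancy.  The slot number, drive IDs and device names below
are only placeholders:

  # list the physical drives on the controller
  ssacli ctrl slot=0 pd all show

  # create one single-drive RAID 0 logical drive per physical disk
  ssacli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
  ssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
  ssacli ctrl slot=0 create type=ld drives=1I:1:3 raid=0
  ssacli ctrl slot=0 create type=ld drives=1I:1:4 raid=0

  # then let btrfs provide the redundancy across the resulting devices
  mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde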

> Personally, I get servers with enough SFF-8087 SAS connectors on them to address all the disks in the system.  I haven’t bothered with add-on SATA cards in years.

How do you get all these servers?

> I use ZFS, so absolute flat-out benchmark speed isn’t my primary consideration.  Data durability and data set features matter to me far more.

Well, I tried ZFS and was not happy with it, though it does have
some nice features.

> > > > What has HP been thinking?
> > > 
> > > That the hardware vs software RAID argument is over in 2020.
> > 
> > Do you have a reference for that, like a final statement from HP?
> 
> Since I’m not posting from an hpe.com email address, I think it’s pretty obvious that that is my opinion, not an HP corporate statement.

I hadn't paid attention to the email address.

> I base it on observing the Linux RAID market since the mid-90s.  The massive consolidation for hardware RAID is a big part of it.  That’s what happens when a market becomes “mature,” which is often the step just prior to “moribund.”
> 
> > Did they stop developing RAID controllers, or do they ship their
> > servers now without them
> 
> Were you under the impression that HP was trying to provide you the best possible technology for all possible use cases, rather than make money by maximizing the ratio of cash in vs cash out?
> 
> Just because they’re serving it up on a plate doesn’t mean you hafta pick up a fork.

If they had stopped making hardware RAID controllers, that would show that
they have turned away from hardware RAID, and that might be seen as putting
an end to the discussion --- *because* they are trying to make money.  If they
haven't stopped making them, that might indicate that there is still sufficient
demand for the technology, and there are probably good reasons for that.  That
different technologies have matured over time doesn't mean that others have become
bad.  Besides, always "picking up the best technology" comes with its own
disadvantages, all technology fails eventually, and sometimes hardware RAID
can be the "best technology".