[CentOS] ssacli start rebuild?

Wed Nov 11 06:27:36 UTC 2020
hw <hw at gc-24.de>

On Mon, 2020-11-09 at 16:30 +0100, Thomas Bendler wrote:
> Am Fr., 6. Nov. 2020 um 20:38 Uhr schrieb hw <hw at gc-24.de>:
> 
> > [...]
> > Some search results indicate that it's possible that other disks in the
> > array have read errors and might prevent rebuilding for RAID 5.  I don't
> > know if there are read errors, and if it's read errors, I think it would
> > mean that these errors would have to affect just the disk which is
> > mirroring the disk that failed, this being a RAID 1+0.  But if the RAID
> > is striped across all the disks, that could be any or all of them.
> > 
> > The array is still in production and still works, so it should just
> > rebuild.
> > Now the plan is to use another 8TB disk once it arrives, make a new RAID 1
> > with the two new disks and copy the data over.  The remaining 4TB disks can
> > then be used to make a new array.
> > 
> > Learn from this that it can be a bad idea to use a RAID 0 for backups and
> > that at least one generation of backups must be on redundant storage ...
> > 
> 
> Just checked on one of my HP boxes, you indeed cannot figure out if one of
> the disks has read errors. Do you have the option to reboot the box and
> check on the controller directly?
> 

Thanks!  The controller (its BIOS) doesn't show up during boot, so I can't
check there for errors.

The controller is extremely finicky:  The plan to make a RAID 1 from the two
new drives has failed because the array with the failed drive is unusable
when the failed drive is missing entirely.

In the process of moving the 8TB drives back and forth, it turned out that
when an array that was made from them is missing one drive, that array is
unusable, and when the missing drive is put back in, the array remains
'Ready for Rebuild' without the rebuild starting.  There is also no way to
delete an array that is missing a drive.
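
For anyone who wants to watch for this state from a script, here is a
minimal sketch.  It assumes the logical drive status listing consists of
lines shaped like "logicaldrive <n> (<size>, <RAID level>, <status>)".  The
sample text and the function names are made up for illustration; nothing
here is output captured from the controller discussed in this thread.

```python
import re

# Illustrative sample only; NOT captured from the controller in this thread.
SAMPLE_STATUS = """\
   logicaldrive 1 (7.28 TB, RAID 1, Ready for Rebuild)
   logicaldrive 2 (10.92 TB, RAID 1+0, OK)
"""

# Assumed line shape: "logicaldrive <n> (<size>, <raid level>, <status>)"
LINE_RE = re.compile(r"^\s*logicaldrive\s+(\d+)\s+\((.+)\)\s*$")


def parse_logical_drives(text):
    """Return one dict per logical drive line found in a status listing."""
    drives = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip headers, blank lines and anything unexpected
        fields = [field.strip() for field in match.group(2).split(",")]
        if len(fields) < 3:
            continue  # not the assumed size/level/status triple
        drives.append({
            "id": int(match.group(1)),
            "size": fields[0],
            "raid": fields[1],
            # Re-join the remainder in case the status itself has a comma.
            "status": ", ".join(fields[2:]),
        })
    return drives


def waiting_for_rebuild(drives):
    """Return the ids of logical drives stuck in 'Ready for Rebuild'."""
    return [d["id"] for d in drives
            if d["status"].lower() == "ready for rebuild"]


if __name__ == "__main__":
    for drive_id in waiting_for_rebuild(parse_logical_drives(SAMPLE_STATUS)):
        print(f"logicaldrive {drive_id} is waiting for a rebuild")
```

This only reports the state; it does nothing to start a rebuild.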

So the theory that the array isn't being rebuilt because other disks have
errors is likely wrong.  That means that whenever a disk fails and is
replaced, there is no way to rebuild the array (unless it happened
automatically, which it doesn't).

With this experience, these controllers are now deprecated.  RAID controllers
that can't rebuild an array after a disk has failed and has been replaced
are virtually useless.