[CentOS] CentOS 7 and Areca ARC-1883I SAS controller: JBOD or not to JBOD?

Fri Jan 20 23:38:10 UTC 2017
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Fri, January 20, 2017 5:16 pm, Joseph L. Casale wrote:
>> This is why before configuring and installing everything you may want to
>> attach drives one at a time, and upon boot take a note which physical
>> drive number the controller has for that drive, and definitely label it
>> so you will know which drive to pull when a drive failure is reported.
>
> Sorry Valeri, that only works if you're the only guy in the org.

Well, this is true, I'm the only sysadmin working for two departments here...

>
> In reality, you cannot and should not rely on this given how easily it can
> change and more than likely someone won't update it.
>
> Would you walk up to a production unit in a degraded state and simply pull
> out a drive and risk a production issue? I wouldn't...

I routinely do: I just hot-remove the failed drive from a running production
system and replace it with a good drive (note what I said about my job
above, though). None of our users ever notices. When I do it, the only
chance I am usually taking is of degrading an already degraded RAID6 (with
one drive failed) even further, so that it is no longer fault tolerant,
though still online with all data on it. But even that chance is slim,
given that I take all precautions when I initially set up the box.

>
> You need to assert the position of the drive and prepare it in the array
> controller for removal, then swap, scan, add to virtual disk then initiate
> rebuild.

Hm, I'm not certain what process you describe. Most of my controllers are
3ware and LSI; I just pull the failed drive (and I know the failed physical
drive number), put a good one in its place, and the rebuild starts right
away. I have a couple of Areca ones (I love them too!); I don't remember
whether I have to initiate the rebuild manually. (I'm lucky in using good
drives - very careful in choosing good ones ;-).

>
> Not to mention if it's a busy system, confirm that the IO load from the
> rebuild is not having an impact on the application. You may need to lower
> the rate.

Indeed, the 3ware configuration offers a choice of several grades of
rebuild vs IO priority; I usually choose slower rebuild, faster IO. If I
have only one drive failing on me during a year in a given rack, there is
almost zero chance of a second drive failing for quite some time (we had a
heated discussion about it once, and I still stand by my opinion that drive
failures are independent events). So my degraded RAID-6 can keep running
and even stay redundant ("single redundant", akin to RAID-5) for the
period of the rebuild, even if that takes quite long.
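
To put rough numbers on that reasoning, here is a minimal sketch in Python.
It assumes drive failures are independent events with a constant failure
rate, which is exactly the point that was debated, so treat it as an
illustration of the argument rather than a reliability model. The array
size, annual failure rate and rebuild time are hypothetical values, not
measurements from any of the systems mentioned here.

```python
from math import comb

HOURS_PER_YEAR = 24 * 365


def p_drive_fails(afr, window_hours):
    """Probability that one drive fails within the window, assuming a constant
    failure rate chosen so that the one-year failure probability equals afr."""
    # One-year survival is (1 - afr); scale the exponent down to the window.
    return 1.0 - (1.0 - afr) ** (window_hours / HOURS_PER_YEAR)


def p_at_least(k, n_drives, p):
    """Probability that at least k of n_drives fail, each independently
    with probability p (upper tail of a binomial distribution)."""
    below_k = sum(
        comb(n_drives, i) * p**i * (1.0 - p) ** (n_drives - i) for i in range(k)
    )
    return 1.0 - below_k


# Hypothetical example: a 12-drive RAID-6 with one drive already failed,
# so 11 surviving drives, a 3% annual failure rate, and a slow 48-hour rebuild.
survivors = 11
p = p_drive_fails(afr=0.03, window_hours=48)

# One more failure during the rebuild: array still online, no redundancy left.
lose_redundancy = p_at_least(1, survivors, p)
# Two more failures during the rebuild: the RAID-6 array is lost.
lose_data = p_at_least(2, survivors, p)

print(f"per-drive failure probability during rebuild: {p:.2e}")
print(f"at least one more failure (no redundancy):    {lose_redundancy:.2e}")
print(f"at least two more failures (array lost):      {lose_data:.2e}")
```

If failures are correlated (same batch of drives, shared power or cooling
problems), the independence assumption breaks down and the real risk is
higher than this sketch suggests.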

Valeri



++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++