[CentOS] CentOS 7 and Areca ARC-1883I SAS controller: JBOD or not to JBOD?

Sat Jan 21 02:32:09 UTC 2017
Cameron Smith <cameron at networkredux.com>

I was just trying to be helpful.

*backs away slowly*

Cameron

On Fri, Jan 20, 2017 at 5:16 PM, Valeri Galtsev <galtsev at kicp.uchicago.edu>
wrote:

>
> On Fri, January 20, 2017 7:00 pm, Cameron Smith wrote:
> > Hi Valeri,
> >
> >
> > Before you pull a drive you should check to make sure that doing so
> > won't kill the whole array.
>
> Wow! What did I say to make you treat me as an ultimate idiot!? ;-) All my
> comments, at least in my own reading, were about the things you need to do to
> make sure that when you hot-unplug a bad drive it is indeed the failed drive
> you have to replace.
>
> Valeri
>
> >
> > MegaCli can help you prevent a storage disaster and can give you more
> > insight into your RAID and the status of the virtual disks and the disks
> > that make up each array.
> >
> > MegaCli will let you see the health and status of each drive. Does it have
> > media errors, is it in predictive failure mode, what firmware version does
> > it have, etc.? MegaCli will also let you see the status of the enclosure,
> > the adapter and the virtual disks (logical disks).
> >
> > Before you pull a drive it's a good idea to properly prepare it for
> > removal after confirming that it's OK to remove it.
> >
> > Here are a few commands:
> >
> > OFFLINE A DISK
> > MegaCli -PDOffline -PhysDrv[32:0] -a0
> >
> > MARK A DISK AS MISSING
> > MegaCli -pdmarkmissing -physdrv[32:0] -a0
> >
> > MARK A DISK AS PREPARED FOR REMOVAL
> > MegaCli -pdprprmv -physdrv[32:0] -a0
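> >
> > It can also help to blink the drive's locate LED so you are sure you are
> > about to grab the right slot. A rough sketch, reusing the [enclosure:slot]
> > and adapter numbers from the examples above:
> >
> > BLINK / STOP BLINKING A DISK'S LOCATE LED
> > MegaCli -PdLocate -start -physdrv[32:0] -a0
> > MegaCli -PdLocate -stop -physdrv[32:0] -a0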
> >
> > Here are some easy overview commands that I run when first looking at the
> > storage on a system:
> > MegaCli -AdpAllInfo -aAll |grep -A 8 "Device Present";
> > MegaCli -PDList -aALL |grep "Firmware state";
> > MegaCli -PDList -aALL |grep "Media Error Count";
> > MegaCli -PDList -aALL |grep "Predictive Failure Count";
> > MegaCli -PDList -aALL |grep "Inquiry Data";
> > MegaCli -PDList -aALL |grep "Device Firmware Level";
> > MegaCli -PDList -aALL |grep "Drive has flagged";
> > MegaCli -PDList -aALL |grep Temperature;
> >
> >
> > I also leverage MegaCli from bash scripts on my older 11th-gen Dell boxes;
> > they run from cron.hourly, check the health status of my arrays, and email
> > me if there is an issue.
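> >
> > Something along these lines (a rough sketch only; the MegaCli path, binary
> > name and mail recipient are placeholders you would adjust for your setup):
> >
> > #!/bin/bash
> > # /etc/cron.hourly/raid-check -- mail me if anything looks unhealthy
> > MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64   # adjust path/binary for your box
> > MAILTO=root@localhost                     # placeholder recipient
> >
> > # Virtual drives: any state other than "Optimal" is worth a look
> > vd=$("$MEGACLI" -LDInfo -Lall -aALL | grep -i '^State' | grep -vi Optimal)
> >
> > # Physical drives: states other than Online/Hotspare, plus non-zero
> > # media error or predictive failure counters
> > pd=$("$MEGACLI" -PDList -aALL | grep 'Firmware state' | grep -viE 'Online|Hotspare')
> > cnt=$("$MEGACLI" -PDList -aALL | grep -E 'Media Error Count|Predictive Failure Count' | grep -v ': 0$')
> >
> > if [ -n "$vd$pd$cnt" ]; then
> >     printf '%s\n' "$vd" "$pd" "$cnt" | mail -s "RAID problem on $(hostname)" "$MAILTO"
> > fi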
> >
> >
> >
> > Cameron Smith
> > Technical Operations Manager
> > Network Redux, LLC
> > Cell:   503-926-4928
> >
> > On Fri, Jan 20, 2017 at 3:38 PM, Valeri Galtsev
> > <galtsev at kicp.uchicago.edu>
> > wrote:
> >
> >>
> >> On Fri, January 20, 2017 5:16 pm, Joseph L. Casale wrote:
> >> >> This is why, before configuring and installing everything, you may want
> >> >> to attach the drives one at a time and, upon boot, take note of which
> >> >> physical drive number the controller has for that drive, and definitely
> >> >> label it so you will know which drive to pull when a drive failure is
> >> >> reported.
> >> >
> >> > Sorry Valeri, that only works if you're the only guy in the org.
> >>
> >> Well, this is true, I'm the only sysadmin working for two departments
> >> here...
> >>
> >> >
> >> > In reality, you cannot and should not rely on this given how easily it
> >> > can change and more than likely someone won't update it.
> >> >
> >> > Would you walk up to a production unit in a degraded state and simply
> >> > pull out a drive and risk a production issue? I wouldn't...
> >>
> >> I routinely do: I just hot-remove the failed drive from running production
> >> systems and replace it with a good drive (take note of what I said about my
> >> job above, though). None of our users ever notices. When I do it, the only
> >> chance I am usually taking is that the degraded RAID6 (with one drive
> >> failed) becomes degraded yet further and loses its fault tolerance, though
> >> it stays online with all data on it. But even that chance is slim, given
> >> that I take all precautions when I am initially setting up the box.
> >>
> >> >
> >> > You need to confirm the position of the drive and prepare it in the
> >> > array controller for removal, then swap, scan, add it to the virtual
> >> > disk, and then initiate the rebuild.
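> >> >
> >> > With MegaCli, for instance, that last step would be something along the
> >> > lines of the following (just a sketch; the enclosure:slot and adapter
> >> > numbers are placeholders):
> >> >
> >> > MegaCli -PDRbld -Start -PhysDrv[32:0] -a0
> >> > MegaCli -PDRbld -ShowProg -PhysDrv[32:0] -a0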
> >>
> >> Hm, I'm not certain what process you describe. Most of my controllers are
> >> 3ware and LSI; I just pull the failed drive (and I know the failed physical
> >> drive number), put a good one in its place, and the rebuild starts right
> >> away. I have a couple of Areca ones (I love them too!); I don't remember if
> >> I have to manually initiate the rebuild. (I'm lucky in using good drives -
> >> very careful in choosing good ones ;-)
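> >>
> >> On the 3ware boxes, before touching anything I double-check which port
> >> actually failed with tw_cli - roughly like this (the controller and port
> >> numbers are just examples from my setup):
> >>
> >> tw_cli /c0 show        # unit status plus per-port drive status
> >> tw_cli /c0/p5 show     # details for the suspect port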
> >>
> >> >
> >> > Not to mention, if it's a busy system, confirm that the IO load from the
> >> > rebuild is not having an impact on the application. You may need to lower
> >> > the rate.
> >>
> >> Indeed, in the 3ware configuration there is a choice of several grades of
> >> rebuild vs IO priority; I usually choose slower rebuild - faster IO. If I
> >> have only one drive failing on me during a year in a given rack, there is
> >> almost zero chance of a second drive failing within quite some time (we had
> >> a heated discussion about it once, and I still stand by my opinion that
> >> drive failures are independent events). So, my degraded RAID-6 can keep
> >> running and even still stay redundant ("singly redundant", akin to RAID-5)
> >> for the period of the rebuild, even if that takes quite long.
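> >>
> >> With tw_cli that knob is set with something like the following - though I'd
> >> double-check against the controller's documentation which end of the 1..5
> >> scale favors IO over rebuild speed:
> >>
> >> tw_cli /c0 set rebuild=5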
> >>
> >> Valeri
> >>
>
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>