On 02/02/2011 09:00 AM, Lamar Owen wrote:
>
> On Wednesday, February 02, 2011 02:06:15 am Chuck Munro wrote:
>> > The real key is to carefully label each SATA cable and its associated
>> > drive. Then the little mapping script can be used to identify the
>> > faulty drive which mdadm reports by its device name. It just occurred
>> > to me that whenever mdadm sends an email report, it can also run a
>> > script which groks out the path info and puts it in the email message.
>> > Problem solved:-)
> Ok, perhaps I'm dense, but, if this is not a hot-swap bay you're talking about, wouldn't it be easier to have the drive's serial number (or other identifier found on the label) pulled into the e-mail, and compare with the label physically found on the drive, since you're going to have to open the case anyway? Using something like:
>
> hdparm -I $DEVICE | grep Serial.Number
>
> works here (the regexp Serial.Number matches the string "Serial Number" without requiring the double quotes....). Use whatever $DEVICE you need to use, as long as it's on a controller compatible with hdparm usage.
>
> I have seen cases with a different Linux distribution where the actual module load order was nondeterministic (modules loaded in parallel); while upstream and the CentOS rebuild try to make things more deterministic, wouldn't it be safer to get a really unique, externally visible identifier from the drive? If the drive has failed to the degree that it won't respond to the query, then query all the good drives in the array for their serial numbers, and use a process of elimination. This, IMO, is more robust than relying on the drive detect order to remain deterministic.
>
> If in a hotswap or coldswap bay, do some data access to the array, and see which LED's don't blink; that should correspond to the failed drive. If the bay has secondary LED's, you might be able to blink those, too.
>
>
Well no, you're not being dense. It's a case of making the best of what
the physical hardware can do for me. In my case, the drives are
segregated into several 3-drive bays which are bolted into the case
individually, so removing each one to compare serial numbers would be a
major pain, since I'd have to unbolt a bay and remove each drive one at
a time to read the label.

The new RHEL-6/CentOS-6 'udevadm' command nicely maps out the
hardware path no matter the order the drives are detected/named, and
since hardware paths are fixed, I just have to attach a little tag to
each SATA cable with that path number on it. One thing I did was reboot
the machine *many* times to make sure the controller cards were always
enumerated by Linux in the same slot order.
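
A minimal sketch of what such a mapping script might look like follows. It
is not the actual script used on this machine: the function names hw_port
and map_drives are made up for illustration, and it assumes the sysfs path
that 'udevadm info --query=path' reports contains a "/target" component
after the controller and host part, as is typical for SATA disks.

```shell
#!/bin/sh
# Sketch only: hw_port and map_drives are illustrative names,
# not taken from the original script.

# Reduce a sysfs device path to its controller/port part by dropping the
# leading "/devices/" and everything from "/target" onward.
# Assumption: the path contains a "/target" component (typical for SATA).
hw_port() {
    p=${1%%/target*}
    printf '%s\n' "${p#/devices/}"
}

# Print "device<TAB>hardware port" for every /dev/sd? block device present.
# Does nothing if udevadm is not installed.
map_drives() {
    command -v udevadm >/dev/null 2>&1 || return 0
    for dev in /dev/sd?; do
        [ -b "$dev" ] || continue
        printf '%s\t%s\n' "$dev" \
            "$(hw_port "$(udevadm info --query=path --name="$dev")")"
    done
}

map_drives
```

For a path such as
/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/block/sdc,
hw_port prints pci0000:00/0000:00:1f.2/host2, which is the kind of string
that could go on the cable tag.
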

I also notice that the RHEL-6 DriveInfo GUI application shows which
drive is giving trouble, but it only maps the controllers in a vague way
with respect to the hardware path. (At least that's what I remember
seeing a couple of days ago; I could be mistaken.)

On this particular machine I don't have the luxury of per-drive LED
activity indicators, so whacking each drive with a big read won't point
the way (but I have used that technique on other machines). I didn't
have the funds to buy the hot-swap bays I would have preferred. I may
retrofit later.

Your suggestions are well taken, but the hardware I have doesn't readily
allow me to use them. Thanks for the ideas.

Chuck