On 02/02/2011 09:00 AM, Lamar Owen wrote:
On Wednesday, February 02, 2011 02:06:15 am Chuck Munro wrote:
The real key is to carefully label each SATA cable and its associated drive. Then the little mapping script can be used to identify the faulty drive, which mdadm reports by its device name. It just occurred to me that whenever mdadm sends an email report, it can also run a script which groks out the path info and puts it in the email message. Problem solved :-)
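A rough sketch of that hook, under some assumptions: mdadm's monitor mode can run a program named on a PROGRAM line in mdadm.conf, passing it the event name, the md device and, where relevant, the component device. The program cannot alter mdadm's own mail, so this version would send a second message. The function name format_report and the wiring shown in the comments are hypothetical, not something from the setup described here.

```shell
#!/bin/sh
# Hypothetical handler for the PROGRAM line in mdadm.conf.
# mdadm --monitor would call it as: handler EVENT MD_DEVICE [COMPONENT_DEVICE]

# Build the report text from the mdadm arguments plus a hardware path
# that the caller has already looked up. Missing values become "unknown".
format_report() {
    event=$1
    md=$2
    component=${3:-unknown}
    path=${4:-unknown}
    printf 'mdadm event: %s\narray: %s\ncomponent: %s\nhardware path: %s\n' \
        "$event" "$md" "$component" "$path"
}

# Possible wiring inside the handler (assumes udevadm and mail are installed):
#   path=$(udevadm info --query=path --name="$3")
#   format_report "$1" "$2" "$3" "$path" | mail -s "RAID event on $(hostname)" root
```

Keeping the formatting in a function of its own means it can be tried by hand with made-up arguments before mdadm ever calls it.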
Ok, perhaps I'm dense, but, if this is not a hot-swap bay you're talking about, wouldn't it be easier to have the drive's serial number (or other identifier found on the label) pulled into the e-mail, and compare with the label physically found on the drive, since you're going to have to open the case anyway? Using something like:
hdparm -I $DEVICE | grep Serial.Number
works here (the regexp Serial.Number matches the string "Serial Number" without requiring the double quotes....). Use whatever $DEVICE you need to use, as long as it's on a controller compatible with hdparm usage.
I have seen cases with a different Linux distribution where the actual module load order was nondeterministic (modules loaded in parallel); while upstream and the CentOS rebuild try to make things more deterministic, wouldn't it be safer to get a really unique, externally visible identifier from the drive? If the drive has failed to the degree that it won't respond to the query, then query all the good drives in the array for their serial numbers, and use a process of elimination. This, IMO, is more robust than relying on the drive detect order to remain deterministic.
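The process of elimination could be sketched roughly as follows. It assumes a list of serial numbers was saved to a file while the array was still healthy; that file and both function names are hypothetical, and the parsing assumes hdparm prints a "Serial Number:" line as it does here.

```shell
#!/bin/sh
# Sketch: work out which recorded serial number no longer answers.

# Read `hdparm -I` output on stdin and print only the serial number,
# with surrounding whitespace trimmed.
serial_from_hdparm() {
    awk '/Serial Number:/ {
        sub(/^.*Serial Number:[ \t]*/, "")
        sub(/[ \t]+$/, "")
        print
        exit
    }'
}

# Print every serial in the known list ($1) that is absent from the
# list of drives that responded ($2). Both are files, one serial per line.
missing_serials() {
    grep -Fxv -f "$2" "$1"
}

# Possible use (needs root and real drives; file names are placeholders):
#   for dev in /dev/sd[a-z]; do
#       hdparm -I "$dev" 2>/dev/null | serial_from_hdparm
#   done > /tmp/responding
#   missing_serials /root/known_serials /tmp/responding
```

Whatever missing_serials prints is the label to look for once the case is open.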
If in a hotswap or coldswap bay, do some data access to the array, and see which LEDs don't blink; that should correspond to the failed drive. If the bay has secondary LEDs, you might be able to blink those, too.
Well no, you're not being dense. It's a case of making the best of what the physical hardware can do for me. In my case, the drives are segregated into several 3-drive bays which are bolted into the case individually, so removing each one to compare serial numbers would be a major pain, since I'd have to unbolt a bay and remove each drive one at a time to read the label.
The new RHEL-6/CentOS-6 'udevadm' command nicely maps out the hardware path no matter what order the drives are detected or named in, and since hardware paths are fixed, I just have to attach a little tag to each SATA cable with that path number on it. One thing I did was reboot the machine *many* times to make sure the controller cards were always enumerated by Linux in the same slot order.
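For illustration, a mapping along these lines might look like the sketch below. It is not the actual script, just a guess at its shape: it assumes the drives show up as /sys/block/sd* and that the sysfs path ends in /block/sdX, and the function names hw_path and map_drives are made up.

```shell
#!/bin/sh
# Sketch: list each sd* drive next to its hardware (sysfs) path.

# Strip the trailing /block/sdX from a devpath as printed by
# `udevadm info --query=path`, leaving only the controller/port part.
hw_path() {
    printf '%s\n' "${1%/block/*}"
}

# Print "device<TAB>hardware path" for every sd* block device present.
map_drives() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue
        name=${dev##*/}
        devpath=$(udevadm info --query=path --name="/dev/$name")
        printf '%s\t%s\n' "/dev/$name" "$(hw_path "$devpath")"
    done
}
```

Running map_drives after each reboot and comparing the output is one way to confirm that the paths stay put even when the sdX names move around.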
I also notice that the RHEL-6 DriveInfo GUI application shows which drive is giving trouble, but it only maps the controllers in a vague way with respect to the hardware path. (At least that's what I remember seeing a couple of days ago; I could be mistaken.)
On this particular machine I don't have the luxury of per-drive LED activity indicators, so whacking each drive with a big read won't point the way (but I have used that technique on other machines). I didn't have the funds to buy the hot-swap bays I would have preferred. I may retrofit later.
Your suggestions are well taken, but the hardware I have doesn't readily allow my use of them. Thanks for the ideas.
Chuck