[CentOS] mdadm: hot remove failed for /dev/sdg: Device or resource busy

Mon Feb 11 05:34:31 UTC 2013
Keith Keller <kkeller at wombat.san-francisco.ca.us>

Hi Vincent,

On 2013-02-11, Vincent Li <ruconse at gmail.com> wrote:
> Hi Keith,
>
> It seems that the mdadm -D indicates the root cause of "device busy":
>
> >5 8 96 5 faulty spare rebuilding /dev/sdg
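The quoted line is one row of the device table printed by mdadm -D. Its columns are assumed to be Number, Major, Minor, RaidDevice and State, followed by the device path; the excerpt omits the header, so that order is an assumption. A minimal Python sketch splits such a row so that the state words ("faulty", "spare", "rebuilding") can be checked one by one. The helper name parse_detail_row is hypothetical. It handles only rows for devices that are still present, because a removed slot has no trailing path.

```python
from typing import NamedTuple, Tuple


class MdDevRow(NamedTuple):
    number: int
    major: int
    minor: int
    raid_device: int
    state: Tuple[str, ...]
    path: str


def parse_detail_row(row: str) -> MdDevRow:
    """Parse one device row of `mdadm -D` output (column order assumed)."""
    fields = row.split()
    number, major, minor, raid_device = (int(f) for f in fields[:4])
    # Everything between the four numeric columns and the trailing
    # device path is the space-separated state description.
    return MdDevRow(number, major, minor, raid_device,
                    tuple(fields[4:-1]), fields[-1])


row = parse_detail_row("5 8 96 5 faulty spare rebuilding /dev/sdg")
print(row.state)                   # ('faulty', 'spare', 'rebuilding')
print(f"{row.major}:{row.minor}")  # 8:96
```

The 8:96 major:minor pair is the same one that udev reports for sdg further down (block/8:96).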

Well, this is one thing I don't quite get.  In the past, when a device
has been marked faulty (even on this array), md has permitted me to
remove it.  These occasions were not during a reshape, however.  Naively
I would think that md would give up IO on a failed device, and so it
would no longer be busy.  And the dmesg report implies that md thinks
the device is still "active" even though it marked it faulty.

> Is there any clue in /proc/mdstat and /var/log/messages?

Not really.  Here's mdstat:

Personalities : [raid6] [raid5] [raid4] 
md127 : active raid6 sdm[13](S) sdg[5](F) sdj[8] sdi[7] sdk[10] sdc[1] sdn[12] sdd[2] sde[3] sdf[4] sdh[6] sdb[0] sdl[11]
      17578013184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUU_UUUUUU]
      	resync=PENDING
      
unused devices: <none>

As you might expect, sdg is marked faulty and sdm is marked as a spare;
in the past, when things went smoothly, sdg would be removed
automatically and a new rebuild would start onto sdm.
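In the mdstat line above, each member is written as name[N] with an optional flag: (F) marks a faulty device and (S) a spare. In md's kernel code the bracketed N is the device's descriptor number, not necessarily its raid slot. The trailing [UUUUU_UUUUUU] map has one character per slot, with _ for a slot that is not in use. The minimal Python sketch below pulls the flags and the empty slot out of the quoted text. The function names are hypothetical and the regexes are assumptions based only on this excerpt.

```python
import re

# Excerpt of the /proc/mdstat output quoted above.
MDSTAT = """\
md127 : active raid6 sdm[13](S) sdg[5](F) sdj[8] sdi[7] sdk[10] sdc[1] sdn[12] sdd[2] sde[3] sdf[4] sdh[6] sdb[0] sdl[11]
      17578013184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUU_UUUUUU]
"""

# A member token looks like "sdg[5](F)": name, descriptor number, optional flag.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](?:\((\w)\))?")


def parse_members(array_line):
    """Map each member device to (descriptor number, flag or None)."""
    members = {}
    for token in array_line.split():
        m = MEMBER_RE.fullmatch(token)
        if m:  # tokens such as "md127", ":", "active", "raid6" do not match
            members[m.group(1)] = (int(m.group(2)), m.group(3))
    return members


def missing_slots(status_line):
    """Return the slot indices shown as '_' in the trailing [UU_U] map."""
    m = re.search(r"\[([U_]+)\]\s*$", status_line)
    return [i for i, c in enumerate(m.group(1)) if c == "_"] if m else []


array_line, status_line = MDSTAT.splitlines()[:2]
members = parse_members(array_line)
print("faulty:", [d for d, (_, f) in members.items() if f == "F"])  # faulty: ['sdg']
print("spares:", [d for d, (_, f) in members.items() if f == "S"])  # spares: ['sdm']
print("missing slots:", missing_slots(status_line))                # missing slots: [5]
```

Here the empty slot 5 happens to coincide with sdg's descriptor number 5. The spare sdm (descriptor 13) has not yet been assigned to that slot.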

There isn't anything compelling in messages, either.  The only relevant
entries I see are read errors logged when tools like mdadm -E /dev/sdg
try to read the device.  This is in fact what led me to look at udevadm
info; I expected mdadm -E not to find anything on sdg, because I thought
that sdg no longer existed at all.  That's when I found sdg in this
limbo state.  Oddly enough, though, the udevadm info output has changed:

# udevadm info --name=sdg --query=all
P: /devices/pci0000:00/0000:00:0b.0/0000:01:03.0/host2/target2:0:5/2:0:5:0/block/sdg
N: sdg
W: 102
S: block/8:96
S: disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:0b.0/0000:01:03.0/host2/target2:0:5/2:0:5:0/block/sdg
E: MAJOR=8
E: MINOR=96
E: DEVNAME=/dev/sdg
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: MPATH_SBIN_PATH=/sbin
E: ID_SCSI=1
E: ID_TYPE=generic
E: ID_BUS=scsi
E: ID_PATH=pci-0000:01:03.0-scsi-0:0:5:0
E: LVM_SBIN_PATH=/sbin
E: DEVLINKS=/dev/block/8:96 /dev/disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0

udev no longer shows any connection to the md RAID array or the
controller, but the output is still different from what I'd expect if
there were no udev entries for the device at all.
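In `udevadm info --query=all` output, each line begins with a one-letter record type. P: is the sysfs path, N: the node name, S: a symlink and E: a KEY=VALUE property. A minimal Python sketch splits the output quoted above into those records and then checks for a few properties. The assumption is that a healthy md member would usually carry ID_FS_TYPE=linux_raid_member (from udev's blkid probe) and ID_VENDOR, ID_MODEL and ID_SERIAL (from the SCSI ID helpers). Which keys actually appear depends on the udev version and installed rules, so EXPECTED_KEYS is a hypothetical choice, not a definitive test.

```python
# Output of `udevadm info --name=sdg --query=all` as quoted above.
UDEV_OUTPUT = """\
P: /devices/pci0000:00/0000:00:0b.0/0000:01:03.0/host2/target2:0:5/2:0:5:0/block/sdg
N: sdg
W: 102
S: block/8:96
S: disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:0b.0/0000:01:03.0/host2/target2:0:5/2:0:5:0/block/sdg
E: MAJOR=8
E: MINOR=96
E: DEVNAME=/dev/sdg
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: MPATH_SBIN_PATH=/sbin
E: ID_SCSI=1
E: ID_TYPE=generic
E: ID_BUS=scsi
E: ID_PATH=pci-0000:01:03.0-scsi-0:0:5:0
E: LVM_SBIN_PATH=/sbin
E: DEVLINKS=/dev/block/8:96 /dev/disk/by-path/pci-0000:01:03.0-scsi-0:0:5:0
"""

# Hypothetical: keys assumed to be present for a healthy, identified md member.
EXPECTED_KEYS = ("ID_FS_TYPE", "ID_VENDOR", "ID_MODEL", "ID_SERIAL")


def parse_udev(text):
    """Split output into {type: [values]} records and an E: property dict."""
    records, props = {}, {}
    for line in text.splitlines():
        if len(line) < 3 or line[1:3] != ": ":
            continue  # not a "X: value" line
        kind, value = line[0], line[3:]
        if kind == "E":
            key, _, val = value.partition("=")  # split at the first '=' only
            props[key] = val
        else:
            records.setdefault(kind, []).append(value)
    return records, props


records, props = parse_udev(UDEV_OUTPUT)
print("symlinks:", records.get("S", []))
print("missing keys:", [k for k in EXPECTED_KEYS if k not in props])
```

For this output every key in EXPECTED_KEYS is absent, while the by-path link and the 8:96 device numbers are still there. That matches the "limbo" state described above.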

--keith


-- 
kkeller at wombat.san-francisco.ca.us