[CentOS] OT: clear error from premature disk removal from LVM

Tue Jun 14 22:59:32 UTC 2011
Keith Keller <kkeller at wombat.san-francisco.ca.us>

Hi all,

This question is only slightly on-topic, in that the host system is a
CentOS 5 box, but it is otherwise a fairly generic question.  I've looked
in many different places for an answer but haven't had much luck,
so I'm hoping someone here has some advice.  (For the record, I posted
this question to comp.os.linux.misc a few weeks ago.)

===

I made a minor mistake recently, and am trying to determine the cleanest
way to clear it.  Attached to my hardware RAID controller were two RAID6
units.  One consisted of the original disks that came with the server,
which originally hosted our data but have since had all of it migrated
to a new LVM PV.  The other is newer disks with a new PV, VG, and LV.

For testing purposes, after completing the pvmove I created a clean,
new LV on the unit containing the old disks.  (I believe that I actually
destroyed the original RAID6 array and created a new one.)  Unfortunately,
I completely forgot about this LV during a recent hardware upgrade, and
didn't run through the LVM steps to completely remove an unused physical
volume (roughly the sequence sketched below) before pulling these old
disks.  (At least the LV was not mounted at the time.)  Now LVM seems
to be confused about which disks and volumes are available, and the
kernel may be confused as well.
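
For reference, the cleanup I should have run before pulling the old
disks would have been roughly the following (the PV path is a guess
from memory; it may have been a partition such as /dev/sdc1 rather
than the whole disk):

# lvremove testVG/testLV      (deactivate and remove the test LV)
# vgremove testVG             (remove the now-empty VG)
# pvremove /dev/sdc           (wipe the PV label from the old unit)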

Taken from a recent reboot, here's the dmesg log entry for the old testing
unit:

sd 1:0:2:0: Attached scsi disk sdc

Later on, I removed the disks through the controller's tools, replaced
them with new ones (so this is the third set of disks), and created a
new RAID6 unit.  After making the new unit, I see (among other messages
that don't seem helpful):

May 19 16:27:28 xxxx kernel: sd 1:0:2:0: Attached scsi disk sdd

Later on, when trying to run parted or friends, I see messages like this:

May 23 11:22:47 xxxx kernel: scsi 1:0:2:0: rejecting I/O to dead device

(Note that this is the same SCSI address sdc had.)

And pvdisplay says:

# pvdisplay 
  /dev/testVG/testLV: read failed after 0 of 4096 at 0: Input/output error

That looks bad: it seems like LVM is still trying to find testLV's old
PV on /dev/sdc, but the array on sdd now sits at that address instead.
Worse, I fear that trying to manipulate sdd may cause problems down the
road, so I'm wary of doing anything with it before I clear up this issue.
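
In case it helps, these are the read-only commands I've been using to
poke at the current state (I'm happy to post their full output; the
dmsetup name testVG-testLV is my guess at how device-mapper names the
volume):

# pvs -v
# vgs -v
# lvs -a -o +devices
# dmsetup ls
# dmsetup info testVG-testLV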

What should my next steps be?  I've seen recommendations (in other,
somewhat similar situations) to try an lvscan or vgscan to refresh the
list of volumes, or to try vgreduce --removemissing to remove the
now-gone volumes from what LVM thinks is there.  I do also still have
the original disks, which I could put back and try to get LVM to
re-find, but that would be a pain.  I am hoping to avoid a reboot, since
the current LV seems unaffected and is in use, but I can do one if it's
the surest way to make sure my LVM configuration is clean.
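
Concretely, I'm guessing the fix looks something like the sequence
below, but I'd appreciate confirmation before running any of it, since
I'm not sure of the ordering or whether every step is needed:

# dmsetup remove testVG-testLV        (drop the stale device-mapper node
                                       still pointing at the vanished sdc)
# vgreduce --removemissing testVG     (possibly with --force, to drop the
                                       missing PV and any LVs on it)
# vgremove testVG                     (if any trace of the VG remains)
# pvscan                              (re-read what's actually present)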

I should point out that I do not care at all about the data on testLV;
all I want is to cleanly tell LVM not to worry about that volume any
more, and to be in a position to use the third set of disks safely and
reliably (my rough plan for that is sketched below).
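
(Once the stale metadata is gone, my plan for the new unit is just the
usual sequence, with illustrative names and an arbitrary size:

# pvcreate /dev/sdd
# vgcreate newVG /dev/sdd
# lvcreate -n newLV -L 10G newVG

assuming sdd is still the correct device node at that point.)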

Thanks for reading--I hope you can help.

--keith

-- 
kkeller at wombat.san-francisco.ca.us
