I thought I'd test replacing a failed drive in a 4 drive raid 10 array on a CentOS 5.2 box before it goes online and before a drive really fails.
I failed and removed the drive with mdadm, powered off, replaced the drive, copied the partition table over with 'sfdisk -d /dev/sda | sfdisk /dev/sdb', and finally re-added it with 'mdadm --add'.
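Spelled out, that was roughly the following sequence (device names as in my setup; the exact invocations are from memory, so treat this as a sketch rather than a transcript):

# mdadm /dev/md3 --fail /dev/sdb4 --remove /dev/sdb4
  (likewise for the sdb partitions in the other md arrays)
# poweroff
  (swap the physical drive, boot back up)
# sfdisk -d /dev/sda | sfdisk /dev/sdb
# mdadm /dev/md3 --add /dev/sdb4
  (again for the other arrays, then let the resync finish)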
Everything seems fine until I try to create a snapshot lv. (Creating a snapshot lv worked before I replaced the drive.) Here's what I'm seeing.
# lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv
  Couldn't find device with uuid 'yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi'.
  Couldn't find all physical volumes for volume group vg0.
  Volume group for uuid not found: I4Gf5TUB1M1TfHxZNg9cCkM1SbRo8cthCTTjVHBEHeCniUIQ03Ov4V1iOy2ciJwm
  Aborting. Failed to activate snapshot exception store.
So then I try
# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               vg0
  PV Size               903.97 GB / not usable 3.00 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              231416
  Free PE               44536
  Allocated PE          186880
  PV UUID               yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi
Subsequent runs of pvdisplay eventually return nothing. Running pvck /dev/md3 seems to restore that, but creating a snapshot volume still fails.
It's as if the "PV stuff" is not on the new drive. I (probably incorrectly) assumed that just adding the drive back into the raid array would take care of that.
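In case it suggests anything to anyone, these are the kinds of checks I know to run at this point (output omitted):

# cat /proc/mdstat
# mdadm --detail /dev/md3
# pvscan
# vgscan
# lvscan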
I've searched quite a bit but have not found any clues. Anyone?
-- Thanks, Mike
Mike wrote:
I thought I'd test replacing a failed drive in a 4 drive raid 10 array on a CentOS 5.2 box before it goes online and before a drive really fails.
I failed and removed the drive with mdadm, powered off, replaced the drive, copied the partition table over with 'sfdisk -d /dev/sda | sfdisk /dev/sdb', and finally re-added it with 'mdadm --add'.
Everything seems fine until I try to create a snapshot lv. (Creating a snapshot lv worked before I replaced the drive.) Here's what I'm seeing.
# lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv
  Couldn't find device with uuid 'yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi'.
  Couldn't find all physical volumes for volume group vg0.
  Volume group for uuid not found: I4Gf5TUB1M1TfHxZNg9cCkM1SbRo8cthCTTjVHBEHeCniUIQ03Ov4V1iOy2ciJwm
  Aborting. Failed to activate snapshot exception store.
So then I try
# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               vg0
  PV Size               903.97 GB / not usable 3.00 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              231416
  Free PE               44536
  Allocated PE          186880
  PV UUID               yIIGF9-9f61-QPk8-q6q1-wn4D-iE1x-MJIMgi
Subsequent runs of pvdisplay eventually return nothing. Running pvck /dev/md3 seems to restore that, but creating a snapshot volume still fails.
It's as if the "PV stuff" is not on the new drive. I (probably incorrectly) assumed that just adding the drive back into the raid array would take care of that.
I've searched quite a bit but have not found any clues. Anyone?
It would be interesting to see what mdadm --detail /dev/mdX says.
I see the VG is made up of a single PV, md3? What are md0, md1, and md2 doing? I can guess md0 is probably /boot, but what about 1 and 2?
It wouldn't hurt to give the sfdisk partition dumps for the drives in question too.
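In other words, something along these lines, substituting your actual array and drive names:

# mdadm --detail /dev/md3
# sfdisk -d /dev/sda
# sfdisk -d /dev/sdb
# sfdisk -d /dev/sdc
# sfdisk -d /dev/sdd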
-Ross
On Thu, 17 Jul 2008, Ross S. W. Walker wrote:
It would be interesting to see what mdadm --detail /dev/mdX says.
I see the VG is made up of a single PV, md3? What are md0, md1, and md2 doing? I can guess md0 is probably /boot, but what about 1 and 2?
It wouldn't hurt to give the sfdisk partition dumps for the drives in question too.
-Ross
Thanks for the reply. md2 is /boot, md0 is /root and md1 is swap.
# mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90.03
  Creation Time : Fri Jul 4 17:11:30 2008
     Raid Level : raid10
     Array Size : 947883008 (903.97 GiB 970.63 GB)
  Used Dev Size : 473941504 (451.99 GiB 485.32 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Thu Jul 17 15:58:52 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=1, far=2
     Chunk Size : 256K

           UUID : 7ecb1de6:c6e22a3a:1bd5446a:1dcd5444
         Events : 0.3852

    Number   Major   Minor   RaidDevice State
       0       8        4        0      active sync   /dev/sda4
       1       8       20        1      active sync   /dev/sdb4
       2       8       36        2      active sync   /dev/sdc4
       3       8       52        3      active sync   /dev/sdd4
# sfdisk -l /dev/sda

Disk /dev/sda: 60801 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1   *      0+     12      13-    104391   fd  Linux raid autodetect
/dev/sda2         13    1287    1275   10241437+  fd  Linux raid autodetect
/dev/sda3       1288    1797     510    4096575   fd  Linux raid autodetect
/dev/sda4       1798   60800   59003  473941597+  fd  Linux raid autodetect

# sfdisk -l /dev/sdb

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdb1   *      0+     12      13-    104391   fd  Linux raid autodetect
/dev/sdb2         13    1287    1275   10241437+  fd  Linux raid autodetect
/dev/sdb3       1288    1797     510    4096575   fd  Linux raid autodetect
/dev/sdb4       1798   60800   59003  473941597+  fd  Linux raid autodetect

# sfdisk -l /dev/sdc

Disk /dev/sdc: 60801 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1   *      0+     12      13-    104391   fd  Linux raid autodetect
/dev/sdc2         13    1287    1275   10241437+  fd  Linux raid autodetect
/dev/sdc3       1288    1797     510    4096575   fd  Linux raid autodetect
/dev/sdc4       1798   60800   59003  473941597+  fd  Linux raid autodetect

# sfdisk -l /dev/sdd

Disk /dev/sdd: 60801 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdd1   *      0+     12      13-    104391   fd  Linux raid autodetect
/dev/sdd2         13    1287    1275   10241437+  fd  Linux raid autodetect
/dev/sdd3       1288    1797     510    4096575   fd  Linux raid autodetect
/dev/sdd4       1798   60800   59003  473941597+  fd  Linux raid autodetect
Just for the record, I'm about 98.7% sure that the root problem here was that the LVM setup (pvcreate, vgcreate, lvcreate) was done while booted from systemrescuecd and had nothing to do with replacing a failed drive.
The output from 'pvcreate --version' on the systemrescuecd is:

  LVM version:     2.02.33 (2008-01-31)
  Library version: 1.02.26 (2008-06-06)
  Driver version:  4.13.0

And when booted from CentOS 5.2:

  LVM version:     2.02.32-RHEL5 (2008-03-04)
  Library version: 1.02.24 (2007-12-20)
  Driver version:  4.11.5
When [pv|vg|lv]create is done as it should have been (after booting into CentOS), snapshot volume creation works as expected, even after replacing a failed drive.
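For completeness, the recreation from the installed CentOS system looked roughly like this; the LV size below is only illustrative, and it assumes the data was backed up and the old VG/PV made under systemrescuecd were removed first:

# pvcreate /dev/md3
# vgcreate vg0 /dev/md3
# lvcreate -L 500G -n homelv vg0
# lvcreate -p r -s -L 8G -n home-snapshot /dev/vg0/homelv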