Dear All,
I am in desperate need of LVM data rescue for my server. I have a VG called vg_hosting consisting of 4 PVs, each on a separate hard drive (/dev/sda1, /dev/sdb1, /dev/sdc1, and /dev/sdd1). The LV lv_home was created to use all the space of the 4 PVs.
Right now, the third hard drive is damaged, and therefore the third PV (/dev/sdc1) cannot be accessed anymore. I would like to recover whatever is left on the other 3 PVs (/dev/sda1, /dev/sdb1, and /dev/sdd1).
I have tried with the following:
1. Removing the broken PV:
# vgreduce --force vg_hosting /dev/sdc1
  Physical volume "/dev/sdc1" still in use
# pvmove /dev/sdc1
  No extents available for allocation
2. Replacing the broken PV:
I was able to create a new PV and restore the VG config/metadata:
# pvcreate --restorefile ... --uuid ... /dev/sdc1
# vgcfgrestore --file ... vg_hosting
However, vgchange gives this error:
# vgchange -a y
  device-mapper: resume ioctl on failed: Invalid argument
  Unable to resume vg_hosting-lv_home (253:4)
  0 logical volume(s) in volume group "vg_hosting" now active
Could someone help me, please? I'm in dire need of help to save the data, at least some of it if possible.
Regards, Khem
On 2/27/2015 4:25 PM, Khemara Lyn wrote:
> Right now, the third hard drive is damaged; and therefore the third PV (/dev/sdc1) cannot be accessed anymore. I would like to recover whatever left in the other 3 PVs (/dev/sda1, /dev/sdb1, and /dev/sdd1).
Your data is spread across all 4 drives, and you lost 25% of it, so only 3 out of 4 blocks of data still exist. Good luck with recovery.
Thank you, John, for your quick reply. That is what I hope for. But how do I do it? I cannot even activate the LV with the remaining PVs.
Thanks, Khem
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
----- Original Message -----
| Dear All,
|
| I am in desperate need for LVM data rescue for my server.
| I have an VG call vg_hosting consisting of 4 PVs each contained in a
| separate hard drive (/dev/sda1, /dev/sdb1, /dev/sdc1, and /dev/sdd1).
| And this LV: lv_home was created to use all the space of the 4 PVs.
|
| Right now, the third hard drive is damaged; and therefore the third PV
| (/dev/sdc1) cannot be accessed anymore. I would like to recover whatever
| left in the other 3 PVs (/dev/sda1, /dev/sdb1, and /dev/sdd1).
|
| I have tried with the following:
|
| 1. Removing the broken PV:
|
| # vgreduce --force vg_hosting /dev/sdc1
| Physical volume "/dev/sdc1" still in use
|
| # pvmove /dev/sdc1
| No extents available for allocation
This would indicate that you don't have sufficient extents to move the data off of this disk. If you have another disk then you could try adding it to the VG and then moving the extents.
| 2. Replacing the broken PV:
|
| I was able to create a new PV and restore the VG Config/meta data:
|
| # pvcreate --restorefile ... --uuid ... /dev/sdc1
| # vgcfgrestore --file ... vg_hosting
|
| However, vgchange would give this error:
|
| # vgchange -a y
| device-mapper: resume ioctl on failed: Invalid argument
| Unable to resume vg_hosting-lv_home (253:4)
| 0 logical volume(s) in volume group "vg_hosting" now active
There should be no need to create a PV and then restore the VG unless the entire VG is damaged. The configuration should still be available on the other disks and adding the new PV and moving the extents should be enough.
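The add-a-disk-then-move approach described above could be sketched roughly as follows. It only applies while the failing PV is still readable, because pvmove has to read every extent it copies. The replacement partition /dev/sde1, the `run` helper, and the `replace_pv` function are assumptions made for illustration and are not taken from the original poster's system. By default the sketch only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Evacuate a still-readable PV onto a new one, then drop it from the VG.
# Arguments: volume group, old PV, new PV (all example values).
replace_pv() {
    vg=$1
    old_pv=$2
    new_pv=$3
    run pvcreate "$new_pv"          # initialise the new partition as a PV
    run vgextend "$vg" "$new_pv"    # add it to the VG so free extents exist
    run pvmove "$old_pv" "$new_pv"  # copy allocated extents off the old PV
    run vgreduce "$vg" "$old_pv"    # remove the now-empty PV from the VG
}

replace_pv vg_hosting /dev/sdc1 /dev/sde1
```

The vgextend step is what addresses the "No extents available for allocation" message: pvmove needs free extents somewhere else in the VG before it can move anything.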
| Could someone help me please???
| I'm in dire need for help to save the data, at least some of it if possible.
Can you not see the PV/VG/LV at all?
Dear James,
Thank you for being quick to help. Yes, I could see all of them:
# vgs
# lvs
# pvs
Regards, Khem
Hello James and All,
For your information, here is what the listing looks like:
[root@localhost ~]# pvs
  PV         VG         Fmt  Attr PSize PFree
  /dev/sda1  vg_hosting lvm2 a--  1.82t    0
  /dev/sdb2  vg_hosting lvm2 a--  1.82t    0
  /dev/sdc1  vg_hosting lvm2 a--  1.82t    0
  /dev/sdd1  vg_hosting lvm2 a--  1.82t    0
[root@localhost ~]# lvs
  LV      VG         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_home vg_hosting -wi-s-----  7.22t
  lv_root vg_hosting -wi-a----- 50.00g
  lv_swap vg_hosting -wi-a----- 11.80g
[root@localhost ~]# vgs
  VG         #PV #LV #SN Attr   VSize VFree
  vg_hosting   4   3   0 wz--n- 7.28t    0
[root@localhost ~]#
The problem is, when I do:
[root@localhost ~]# vgchange -a y
  device-mapper: resume ioctl on failed: Invalid argument
  Unable to resume vg_hosting-lv_home (253:4)
  3 logical volume(s) in volume group "vg_hosting" now active
Only lv_root and lv_swap are activated; but lv_home is not, with the error above (on the vgchange command).
How can I activate lv_home even with only the 3 PVs left? The PV /dev/sdb2 is the one that was lost. I created it on a new blank hard disk and restored the VG using:
# pvcreate --restorefile ... --uuid ... /dev/sdb2
# vgcfgrestore --file ... vg_hosting
Regards, Khem
On 2/27/2015 4:37 PM, James A. Peltier wrote:
| I was able to create a new PV and restore the VG Config/meta data:
|
| # pvcreate --restorefile ... --uuid ... /dev/sdc1
|
oh, that step means you won't be able to recover ANY of the data that was formerly on that PV.
Dear John,
I understand; I tried it in the hope that I could activate the LV again with a new PV replacing the damaged one. But I still could not activate it.
What is the right way to recover what is left on the remaining PVs?
Regards, Khem
On 2/27/2015 4:52 PM, Khemara Lyn wrote:
> I understand; I tried it in the hope that, I could activate the LV again with a new PV replacing the damaged one. But still I could not activate it.
> What is the right way to recover the remaining PVs left?
Take a filing cabinet packed full of 10s of 1000s of files of 100s of pages each, with the index cards interleaved in the files, and remove 1/4th of the pages in the folders, including some of the indexes... and toss everything else on the floor... this is what you have. 3 out of 4 pages, semi-randomly, with no idea what's what.
An LV built from PVs that are just simple drives is something like RAID0, which isn't RAID at all, as there's no redundancy; it's AID-0.
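How much of lv_home sat on the lost PV depends on the segment layout. As a rough sketch, assuming lv_home is a linear (non-striped) LV, each segment maps one contiguous byte range of the LV onto one PV, and `lvs --segments -o lv_name,seg_start_pe,seg_size_pe,devices vg_hosting` would list those segments. The function name, the 4 MiB extent size, and the extent numbers below are hypothetical; the real extent size can be read with `vgs -o vg_extent_size`.

```shell
#!/bin/sh
# Print the first and last byte offset, within the LV, of one linear segment.
# Arguments: starting extent, segment length in extents, extent size in MiB.
# The 4 MiB default is an assumption; check the VG for the real value.
segment_byte_range() {
    start_le=$1
    length_le=$2
    extent_mib=${3:-4}
    extent_bytes=$(( extent_mib * 1024 * 1024 ))
    first=$(( start_le * extent_bytes ))
    last=$(( (start_le + length_le) * extent_bytes - 1 ))
    printf '%s %s\n' "$first" "$last"
}

# Hypothetical segment: starts at extent 1000 and is 500 extents long.
segment_byte_range 1000 500
```

Knowing the byte range only says which part of the block device is gone. The filesystem on top still spreads file data and metadata across the whole LV, so individual files can be affected well outside what their names or directories suggest.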
On Fri, 27 Feb 2015 19:24:57 -0800 John R Pierce pierce@hogranch.com wrote:
> On 2/27/2015 4:52 PM, Khemara Lyn wrote:
>> What is the right way to recover the remaining PVs left?
> take a filing cabinet packed full of 10s of 1000s of files of 100s of pages each, with the index cards interleaved in the files, and remove 1/4th of the pages in the folders, including some of the indexes... and toss everything else on the floor... this is what you have. 3 out of 4 pages, semi-randomly with no idea whats what.
And this is why I don't like LVM to begin with. If one of the drives dies, you're screwed not only for the data on that drive, but even for data on remaining healthy drives.
I never really saw the point of LVM. Storing data on plain physical partitions, having an intelligent directory structure and a few wise well-placed symlinks across the drives can go a long way in having flexible storage, which is way more robust than LVM. With today's huge drive capacities, I really see no reason to adjust the sizes of partitions on-the-fly, and putting several TB of data in a single directory is just Bad Design to begin with.
That said, if you have a multi-TB amount of critical data while not having at least a simple RAID-1 backup, you are already standing in a big pile of sh*t just waiting to become obvious, regardless of LVM and stuff. Hardware fails, and storing data without a backup is just simply a disaster waiting to happen.
Best, :-) Marko
On 2/27/2015 8:00 PM, Marko Vojinovic wrote:
> And this is why I don't like LVM to begin with. If one of the drives dies, you're screwed not only for the data on that drive, but even for data on remaining healthy drives.
With classic LVM, you were supposed to use RAID for your PVs. The newer LVM in CentOS 6.3+ has integrated RAID at the LV level; you just have to declare all your LVs with appropriate RAID levels.
On Fri, Feb 27, 2015 at 9:44 PM, John R Pierce pierce@hogranch.com wrote:
> On 2/27/2015 8:00 PM, Marko Vojinovic wrote:
>> And this is why I don't like LVM to begin with. If one of the drives dies, you're screwed not only for the data on that drive, but even for data on remaining healthy drives.
> with classic LVM, you were supposed to use raid for your PV's. The new LVM in 6.3+ has integrated raid at an LV level, you just have to declare all your LVs with appropriate raid levels.
I think type mirror has been available since the inception of LVM2; it is now legacy (but still available). The current type since CentOS 6.3 is raid1. But yes, for anything raid4+ you previously had to create it with mdadm or use hardware RAID (which of course you can still do; most people still prefer managing software RAID with mdadm rather than with LVM's tools).
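For reference, declaring an LV with the raid1 type might look roughly like the sketch below. The LV name lv_data, the 100G size, the choice of PVs, and the `run` and `create_raid1_lv` helpers are all assumptions for illustration. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Create a two-way mirrored LV using the raid1 segment type.
# Arguments: VG, LV name, size, then the two PVs that hold the copies.
create_raid1_lv() {
    vg=$1
    lv=$2
    size=$3
    pv_a=$4
    pv_b=$5
    run lvcreate --type raid1 -m 1 -L "$size" -n "$lv" "$vg" "$pv_a" "$pv_b"
    # Show segment type, sync progress and backing devices, hidden sub-LVs included.
    run lvs -a -o lv_name,segtype,copy_percent,devices "$vg"
}

create_raid1_lv vg_hosting lv_data 100G /dev/sda1 /dev/sdb1
```

Naming the two PVs explicitly keeps the mirror legs on different drives, which is the whole point of the redundancy.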
And then Btrfs (no LVM).
# mkfs.btrfs -d single /dev/sd[bcde]
# mount /dev/sdb /mnt/bigbtr
# cp -a /usr /mnt/bigbtr
Unmount. Poweroff. Kill 3rd of 4 drives. Poweron.
# mount -o degraded,ro /dev/sdb /mnt/bigbtr    ## degraded,ro is required or mount fails
# cp -a /mnt/bigbtr/usr/ /mnt/btrfs            ## copy to a different volume
No dmesg errors. Bunch of I/O errors only when it was trying to copy data on the 3rd drive. But it continues.
# du -sh /mnt/btrfs/usr
2.5G    usr
Exactly 1GB was on the missing drive. So I recovered everything that wasn't on that drive.
One gotcha that applies to all three filesystems that I'm not testing: in-use drive failure. I'm simulating drive failure by first cleanly unmounting and powering off, which is super ideal. How the file system and anything underneath it (LVM and maybe RAID) handles drive failures while in use is a huge factor.
Chris Murphy
On Fri, Feb 27, 2015 at 9:00 PM, Marko Vojinovic vvmarko@gmail.com wrote:
> And this is why I don't like LVM to begin with. If one of the drives dies, you're screwed not only for the data on that drive, but even for data on remaining healthy drives.
It has its uses, just like RAID0 has uses. But yes, as the number of drives in the pool increases, the risk of catastrophic failure increases. So you have to bet on consistent backups and be OK with any intervening dataloss. If not, well, use RAID1+ or use a distributed-replication cluster like GlusterFS or Ceph.
> Hardware fails, and storing data without a backup is just simply a disaster waiting to happen.
I agree. I kind of get a wee bit aggressive and say, if you don't have backups the data is by (your own) definition not important.
Anyway, changing the underlying storage as little as possible gives the best chance of success. The linux-raid@ list is full of raid5/6 implosions due to people panicking, reading a bunch of stuff, not identifying their actual problem, and then typing a bunch of commands and ending up with user-induced data loss.
In the case of this thread, I'd say the best chance for success is to not remove or replace the dead PV, but to do a partial activation:
# vgchange -a y --activationmode partial
And then for ext4 it's a scrape operation with debugfs -c. For XFS, it looks like some amount of data is possibly recoverable with just an ro mount. I didn't try any scrape operation; too tedious to test.
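Put together, the partial-activation route above could be sketched like this. The `run` and `rescue_partial` helpers, the filesystem argument, and the /mnt/rescue target are assumptions for illustration; the vgchange and debugfs options are the ones named in this message, and `rdump` is the debugfs request that recursively copies a directory tree to an existing directory on another filesystem. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
# Arguments containing spaces are shown in single quotes.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
        return
    fi
    line=
    for arg in "$@"; do
        case $arg in
            *' '*) line="$line '$arg'" ;;
            *)     line="$line $arg" ;;
        esac
    done
    printf '%s\n' "${line# }"
}

# Activate the VG despite the missing PV, then read whatever is reachable.
# Arguments: VG, LV, filesystem type (xfs or ext4), target directory.
rescue_partial() {
    vg=$1
    lv=$2
    fstype=$3
    target=$4
    dev="/dev/$vg/$lv"
    run vgchange -a y --activationmode partial "$vg"
    case $fstype in
        # XFS: try a read-only mount on the target and copy files out from there.
        xfs)  run mount -o ro "$dev" "$target" ;;
        # ext4: catastrophic mode (-c), dumping the whole tree into the target.
        ext4) run debugfs -c -R "rdump / $target" "$dev" ;;
        *)    echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
}

rescue_partial vg_hosting lv_home ext4 /mnt/rescue
```

The target has to live on a different, healthy volume with enough free space. Nothing in the sketch writes to the damaged LV.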
On Fri, February 27, 2015 10:00 pm, Marko Vojinovic wrote:
> That said, if you have a multi-TB amount of critical data while not having at least a simple RAID-1 backup, you are already standing in a big pile of sh*t just waiting to become obvious, regardless of LVM and stuff. Hardware fails, and storing data without a backup is just simply a disaster waiting to happen.
Indeed. That is why: no LVMs in my server room. Not even software RAID. Software RAID relies on the system itself to fulfill its RAID function; what if the kernel panics before software RAID does its job? Hardware RAID (for huge filesystems I cannot afford to back up) is the only thing that makes sense for me. A RAID controller has dedicated processors and a dedicated simple system which does one simple task: RAID.
Just my $0.02
Valeri
++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++
On Sat, Feb 28, 2015 at 1:26 PM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
> Indeed. That is why: no LVMs in my server room. Even no software RAID. Software RAID relies on the system itself to fulfill its RAID function; what if kernel panics before software RAID does its job? Hardware RAID (for huge filesystems I can not afford to back up) is what only makes sense for me. RAID controller has dedicated processors and dedicated simple system which does one simple task: RAID.
Biggest problem is myriad defaults aren't very well suited for multiple device configurations. There are a lot of knobs in Linux and on the drives and in hardware RAID cards. None of this is that simple.
Drives, and hardware RAID cards are subject to firmware bugs, just as we have software bugs in the kernel. We know firmware bugs cause corruption. Not all hardware RAID cards are the same, some are total junk. Many others get you vendor lock in due to proprietary metadata written to the drives. You can't get your data off if the card dies, you have to buy a similar model card sometimes with the same firmware version in order to regain access. Some cards support SNIA's DDF format, in which case there's a chance mdadm can assemble the array, should the hardware card die.
Anyway, the main thing is knowing where the land mines are regardless of what technology you pick. If you don't know where they are, you're inevitably going to run into trouble with anything you choose.
On Sat, February 28, 2015 4:22 pm, Chris Murphy wrote:
> Drives, and hardware RAID cards are subject to firmware bugs, just as we have software bugs in the kernel. We know firmware bugs cause corruption.
Speaking of which: the only hardware cards I would use are good ones, and only good external RAID boxes. Over the last decade and a half I have never had trouble due to firmware bugs in RAIDs. What I use is:
1. 3ware (mostly)
2. LSI MegaRAID (a few; I don't like their user interface and poor notification abilities)
3. Areca (also a few; better UI than that of LSI)
External RAID boxes: Infortrend
I will never go for cheap fake RAID (Adaptec is one off the top of my head). Also, it was not my choice, but I had to deal with, hm... not-so-good external RAID boxes: by Promise and by Raid.com, to mention two.
You are implying that the firmware of hardware RAID cards is somehow buggier than the software of software RAID plus the Linux kernel (sorry if I misinterpreted your point). I disagree: the embedded system of a RAID card and the RAID function it has to fulfill are much simpler than everything involved in software RAID. Therefore, with the same effort invested, the firmware of (good) hardware is less buggy. And again, the Linux kernel is more likely to panic than the trivial embedded system of a hardware RAID card/box. At least my experience over a decade and a half confirms that.
I have heard horror stories from people who used the same good hardware I mentioned (3ware). However, when I went deep into the details of each case, I discovered that they just didn't have everything necessary set up correctly, which is trivial as a matter of fact. Namely, the common mistake in all cases was not setting up the RAID verify cron task (it is set at the RAID configuration level). I have my RAIDs verified once a week. If you don't verify them for a year, what happens is that you don't discover individual drive degradation until it is too late and more drives than the level of redundancy allows are kicked out because of fatal failures. Even then, 3ware, when it is already not redundant, doesn't kick out newly failing drives; it just makes the RAID read-only, so you can still salvage something. Anyway, these horror stories were purely a poor sysadmin's job, IMHO.
> Not all hardware RAID cards are the same, some are total junk. Many others get you vendor lock in due to proprietary metadata written to the drives. You can't get your data off if the card dies, you have to buy a similar model card sometimes with the same firmware version in order to regain access.
I would not consider that a disadvantage. I have yet to see a dead 3ware card (yes, you can burn one if you plug it into a slot with gross misalignment, like a tilt). And with 3ware, a later model will accept drives that originally made up a RAID on an older model, only it will make the RAID read-only; thus you can salvage your data first, then re-create the RAID with the new card's metadata standard. I guess I may have a different philosophy than you do. If I use a RAID card, I choose a really good one. Once I use a good one, I feel no need to move drives to a card made by a different manufacturer. And last, yet important: if you have to use these drives with a different card (even just a different model by the same manufacturer), then you had better re-create the RAID from scratch on the new card. If you value your data...
Just my $0.02
Valeri
On Sat, Feb 28, 2015 at 4:29 PM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
> You are implying that firmware of hardware RAID cards is somehow buggier than software of software RAID plus Linux kernel (sorry if I misinterpreted your point).
"Drives, and hardware RAID cards are subject to firmware bugs, just as we have software bugs in the kernel." makes no assessment of how common such bugs are relative to each other.
> I disagree: embedded system of RAID card and RAID function they have to fulfill are much simpler than everything involved into software RAID. Therefore, with the same effort invested, firmware of (good) hardware is less buggy.
There's no evidence provided for this. All I've stated is bugs happen in both software and the firmware on hardware RAID cards. http://www.cs.toronto.edu/~bianca/papers/fast08.pdf
And further there's a widespread misperception that RAID56 (whether software or hardware) is capable of detecting and correcting corruption.
> And again, Linux kernel can be panicked more likely than trivial embedded system of hardware RAID card/box. At least my experience over decade and a half confirms that.
I'd say this is not a scientific sample and therefore unproven. I can provide my own non-scientific sample: an XServe running OS X with software raid1 which has never, in 8 years, kernel panicked. Its longest uptime was over 500 days, and was only rebooted due to a system upgrade that required it. There's nothing special about the XServe that makes this magic, it's just good hardware with ECC memory, enterprise SAS drives, and a capable though limited kernel. So there's no good reason to expect kernel panics. Having them means something is wrong.
> I have my raids verified once a week. If you don't verify them for a year, what happens then: you don't discover individual drive degradation until it is too late and larger number than the level of redundancy are kicked out because of fatal failures.
This is a common problem on software and hardware RAID alike: the lack of scrubbing. Also recognize that software RAID tends to bring along cheaper drives that aren't well suited for RAID use, whereas people spending money on hardware RAID tend to invest in appropriate drives. Appropriate drives come with proper SCT ERC settings, which prevents those problems.
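As a rough illustration of both points, a periodic job could start a scrub and check the drives' SCT ERC timeouts. The sketch assumes an LVM RAID LV named vg_hosting/lv_data and the member disks /dev/sda and /dev/sdb; those names and the `run` and `scrub_and_check_erc` helpers are hypothetical. The 7.0 second limit is a commonly used value, not something taken from this thread. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Start a scrub of one RAID LV and inspect/set SCT ERC on its member disks.
# Arguments: VG, LV, then any number of whole-disk device names.
scrub_and_check_erc() {
    vg=$1
    lv=$2
    shift 2
    # Have LVM read the whole RAID LV and check the copies or parity.
    run lvchange --syncaction check "$vg/$lv"
    for disk in "$@"; do
        # Report the drive's current error recovery timeouts.
        run smartctl -l scterc "$disk"
        # Limit read and write recovery to 7.0 seconds (units of 100 ms).
        run smartctl -l scterc,70,70 "$disk"
    done
}

scrub_and_check_erc vg_hosting lv_data /dev/sda /dev/sdb
```

Drives that do not support SCT ERC will reject the second smartctl call, which is itself a useful hint that they are a poor fit for RAID use.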
> Anyway, these horror stories were purely poor sysadmin's job IMHO.
I agree. This is common in any case.
----- Original Message -----
| On Fri, 27 Feb 2015 19:24:57 -0800
| John R Pierce pierce@hogranch.com wrote:
| > On 2/27/2015 4:52 PM, Khemara Lyn wrote:
| > >
| > > What is the right way to recover the remaining PVs left?
| >
| > take a filing cabinet packed full of 10s of 1000s of files of 100s of
| > pages each, with the index cards interleaved in the files, and
| > remove 1/4th of the pages in the folders, including some of the
| > indexes... and toss everything else on the floor... this is what
| > you have. 3 out of 4 pages, semi-randomly with no idea whats what.
|
| And this is why I don't like LVM to begin with. If one of the drives
| dies, you're screwed not only for the data on that drive, but even for
| data on remaining healthy drives.
|
| I never really saw the point of LVM. Storing data on plain physical
| partitions, having an intelligent directory structure and a few wise
| well-placed symlinks across the drives can go a long way in having
| flexible storage, which is way more robust than LVM. With today's huge
| drive capacities, I really see no reason to adjust the sizes of
| partitions on-the-fly, and putting several TB of data in a single
| directory is just Bad Design to begin with.
|
| That said, if you have a multi-TB amount of critical data while not
| having at least a simple RAID-1 backup, you are already standing in a
| big pile of sh*t just waiting to become obvious, regardless of LVM and
| stuff. Hardware fails, and storing data without a backup is just simply
| a disaster waiting to happen.
|
| Best, :-)
| Marko
|
This is not an LVM vs physical partitioning problem. This is a "system component failed and it wasn't being monitored, so now we're in deep doo-doo" problem. This problem also came to us after many attempts to recover from it that were likely done incorrectly. If the disk were still at least partially accessible (monitoring would have caught that), there would be increased chances of data recovery, although maybe not much better.
People who understand how to use the system do not suffer these problems. LVM adds a bit of complexity for a bit of extra benefits. You can't blame LVM for user error. Not having monitoring in place or backups is a user problem, not an LVM one.
I have managed Petabytes worth of data on LVM and not suffered this sort of problem *knock on wood*, but I also know that I'm not immune to it. I don't even use partitions for anything but system drives. I use whole disk PV to avoid things like partition alignment issues. Not a single bit of data loss in 7 years dealing with these servers either. At least none that weren't user error. ;)
On Sat, Feb 28, 2015 at 4:28 PM, James A. Peltier jpeltier@sfu.ca wrote:
> People who understand how to use the system do not suffer these problems. LVM adds a bit of complexity for a bit of extra benefits. You can't blame LVM for user error. Not having monitoring in place or backups is a user problem, not an LVM one.
It's a good point. Suggesting the OP's problem is an example of why LVM should not be used is like saying dropped laptops are a good example of why laptops shouldn't be used.
A fair criticism is whether LVM should be used by default with single disk system installations. I've always been suspicious of this choice. (But now, even Apple does this on OS X by default, possibly as a prelude to making full volume encryption a default - their "LVM" equivalent implements encryption as an LV level attribute called logical volume family.)
----- Original Message -----
| On Sat, Feb 28, 2015 at 4:28 PM, James A. Peltier jpeltier@sfu.ca wrote:
|
| > People who understand how to use the system do not suffer these problems.
| > LVM adds a bit of complexity for a bit of extra benefits. You can't
| > blame LVM for user error. Not having monitoring in place or backups is a
| > user problem, not an LVM one.
|
| It's a good point. Suggesting the OP's problem is an example why LVM
| should not be used, is like saying dropped laptops is a good example
| why laptops shouldn't be used.
|
| A fair criticism is whether LVM should be used by default with single
| disk system installations. I've always been suspicious of this choice.
| (But now, even Apple does this on OS X by default, possibly as a
| prelude to making full volume encryption a default - their "LVM"
| equivalent implements encryption as an LV level attribute called
| logical volume family.)
|
| --
| Chris Murphy
There is no difference between a single-disk system and a multi-disk system in terms of being able to dynamically resize volumes that reside on a volume group. Having the ability to resize a volume to be either larger or smaller on demand is a really nice feature to have. Did you make / too small, have space on home, and you're using ext3/4? Then simply resize the home logical volume to be smaller and add all the free extents to /. Pretty simple process really and it can be done online. This is just one example. There are others, but this has nothing to do with the OP.
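The home-to-root shuffle could be sketched roughly as below, assuming both LVs carry ext3/4 and live in vg_hosting. The 20G amount and the `run` and `shift_space` helpers are hypothetical. One caveat worth stating: resize2fs can grow a mounted ext3/4 filesystem, but shrinking needs the filesystem unmounted, so the lvreduce step needs home to be offline for a while. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Move space from one LV to another inside the same VG.
# Arguments: VG, LV to shrink, LV to grow, amount to take (for example 20G).
shift_space() {
    vg=$1
    from_lv=$2
    to_lv=$3
    amount=$4
    # Shrink the filesystem first, then the LV, by the given amount.
    run lvreduce --resizefs -L "-$amount" "$vg/$from_lv"
    # Grow the target LV into all free extents, then grow its filesystem.
    run lvextend --resizefs -l +100%FREE "$vg/$to_lv"
}

shift_space vg_hosting lv_home lv_root 20G
```

Using --resizefs on both steps keeps the filesystem and LV sizes in step, which avoids the classic mistake of shrinking the LV below the size of the filesystem inside it.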
Getting back to the OP, it would seem that you may be stuck in a position where you need to restore from backup. Without further details on what exactly is happening, I fear you're not going to be able to recover. I'd be available to talk off list if needed.
On Sat, Feb 28, 2015 at 5:59 PM, James A. Peltier jpeltier@sfu.ca wrote:
There is no difference between a single disk system and a multi-disk system in terms of being able to dynamically resize volumes that reside on a volume group. Having the ability to resize a volume to be either larger or smaller on demand is a really nice feature to have.
I'll better qualify this. For CentOS it's a fine default, as it is for Fedora Server. For Workstation and Cloud I think LVM overly complicates things. More non-enterprise users get confused by LVM than ever need to resize volumes.
Did you make / too small, have space on home, and use ext3/4? Then simply resize the home logical volume to be smaller and give all the freed extents to /. Pretty simple process really and it can be done online.
XFS doesn't support shrink, only grow. XFS is the CentOS 7 default. The main advantage of LVM for CentOS system disks is the ability to use pvmove to replace a drive online, rather than the ability to resize. If Btrfs stabilizes sufficiently for RHEL/CentOS 8, overall it's a win because it meets the simple needs of mortal users and supports advanced features for advanced users. (Ergo I think LVM is badass but it's also the storage equivalent of emacs - managing it is completely crazy.)
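For reference, the pvmove workflow mentioned above might look roughly like this. It is a sketch under assumptions: /dev/sde1 is a hypothetical replacement partition, the `run` helper is a made-up dry-run wrapper, and the old drive must still be readable, which is exactly what the OP no longer has.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command; set EXECUTE=1 to run it.
run() {
    printf '%s\n' "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then "$@"; fi
}

# replace_pv VG OLD_PV NEW_PV
replace_pv() {
    vg=$1; old=$2; new=$3
    run pvcreate "$new" || return 1          # label the new partition as a PV
    run vgextend "$vg" "$new" || return 1    # add it to the volume group
    run pvmove "$old" "$new" || return 1     # migrate extents while LVs stay active
    run vgreduce "$vg" "$old"                # drop the now-empty old PV
}

# /dev/sde1 is a placeholder for the replacement drive.
replace_pv vg_hosting /dev/sdc1 /dev/sde1
```

The vgextend step is what gives pvmove somewhere to put the extents; without free extents elsewhere in the VG it has nothing to allocate from, which matches the "No extents available for allocation" error the OP reported.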
This is just one example. There are others, but this has nothing to do with the OP.
Getting back to the OP, it would seem that you may be stuck in a position where you need to restore from backup. Without further details on what exactly is happening, I fear you're not going to be able to recover. I'd be available to talk off list if needed.
Yeah, my bad for partly derailing this thread. Hopefully the original poster hasn't been scared off, not least by my bark about cross posting, which is worse than my bite.
Dear Chris, James, Valeri and all,
Sorry for not responding; I'm still struggling with the recovery, with no success.
I've been trying to set up a new system with the exact same scenario (four 2TB hard drives, removing the 3rd one afterwards). I still cannot recover.
We did have a backup system, but it had been broken for a while and we did not get a replacement in time before this happened.
From all of your responses, it seems recovery is almost impossible. I'm now trying to look at the hardware side and get the damaged hard drive fixed.
I appreciate all your help and am still open to more suggestions.
Regards, Khem
On Sun, March 1, 2015 9:07 pm, Khemara Lin wrote:
I appreciate all your help and am still open to more suggestions.
There may be a somewhat expensive route. Depending on how valuable the data are, you may think of contacting professional recovery services. They usually take about a month, and they are expensive. Decent ones will be on the order of $1000 if it is a single drive. Likely more if it is a fatally failed RAID. You can do your research and find good ones close to you.
The rule of thumb is: if they only charge in case of a more or less successful recovery (sometimes they can recover almost 100%, sometimes 70-80%, sometimes nothing, and then they will not charge you), then it is probably a decent company. They live from the results of their work. If they charge for an "estimate" even if they tell you later they cannot recover, this is a bad sign.
They work with fine equipment to read data off the platters of dead drives. They work at the level of debugging filesystems (and RAIDs), so what they charge is usually not that much for the kind of work they do. If you don't feel you are the level of expert they are, and the data is worth it, I would contact recovery services.
I myself usually have good backups (knocking on wood), but I know several people who actually used some of these companies, and their data got recovered. If you come to the point of needing some references, contact me off the list; I'll dig up my old emails and send you what people (whom I know in person) say about the companies they used successfully.
Valeri
++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++
On Sun, Mar 1, 2015 at 9:03 PM, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
There may be a somewhat expensive route. Depending on how valuable the data are, you may think of contacting professional recovery services. They usually take about a month, and they are expensive. Decent ones will be on the order of $1000 if it is a single drive. Likely more if it is a fatally failed RAID.
Actually, I'm probably wrong in the previous post about sending off the single bad PV for recovery. Your point above made me think, umm yeah no, pretty much any company specializing in data recovery will want the entire array/LV backing drives, even the good ones. Same for RAID, they probably don't want just the dead drive, they want the whole thing. And they charge by the total size. So, yeah probably a lot more than $1K.
If they charge for an "estimate" even if they tell you later they cannot recover, this is a bad sign.
Agreed.
On Sun, Mar 1, 2015 at 8:07 PM, Khemara Lin lin.kh@wicam.com.kh wrote:
Dear Chris, James, Valeri and all,
Sorry for not responding; I'm still struggling with the recovery, with no success.
I've been trying to set up a new system with the exact same scenario (four 2TB hard drives, removing the 3rd one afterwards). I still cannot recover.
Well, it's effectively a raid0. It's not block level striping, it's a linear allocation, but the way ext4 and XFS write, you're going to get file extents and fs metadata strewn across all four drives. As soon as any one drive is removed, the whole thing is sufficiently damaged that it can't recover without a lot of work. Imagine (a really bad example physics-wise) a single drive scenario and magically punching a hole through a drive such that it'll still spin. The fs on that drive is going to have all sorts of problems because of the hole, even if it can read 3/4 of the drive.
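To put rough numbers on the "hole" in a linear LV, the arithmetic can be sketched like this. It assumes four equal-sized PVs filled strictly in order and the default 4 MiB extent size; the function name and the extent count of 476932 per 2TB PV are illustrative guesses, not figures read from the OP's metadata.

```shell
#!/bin/sh
# missing_range PES_PER_PV MISSING_INDEX PE_MIB
# Prints which logical extents (and which MiB offsets inside the LV) lived on
# the missing PV, assuming equal-sized PVs allocated linearly in order.
missing_range() {
    per_pv=$1; idx=$2; pe_mib=$3
    first_le=$(( (idx - 1) * per_pv ))
    last_le=$(( idx * per_pv - 1 ))
    start_mib=$(( first_le * pe_mib ))
    end_mib=$(( (last_le + 1) * pe_mib ))   # exclusive upper bound
    printf '%s\n' "LE $first_le-$last_le, MiB $start_mib-$end_mib"
}

# Hypothetical figures: 476932 extents of 4 MiB per PV, 3rd of 4 PVs lost.
missing_range 476932 3 4
```

With those assumed figures it prints "LE 953864-1430795, MiB 3815456-5723184": one contiguous quarter of the block device is gone, and every inode table, directory block or file extent the filesystem happened to place in that range goes with it.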
We did have a backup system, but it had been broken for a while and we did not get a replacement in time before this happened.
From all of your responses, it seems recovery is almost impossible. I'm now trying to look at the hardware side and get the damaged hard drive fixed.
About the best case scenario in such a situation is to literally do nothing with the LVM setup, and send that PV off for block level data recovery (you didn't say how it failed but I'm assuming it's beyond the ability to fix it locally). Then once the recovered replacement PV is back in the setup, things will just work again. *shrug* LVM linear isn't designed to be fail safe in the face of a single device failure.
On Fri, Feb 27, 2015 at 8:24 PM, John R Pierce pierce@hogranch.com wrote:
On 2/27/2015 4:52 PM, Khemara Lyn wrote:
I understand; I tried it in the hope that I could activate the LV again with a new PV replacing the damaged one. But still I could not activate it.
What is the right way to recover the remaining PVs left?
take a filing cabinet packed full of 10s of 1000s of files of 100s of pages each, with the index cards interleaved in the files, and remove 1/4th of the pages in the folders, including some of the indexes... and toss everything else on the floor... this is what you have. 3 out of 4 pages, semi-randomly with no idea what's what.
an LV built from PVs that are just simple drives is something like RAID0, which isn't RAID at all, as there's no redundancy; it's AID-0.
If the LE to PE relationship is exactly linear, as in, the PV, VG, LV were all made at the same time, it's not entirely hopeless. There will be some superblocks intact so scraping is possible.
I just tried this with a 4 disk LV and XFS. I removed the 3rd drive. I was able to activate the LV using:
vgchange -a y --activationmode partial
I was able to mount -o ro but I do get errors in dmesg:
[ 1594.835766] XFS (dm-1): Mounting V4 Filesystem
[ 1594.884172] XFS (dm-1): Ending clean mount
[ 1602.753606] XFS (dm-1): metadata I/O error: block 0x5d780040 ("xfs_trans_read_buf_map") error 5 numblks 16
[ 1602.753623] XFS (dm-1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
# ls -l
ls: cannot access 4: Input/output error
total 0
drwxr-xr-x. 3 root root 16 Feb 27 20:40 1
drwxr-xr-x. 3 root root 16 Feb 27 20:43 2
drwxr-xr-x. 3 root root 16 Feb 27 20:47 3
??????????? ? ? ? ? ? 4
# cp -a 1/ /mnt/btrfs
cp: cannot stat ‘1/usr/include’: Input/output error
cp: cannot stat ‘1/usr/lib/alsa/init’: Input/output error
cp: cannot stat ‘1/usr/lib/cups’: Input/output error
cp: cannot stat ‘1/usr/lib/debug’: Input/output error
[...]
And now in dmesg, thousands of
[ 1663.722490] XFS (dm-1): metadata I/O error: block 0x425f96d0 ("xfs_trans_read_buf_map") error 5 numblks 8
Out of what should have been 3.5GB of data in 1/, I was able to get 452MB.
That's not so bad for just a normal mount and copy. I am in fact shocked the file system mounts, and stays mounted. Yay XFS.
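Strung together, the XFS test above amounts to something like the following sketch. The mount points /mnt/broken and /mnt/rescue are hypothetical, and the `run` helper is a made-up dry-run wrapper that only prints the commands unless EXECUTE=1 is set. My understanding is that in partial mode the extents of the missing PV simply return I/O errors when read, which fits the dmesg output above.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command; set EXECUTE=1 to run it.
run() {
    printf '%s\n' "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then "$@"; fi
}

# salvage VG LV MOUNTPOINT DESTINATION
salvage() {
    vg=$1; lv=$2; mnt=$3; dest=$4
    # Activate the LV even though one PV is missing.
    run vgchange -a y --activationmode partial "$vg" || return 1
    # Read-only mount, so nothing is written to the damaged filesystem.
    run mount -o ro "/dev/$vg/$lv" "$mnt" || return 1
    # cp reports each unreadable file and carries on with the rest.
    run cp -a "$mnt/." "$dest/"
}

# Mount point and destination are placeholders.
salvage vg_hosting lv_home /mnt/broken /mnt/rescue
```

The destination should of course live on a different, healthy disk with enough room for whatever survives.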
OK so ext4 this time, with new disk images. I notice at mkfs.ext4 that each virtual disk goes from 2MB to 130MB-150MB each. That's a lot of fs metadata, and it's fairly evenly distributed across each drive.
Copied 3.5GB to the volume. Unmount. Poweroff. Killed 3rd of 4. Boot. Mounts fine. No errors. HUH surprising. As soon as I use ls though:
[ 182.461819] EXT4-fs error (device dm-1): __ext4_get_inode_loc:3806: inode #43384833: block 173539360: comm ls: unable to read itable block
# cp -a usr /mnt/btrfs
cp: cannot stat ‘usr’: Input/output error
[ 214.411859] EXT4-fs error (device dm-1): __ext4_get_inode_loc:3806: inode #43384833: block 173539360: comm ls: unable to read itable block
[ 221.067689] EXT4-fs error (device dm-1): __ext4_get_inode_loc:3806: inode #43384833: block 173539360: comm cp: unable to read itable block
I can't get anything off the drive. And what I have here are ideal conditions because it's a brand new clean file system, no fragmentation, nothing about the LVM volume has been modified, no fsck done. So nothing is corrupt. It's just missing a 1/4 hunk of its PE's. I'd say an older production use fs has zero chance of recovery via mounting.
So this is now a scraping operation with ext4.
Chris Murphy
On Sat, 2015-02-28 at 07:25 +0700, Khemara Lyn wrote:
I have tried with the following:
- Removing the broken PV:
# vgreduce --force vg_hosting /dev/sdc1
Physical volume "/dev/sdc1" still in use
Next time, try "vgreduce --removemissing <VG>" first.
In my experience, any lvm command using --force often has undesirable side effects.
Regarding getting the lvm functioning again, there is also a --partial option that is sometimes useful with the various vg* commands with a missing PV (see man lvm).
And "vgdisplay -v" often regenerates missing metadata (as in getting a functioning lvm back).
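A cautious way to try these suggestions, sketched under assumptions, is to copy the metadata backups first and use LVM's --test option so nothing is written. The directory /root/lvm-copy and the `run` wrapper are hypothetical. As far as I understand it, vgreduce --removemissing refuses to proceed while an LV still has extents on the missing PV, and adding --force would remove such LVs entirely (here that would be lv_home), so it is not a recovery step for this layout.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command; set EXECUTE=1 to run it.
run() {
    printf '%s\n' "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then "$@"; fi
}

# inspect_vg VG COPY_DIR
inspect_vg() {
    vg=$1; copy=$2
    # Keep a copy of LVM's text metadata backups before changing anything.
    run cp -a /etc/lvm/backup /etc/lvm/archive "$copy/" || return 1
    # Show what LVM can still see.
    run pvs
    run vgdisplay -v "$vg"
    # --test goes through the motions without updating the metadata.
    run vgreduce --removemissing --test "$vg"
}

# The copy directory is a placeholder.
inspect_vg vg_hosting /root/lvm-copy
```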
Steve
OK It's extremely rude to cross post the same question across multiple lists like this at exactly the same time, and without at least showing the cross posting. I just replied to the one on Fedora users before I saw this post. This sort of thing wastes people's time. Pick one list based on the best case chance for response and give it 24 hours.
Chris Murphy
Ok, sorry about that.
https://lists.fedoraproject.org/pipermail/users/2015-February/458923.html
I don't see how the VG metadata is restored with any of the commands suggested thus far. I think that's vgcfgrestore. Otherwise I'd think that LVM has no idea how to do the LE to PE mapping.
In any case, this sounds like a data scraping operation to me. XFS might be a bit more tolerant because AG's are distributed across all 4 PV's in this case, and each AG keeps its own metadata. But I still don't think the filesystem will be mountable, even read only. Maybe testdisk can deal with it, and if not then debugfs -c rdump might be able to get some of the directories. But for sure the LV has to be active. And I expect modifications (resizing anything, fscking) astronomically increase the chance of total data loss. If it's XFS, xfs_db itself is going to take longer to read and understand than just restoring from backup (XFS has dense capabilities).
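For the ext3/4 case, the debugfs idea could be tried along these lines. This is only a sketch: /user1 (a directory path inside the filesystem) and /mnt/rescue are hypothetical, the `run` helper is a made-up dry-run wrapper, and the LV has to be active already.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command; set EXECUTE=1 to run it.
# Note the printed line does not show shell quoting.
run() {
    printf '%s\n' "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then "$@"; fi
}

# scrape_ext DEVICE DIR_INSIDE_FS DESTINATION
scrape_ext() {
    dev=$1; src=$2; dest=$3
    # -c: "catastrophic" mode, opens the fs read-only and skips the bitmaps.
    # -R: run one debugfs request; rdump copies a directory tree out of the fs.
    run debugfs -c -R "rdump $src $dest" "$dev"
}

# Source directory and destination are placeholders.
scrape_ext /dev/vg_hosting/lv_home /user1 /mnt/rescue
```

Because debugfs reads the device directly instead of going through a mount, it may get at directories whose parents are unreadable, provided their own inode table blocks were not on the missing PV.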
On the other hand, Btrfs can handle this situation somewhat well so long as the fs metadata is raid1, which is the mkfs default for multiple devices. It will permit degraded mounting in such a case so recovery is straightforward. Missing files are recorded in dmesg.
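A rough sketch of that Btrfs setup and the degraded mount follows. The device names and mount point are hypothetical, the `run` helper is a made-up dry-run wrapper, and the profiles are given explicitly rather than relying on whatever the mkfs defaults are in a given btrfs-progs version.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command; set EXECUTE=1 to run it.
run() {
    printf '%s\n' "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then "$@"; fi
}

# make_fs DEVICE...
make_fs() {
    # Data is not redundant (single); metadata is mirrored (raid1).
    run mkfs.btrfs -d single -m raid1 "$@"
}

# degraded_mount SURVIVING_DEVICE MOUNTPOINT
degraded_mount() {
    # degraded: allow mounting with a device missing; ro: do not write.
    run mount -o degraded,ro "$1" "$2"
}

# Placeholder devices and mount point.
make_fs /dev/sdw /dev/sdx /dev/sdy /dev/sdz
degraded_mount /dev/sdw /mnt/rescue
```

With metadata mirrored, the directory tree stays readable after one device is lost, so a plain copy can skip the files whose data extents are gone instead of failing on whole subtrees.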
Chris Murphy