Hi,
My workstation is running CentOS 7 on two disks (sda and sdb) in a software RAID 1 setup.
It looks like I accidentally nuked it. I wanted to write an installation ISO file to a USB disk, and instead of typing dd if=install.iso of=/dev/sdc I typed /dev/sdb. As soon as I hit <Enter>, the screen froze.
I tried a hard reset, but of course, the boot process would stop short very early in the process.
Now, I have backups of the important stuff of course, so no real catastrophe. But it would be nice if I could get back the data from my disk directly.
I booted a rescue disk (Slax 9.6.4) and I can see my disks as well as the RAID arrays /dev/md125, /dev/md126 and /dev/md127. Oh, my partitioning scheme is manual and quite simple. Everything is RAID 1: I have a /boot array on /dev/sda1 + /dev/sdb1, swap on /dev/sda2 + /dev/sdb2 and / on /dev/sda3 + /dev/sdb3.
I tried to mount /dev/sda3 directly from the rescue disk:
# mount /dev/sda3 /mnt
But I only get this:
mount: unknown filesystem type 'linux_raid_member'
I'd be very grateful for suggestions.
Cheers,
Niki
Gordon Messmer wrote:
> On 12/4/18 2:01 PM, Nicolas Kovacs wrote:
>> I tried a hard reset, but of course, the boot process would stop short very early in the process.
> The system should boot normally if you disconnect sdb. Have you tried that?
Duh! Thanks, Gordon. That's a simpler answer than mine, with the same effect: /dev/sdb counts as failed as far as mdadm is concerned.
mark
On 04/12/2018 at 23:10, Gordon Messmer wrote:
> The system should boot normally if you disconnect sdb. Have you tried that?
Unfortunately that didn't work. The boot process stops here:
[OK] Reached target Basic System.
Now what?
On Tue, 4 Dec 2018 at 17:30, Nicolas Kovacs <info@microlinux.fr> wrote:
> Unfortunately that didn't work. The boot process stops here:
> [OK] Reached target Basic System.
> Now what?
In rescue mode, recreate the partition table that was on sdb by copying over the one from sda:
sfdisk -d /dev/sda | sfdisk /dev/sdb
This gives the kernel enough to know that it has partitions to rebuild.
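As a hedged sketch of the step above (not something posted in the thread), the copy can be wrapped in a small guard that refuses an identical source and target and, by default, only prints what it would run. The helper name copy_partition_table, the DRY_RUN variable and the backup file name are assumptions made for illustration; the sfdisk pipeline itself is the one given above. It also assumes the sfdisk in use can dump the partition table type on the disk; older sfdisk releases are, as far as I know, limited to MBR tables.

```shell
# copy_partition_table is a hypothetical helper, not a standard tool.
# With DRY_RUN=1 (the default here) it only prints the commands it would run.
copy_partition_table() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: copy_partition_table SRC DST" >&2
        return 2
    fi
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and target are the same device" >&2
        return 1
    fi
    backup="$(basename "$dst").table.bak"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sfdisk -d $dst > $backup"
        echo "sfdisk -d $src | sfdisk $dst"
        return 0
    fi
    # Keep a dump of whatever is currently on the target, in case it is wanted later.
    sfdisk -d "$dst" > "$backup" 2>/dev/null
    # Replay the source layout onto the target.
    sfdisk -d "$src" | sfdisk "$dst"
}
```

Calling copy_partition_table /dev/sda /dev/sdb prints the two commands for review; setting DRY_RUN=0 runs them as root.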
On 04/12/2018 at 23:50, Stephen John Smoogen wrote:
> In the rescue mode, recreate the partition table which was on the sdb by copying over what is on sda
> sfdisk -d /dev/sda | sfdisk /dev/sdb
> This will give the kernel enough to know it has things to do on rebuilding parts.
Once I made sure I retrieved all my data, I followed your suggestion, and it looks like I'm making big progress. The system booted again, though it feels a bit sluggish. Here's the current state of things.
[root@alphamule:~] # cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sdb2[1] sda2[0]
      512960 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : inactive sda1[0](S)
      16777216 blocks super 1.2

md127 : active raid1 sda3[0]
      959323136 blocks super 1.2 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>
Now how can I make my RAID array whole again? For the record, /dev/sda is intact, and /dev/sdb is the faulty disk. How can I force synchronization with /dev/sda?
Cheers,
Niki
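For readers who want to spot the problem arrays at a glance, here is a minimal sketch that scans mdstat-formatted text and prints every array that is inactive or missing a member (an underscore in the [UU] status field). The function name list_unhealthy_arrays is hypothetical, and the parsing assumes the usual multi-line /proc/mdstat layout.

```shell
# list_unhealthy_arrays is a hypothetical helper. It reads mdstat-formatted
# text (default: /proc/mdstat) and prints arrays that are inactive or degraded.
list_unhealthy_arrays() {
    awk '
        # Array header, e.g.: md127 : active raid1 sda3[0]
        /^md[0-9]+ :/ {
            name = $1
            if ($3 == "inactive") print name ": inactive"
            next
        }
        # Status line, e.g.: 959323136 blocks super 1.2 [2/1] [U_]
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ {
            print name ": degraded " $NF
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the output posted above, this would flag md126 as inactive and md127 as degraded, while md125 passes as healthy.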
On 05/12/2018 05:37, Nicolas Kovacs wrote:
> Now how can I make my RAID array whole again? For the record, /dev/sda is intact, and /dev/sdb is the faulty disk. How can I force synchronization with /dev/sda?
> Cheers,
> Niki
If you are confident in the state of sda, I would remove sdb from the array, copy the partition table from sda to sdb as Stephen suggested earlier, then add sdb back to the array and allow the data to be synced.
For example:
mdadm --fail /dev/md125 /dev/sdb2
mdadm --remove /dev/md125 /dev/sdb2
mdadm --fail /dev/md126 /dev/sdb1
mdadm --remove /dev/md126 /dev/sdb1
mdadm --fail /dev/md127 /dev/sdb3
mdadm --remove /dev/md127 /dev/sdb3
sfdisk -d /dev/sda | sfdisk /dev/sdb
Then add them back and watch them rebuild:
mdadm --add /dev/md125 /dev/sdb2
mdadm --add /dev/md126 /dev/sdb1
mdadm --add /dev/md127 /dev/sdb3
After they have all resynced, I would flush the device buffers for good measure. For example:
blockdev --flushbufs /dev/sdb1 ...
Lastly, don't forget to reinstall GRUB to sdb:
grub2-install --recheck /dev/sdb
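Since the flush and GRUB steps are meant to happen only after the resync has finished, here is a small sketch of one way to wait for that by polling mdstat. The function names resync_running and wait_for_resync are hypothetical, and the 30-second default interval is an arbitrary choice; the pattern relies on the "recovery =" and "resync =" progress lines that /proc/mdstat shows during a rebuild.

```shell
# resync_running and wait_for_resync are hypothetical helper names.
# resync_running succeeds while the mdstat text reports a rebuild in progress
# (or queued, as in "resync=DELAYED").
resync_running() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# Poll until no rebuild is reported. $1: mdstat file, $2: seconds between polls.
wait_for_resync() {
    while resync_running "${1:-/proc/mdstat}"; do
        sleep "${2:-30}"
    done
}

# mdadm also offers a built-in equivalent for a single array:
#   mdadm --wait /dev/md127
```

A typical use would be to run wait_for_resync and then continue with the blockdev and grub2-install steps.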
On 05/12/2018 at 08:31, Phil Perry wrote:
> If you are confident in the state of sda, I would remove sdb from the array, copy the partition table from sda to sdb as Stephen suggested earlier, then add sdb back to the array and allow the data to be synced:
Thanks very much for the detailed answer. I'll probably give this a spin next week, since right now I have an urgent job to finish, and I'm happy to be able to work on a usable system even though it's a bit sluggish. As soon as the stress is over, I'll try it out.
cheers,
Niki
On Wed, 5 Dec 2018 at 00:36, Nicolas Kovacs <info@microlinux.fr> wrote:
> The system booted again, though it feels a bit sluggish.
It will, because you have half the bandwidth, and there can be a little overhead of the "write to two disks... nope; read from disk b... nope, switch to a" kind.
> Now how can I make my RAID array whole again? For the record, /dev/sda is intact, and /dev/sdb is the faulty disk. How can I force synchronization with /dev/sda?
> Cheers,
Phil Perry posted all the things in a better email than I could have (pperry++)
> Niki
> --
> Microlinux - Solutions informatiques durables
> 7, place de l'église - 30730 Montpezat
> Site : https://www.microlinux.fr
> Blog : https://blog.microlinux.fr
> Mail : info@microlinux.fr
> Tél. : 04 66 63 10 32
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
On 12/4/18 2:31 PM, Nicolas Kovacs wrote:
> Unfortunately that didn't work. The boot process stops here:
> [OK] Reached target Basic System.
> Now what?
Remove "rhgb quiet" from the kernel boot args and see if you get any more information about what's happening. "Reached target Basic System." is recorded twice in the boot logs on a system I checked a moment ago, so I'm not really sure where yours is stalling.
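To illustrate what removing those arguments amounts to, here is a minimal sketch that strips the words rhgb and quiet from a kernel command line string. The function name strip_quiet_args is hypothetical and the sample command line in the comment is made up; this only transforms text and does not touch any boot configuration.

```shell
# strip_quiet_args is a hypothetical helper: it removes the standalone words
# "rhgb" and "quiet" from a kernel command line and prints what is left.
strip_quiet_args() {
    printf '%s\n' "$1" \
        | tr ' ' '\n' \
        | grep -vx -e rhgb -e quiet \
        | paste -sd ' ' -
}

# Example with a made-up command line:
#   strip_quiet_args 'ro crashkernel=auto rhgb quiet'
```

For a one-off test boot, the same change can be made by hand: press e at the GRUB menu, delete the two words from the kernel line, and boot the edited entry.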
Nicolas Kovacs wrote:
> I'd be very grateful for suggestions.
Condolences.
I think I'd boot off a rescue disk, then either try to mount the RAID, or edit /etc/mdadm.conf so that it lists only sda, perhaps with sdb marked as failed. Then see if you can mount the RAID.
mark
On 04/12/2018 at 23:12, mark wrote:
> I think how I'd go about it would be to boot off a rescue disk, then either try to mount the raid, or just edit the /etc/mdadm.conf, and tell it only sda, and maybe sdb marked as failed. Then see if you can mount the raid.
OK, I got a partial success that's not so bad. The bad news is that the system won't boot even if I unplug sdb. The good news is I'm currently retrieving my data.
Once I booted a Slax Live CD with only sda connected, I couldn't mount it since it's a RAID member. So here's what I did.
# mdadm -Ss
# mdadm -A -R /dev/md9 /dev/sda3
# mount /dev/md9 /mnt
A peek in /mnt, seems like everything's still there. So I'm currently transferring 300 GB of data to my server.
A word on backups. I have all the vital stuff on my server, with daily snapshots using Rsnapshot. But all the audio and video stuff is excluded, not to mention all my settings in Firefox, Thunderbird, etc.
Anyway: thanks very much for your help, guys.
Cheers,
Niki
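For reference, the three rescue commands in this last message can be spelled out with long options, which makes it clearer what each step does: stop all auto-assembled arrays, assemble a new array from the single surviving member and start it even though it is degraded, then mount it. The sketch below is an illustration only: the helper name rescue_mount and the DRY_RUN variable are assumptions, /dev/md9 is simply the name used in the message, and mounting read-only is an added precaution that was not part of the original commands.

```shell
# rescue_mount is a hypothetical helper. With DRY_RUN=1 (the default here)
# the commands are printed instead of executed.
rescue_mount() {
    member=$1
    mountpoint=${2:-/mnt}
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
    }
    # mdadm -Ss: stop every running array found by scanning
    run mdadm --stop --scan &&
    # mdadm -A -R: assemble and start the array even with a member missing
    run mdadm --assemble --run /dev/md9 "$member" &&
    # read-only mount (an assumption added here, not in the original commands)
    run mount -o ro /dev/md9 "$mountpoint"
}
```

Calling rescue_mount /dev/sda3 /mnt prints the three commands; with DRY_RUN=0 they are executed and need root.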