Accidentally nuked my system - any suggestions ?


Nicolas Kovacs

4 Dec 2018

10:01 p.m.

Hi,

My workstation is running CentOS 7 on two disks (sda and sdb) in a software RAID 1 setup.

It looks like I accidentally nuked it. I wanted to write an installation ISO file to a USB disk, and instead of typing dd if=install.iso of=/dev/sdc I typed /dev/sdb. As soon as I hit <Enter>, the screen froze.

I tried a hard reset, but of course, the boot process would stop short very early in the process.

Now, I have backups of the important stuff of course, so no real catastrophe. But it would be nice if I could get back the data from my disk directly.

I booted a rescue disk (Slax 9.6.4) and I can see my disks as well as raid arrays /dev/md125, /dev/md126 and /dev/md127. Oh, my partitioning scheme is manual and quite simple. Everything is RAID 1, I have a /boot array on /dev/sda1 + /dev/sdb1, swap on /dev/sda2 + /dev/sdb2 and / on /dev/sda3 + /dev/sdb3.

I tried to mount /dev/sda3 directly from the rescue disk:

# mount /dev/sda3 /mnt

But I only get this:

mount: unknown filesystem type 'linux_raid_member'

I'd be very grateful for suggestions.

Cheers,

Niki

-- Microlinux - Solutions informatiques durables 7, place de l'église - 30730 Montpezat Site : https://www.microlinux.fr Blog : https://blog.microlinux.fr Mail : info@microlinux.fr Tél. : 04 66 63 10 32
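The slip described in this post (typing /dev/sdb where /dev/sdc was meant) is the kind a small guard can sometimes catch. What follows is only a minimal sketch, assuming the kernel exposes a removable flag under /sys/block for the device. The function names and the SYSFS_ROOT override are hypothetical, and the check is imperfect: some USB hard disks report themselves as non-removable.

```shell
#!/bin/sh
# Sketch: refuse a dd target unless the kernel reports it as removable.
# SYSFS_ROOT is a hypothetical override so the check can be tried against
# a fake directory tree; it defaults to the real /sys.

is_removable() {
    dev=$(basename "$1")                              # /dev/sdc -> sdc
    flag="${SYSFS_ROOT:-/sys}/block/$dev/removable"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

safe_dd_target() {
    if is_removable "$1"; then
        echo "ok: $1 is removable"
    else
        echo "refusing: $1 is not reported as removable" >&2
        return 1
    fi
}
```

A possible use would be: safe_dd_target /dev/sdc && dd if=install.iso of=/dev/sdc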


Gordon Messmer

4 Dec 2018

10:10 p.m.

On 12/4/18 2:01 PM, Nicolas Kovacs wrote:

...

I tried a hard reset, but of course, the boot process would stop short very early in the process.

The system should boot normally if you disconnect sdb. Have you tried that?

mark

10:13 p.m.

Gordon Messmer wrote:

...

On 12/4/18 2:01 PM, Nicolas Kovacs wrote:

...
I tried a hard reset, but of course, the boot process would stop short very early in the process.

The system should boot normally if you disconnect sdb. Have you tried that?

Duh! Thanks, Gordon. That's a simpler answer than mine, with the same effect: /dev/sdb counts as failed as far as mdadm is concerned.

mark

Nicolas Kovacs

10:31 p.m.

On 04/12/2018 at 23:10, Gordon Messmer wrote:

...

The system should boot normally if you disconnect sdb. Have you tried that?

Unfortunately that didn't work. The boot process stops here:

[OK] Reached target Basic System.

Now what ?

Stephen John Smoogen

10:50 p.m.

On Tue, 4 Dec 2018 at 17:30, Nicolas Kovacs info@microlinux.fr wrote:

...

On 04/12/2018 at 23:10, Gordon Messmer wrote:

...
The system should boot normally if you disconnect sdb. Have you tried that?

Unfortunately that didn't work. The boot process stops here:

[OK] Reached target Basic System.

Now what ?

In rescue mode, recreate the partition table that was on sdb by copying over the one on sda:

sfdisk -d /dev/sda | sfdisk /dev/sdb

This gives the kernel enough information to know that there are partitions on sdb to rebuild.

-- Stephen J Smoogen.

Nicolas Kovacs

5 Dec 2018

5:37 a.m.

On 04/12/2018 at 23:50, Stephen John Smoogen wrote:

...

In the rescue mode, recreate the partition table which was on the sdb by copying over what is on sda

sfdisk -d /dev/sda | sfdisk /dev/sdb

This will give the kernel enough to know it has things to do on rebuilding parts.

Once I made sure I retrieved all my data, I followed your suggestion, and it looks like I'm making big progress. The system booted again, though it feels a bit sluggish. Here's the current state of things.

[root@alphamule:~] # cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sdb2[1] sda2[0]
      512960 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : inactive sda1[0](S)
      16777216 blocks super 1.2

md127 : active raid1 sda3[0]
      959323136 blocks super 1.2 [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>

Now how can I make my RAID array whole again? For the record, /dev/sda is intact, and /dev/sdb is the faulty disk. How can I force synchronization with /dev/sda?

Cheers,

Niki
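For readers following along, the mdstat output in this post can be read mechanically: an array is degraded when its member map contains an underscore (for example [U_]), and an array marked inactive has not been started at all. Below is a rough sketch of that check. The function name mdstat_problems is hypothetical, and the parsing assumes the usual /proc/mdstat layout with one header line per array followed by indented status lines.

```shell
#!/bin/sh
# Sketch: list md arrays that are inactive or running degraded.
# Reads /proc/mdstat by default; a file path can be passed for testing.

mdstat_problems() {
    awk '
        # Header line of an array, e.g. "md127 : active raid1 sda3[0]"
        /^md[0-9]+ :/ {
            name = $1
            if ($3 == "inactive") print name, "inactive"
            next
        }
        # Status lines carry a member map such as [UU] or [U_];
        # an underscore marks a missing member.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name, "degraded"
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the output quoted in this post, the sketch would be expected to report md126 as inactive and md127 as degraded, and to say nothing about md125.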

Phil Perry

7:31 a.m.

If you are confident in the state of sda, I would remove sdb from the array, copy the partition table from sda to sdb as Stephen suggested earlier, then add sdb back to the array and allow the data to be synced:

For example:

mdadm --fail /dev/md125 /dev/sdb2
mdadm --remove /dev/md125 /dev/sdb2

mdadm --fail /dev/md126 /dev/sdb1
mdadm --remove /dev/md126 /dev/sdb1

mdadm --fail /dev/md127 /dev/sdb3
mdadm --remove /dev/md127 /dev/sdb3

sfdisk -d /dev/sda | sfdisk /dev/sdb

then add them back and watch them rebuild:

mdadm --add /dev/md125 /dev/sdb2
mdadm --add /dev/md126 /dev/sdb1
mdadm --add /dev/md127 /dev/sdb3

After they have all resynced, I would flush the device buffers for good measure. For example:

blockdev --flushbufs /dev/sdb1 ...

Lastly, don't forget to reinstall grub to sdb:

grub2-install --recheck /dev/sdb
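The sequence above can be collected into a small script. This is only a sketch under stated assumptions: the helper names (run, rebuild_mirror, finish_mirror) and the array:partition-number argument format are hypothetical, the script only prints commands unless DRY_RUN=0 is set, and the mdadm --wait call is an addition that is not in the thread. It would need root, and some steps may fail harmlessly if a partition is not currently a member of its array.

```shell
#!/bin/sh
# Sketch: fail/remove the damaged disk's partitions, copy the partition
# table from the good disk, and re-add the partitions.
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

rebuild_mirror() {
    good=$1; bad=$2; shift 2           # e.g. /dev/sda /dev/sdb
    # Remaining arguments: array:partition-number, e.g. /dev/md127:3
    for pair in "$@"; do
        md=${pair%%:*}; part=$bad${pair##*:}
        run mdadm --fail "$md" "$part"
        run mdadm --remove "$md" "$part"
    done
    # Copy the partition table from the good disk to the damaged one.
    if [ "$DRY_RUN" = "1" ]; then
        echo "sfdisk -d $good | sfdisk $bad"
    else
        sfdisk -d "$good" | sfdisk "$bad"
    fi
    for pair in "$@"; do
        md=${pair%%:*}; part=$bad${pair##*:}
        run mdadm --add "$md" "$part"
    done
}

finish_mirror() {
    bad=$1; shift
    for pair in "$@"; do
        run mdadm --wait "${pair%%:*}"              # assumption: wait for the resync to finish
        run blockdev --flushbufs "$bad${pair##*:}"
    done
    run grub2-install --recheck "$bad"
}
```

With the layout used in the example above, a dry run would be: rebuild_mirror /dev/sda /dev/sdb /dev/md125:2 /dev/md126:1 /dev/md127:3, followed later by finish_mirror /dev/sdb with the same pairs. Reviewing the printed commands before setting DRY_RUN=0 seems prudent, since swapping the good and bad disk arguments would overwrite the intact partition table.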

Nicolas Kovacs

6:49 p.m.

Thanks very much for the detailed answer. I'll probably give this a spin next week, since right now I have an urgent job to finish, and I'm happy to be able to work on a usable system even though it's a bit sluggish. As soon as the stress is over, I'll try it out.

cheers,

Niki

Stephen John Smoogen

1:03 p.m.

On Wed, 5 Dec 2018 at 00:36, Nicolas Kovacs info@microlinux.fr wrote:

...

Once I made sure I retrieved all my data, I followed your suggestion, and it looks like I'm making big progress. The system booted again, though it feels a bit sluggish. Here's the current state of things.

It will, because you have half the bandwidth, and there can be a tiny bit of 'write to two disks... nope; read from disk b... nope, switch to a'.

...

Now how can I make my RAID array whole again? For the record, /dev/sda is intact, and /dev/sdb is the faulty disk. How can I force synchronization with /dev/sda?

Phil Perry posted all the things in a better email than I could have (pperry++)

-- Stephen J Smoogen.

Gordon Messmer

4 Dec 2018

10:55 p.m.

On 12/4/18 2:31 PM, Nicolas Kovacs wrote:

...

Unfortunately that didn't work. The boot process stops here: [OK] Reached target Basic System. Now what ?

Remove "rhgb quiet" from the kernel boot args and see if you get any more information about what's happening. "Reached target Basic System." is recorded twice in the boot logs on a system I checked a moment ago, so I'm not really sure where yours is stalling.
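For a one-off boot this is normally done by hand: edit the kernel line in the GRUB menu and delete the two words. Purely as an illustration of which words go and which stay, here is a small sketch that filters a kernel command line string. The function name strip_quiet_args and the sample arguments in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: drop the standalone words "rhgb" and "quiet" from a kernel
# command line, keeping every other argument in its original order.
# The body runs in a subshell so that "set -f" (no filename globbing)
# and the temporary variables do not leak into the caller.

strip_quiet_args() (
    set -f
    out=""
    for word in $1; do
        case $word in
            rhgb|quiet) ;;                      # skip the two splash/quiet flags
            *) out="${out:+$out }$word" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
)
```

For example, strip_quiet_args 'ro crashkernel=auto rhgb quiet' would print the line with only ro and crashkernel=auto left.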

mark

10:12 p.m.

Condolences.

I think how I'd go about it would be to boot off a rescue disk, then either try to mount the raid, or just edit the /etc/mdadm.conf, and tell it only sda, and maybe sdb marked as failed. Then see if you can mount the raid.

mark

Nicolas Kovacs

10:50 p.m.

On 04/12/2018 at 23:12, mark wrote:

...

I think how I'd go about it would be to boot off a rescue disk, then either try to mount the raid, or just edit the /etc/mdadm.conf, and tell it only sda, and maybe sdb marked as failed. Then see if you can mount the raid.

OK, I got a partial success that's not so bad. The bad news is that the system won't boot even if I unplug sdb. The good news is I'm currently retrieving my data.

Once I booted a Slax Live CD with only sda connected, I couldn't mount it since it's a RAID member. So here's what I did.

# mdadm -Ss
# mdadm -A -R /dev/md9 /dev/sda3
# mount /dev/md9 /mnt

A peek in /mnt, seems like everything's still there. So I'm currently transferring 300 GB of data to my server.
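The three commands above, spelled out with long options, amount to: stop whatever arrays the rescue system assembled automatically, assemble a new array from the single surviving member and start it even though it is degraded, then mount it. Here is a minimal sketch of the same steps. The function names are hypothetical, it prints the commands unless DRY_RUN=0 is set, and the read-only mount option is an added precaution that was not part of the original commands.

```shell
#!/bin/sh
# Sketch: bring up one half of a RAID 1 mirror from a rescue system.
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

mount_single_member() {
    member=$1; md=$2; mnt=$3
    run mdadm --stop --scan                      # long form of -Ss
    run mdadm --assemble --run "$md" "$member"   # long form of -A -R: start despite the missing disk
    run mount -o ro "$md" "$mnt"                 # read-only is an assumption added here
}
```

With the names used in this post, the call would be: mount_single_member /dev/sda3 /dev/md9 /mnt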

A word on backups. I have all the vital stuff on my server, with daily snapshots using Rsnapshot. But all the audio and video stuff is excluded, not to mention all my settings in Firefox, Thunderbird, etc.

Anyway: thanks very much for your help, guys.

Cheers,

Niki
