System is CentOS 6, all up to date; it previously had two drives in an MD RAID configuration:
md0: sda1/sdb1, 20 GB, OS / partition
md1: sda2/sdb2, 1 TB, data, mounted as /home
Installed kmod ZFS via yum, rebooted, and zpool works fine. Backed up the /home data twice, then stopped the md1 array on the sd[ab]2 partitions with:
mdadm --stop /dev/md1; mdadm --zero-superblock /dev/sd[ab]1;
Removed the /home entry from /etc/fstab. Used fdisk to set the partition type to GPT for sda2 and sdb2, then built *and then destroyed* a ZFS mirror pool using the two partitions.
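For reference, the mirror step looked roughly like this (pool name "tank" is just an illustration, not taken from my actual commands):

zpool create tank mirror /dev/sda2 /dev/sdb2   # build the mirror from the two partitions
zpool status tank                              # both members should show ONLINE
zpool destroy tank                             # tear it down again, as described above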
Now the system won't boot; it hits a kernel panic. I'm remote, so I'll be going in tomorrow to see what's up. My assumption is that it has something to do with mdadm/RAID not being "fully removed".
Any idea what I might have missed?
Here's a screenshot of the panic http://effortlessis.com/images/IMG_0705.JPG
In article 6566355.ijNRhnPfCt@tesla.schoolpathways.com, Benjamin Smith lists@benjamindsmith.com wrote:
> System is CentOS 6, all up to date; it previously had two drives in an MD RAID configuration:
> md0: sda1/sdb1, 20 GB, OS / partition
> md1: sda2/sdb2, 1 TB, data, mounted as /home
> Installed kmod ZFS via yum, rebooted, and zpool works fine. Backed up the /home data twice, then stopped the md1 array on the sd[ab]2 partitions with:
> mdadm --stop /dev/md1; mdadm --zero-superblock /dev/sd[ab]1;
Did you mean /dev/sd[ab]2 instead?
> Removed the /home entry from /etc/fstab. Used fdisk to set the partition type to GPT for sda2 and sdb2, then built *and then destroyed* a ZFS mirror pool using the two partitions.
> Now the system won't boot; it hits a kernel panic. I'm remote, so I'll be going in tomorrow to see what's up. My assumption is that it has something to do with mdadm/RAID not being "fully removed".
> Any idea what I might have missed?
I think it's because you clobbered md0 when you did --zero-superblock on sd[ab]1 instead of 2.
Don't you love it when some things count from 0 and others from 1?
Cheers Tony
> I think it's because you clobbered md0 when you did --zero-superblock on sd[ab]1 instead of 2.
> Don't you love it when some things count from 0 and others from 1?
That's a real problem, but difficult to fix, I guess. IMHO it's better to keep things the way they are as long as the solution is not really better than the old behavior. Maybe the new Linux Ethernet interface naming scheme can serve as a bad example here, if you ask me.
But here, mdadm could have done better: --zero-superblock checks whether the device contains a valid md superblock, but it doesn't also check whether the device is a member of a running md array :-(
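Until then, a manual version of that check is something like this (a sketch, using the device names from the original post):

cat /proc/mdstat                   # md1 should already be gone after mdadm --stop
mdadm --detail /dev/md0            # confirm which members the still-running array uses
mdadm --examine /dev/sda2          # show the on-disk md superblock, if any
mdadm --zero-superblock /dev/sda2  # only once you're sure it's not part of a live array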
If it turns out that this is your problem, maybe you could ask the mdadm developers to improve it?
Regards, Simon
On Tuesday, April 9, 2019 2:53:55 AM PDT Simon Matter via CentOS wrote:
> I think it's because you clobbered md0 when you did --zero-superblock on sd[ab]1 instead of 2.
As mentioned in another reply, this was a typo in the email, not on the machine.
I drove to the site, picked up the machine, and last night found that the problem had nothing to do with mdadm, but rather with setting a partition's type to GPT. For some reason, you *cannot* have a partition of type GPT and expect Linux to boot. (WT F/H?!?)
So I changed the type of the partitions used for the ZFS pool to Solaris (just a random guess) and it's all working beautifully now. I don't know if that's the recommended procedure, as I've always used whole disks for ZFS pools, and a little Google pounding gave no useful leads. There are many examples of using partitions or files as devices in a ZFS pool, but no info on what partition *type* should be used.
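For what it's worth, the pool members can be sanity-checked regardless of the MBR type byte; something like this (device names as above):

blkid /dev/sda2 /dev/sdb2   # should report TYPE="zfs_member" once the pool exists
zpool status                # both partitions listed and ONLINE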
Next up: escalating this with the CentOS list...
On Apr 10, 2019, at 9:38 AM, Benjamin Smith lists@benjamindsmith.com wrote:
> For some reason, you *cannot* have a partition of type GPT and expect Linux to boot. (WT F/H?!?)
I believe you were trying to make use of a facility invented as part of the GPT Protective Partition feature without understanding it first:
https://en.wikipedia.org/wiki/GUID_Partition_Table#Protective_MBR_(LBA_0)
As a normal user, there is no good cause to be changing an MBR partition’s type to GPT in this way. It’s a feature that only GPT partitioning tools should be making use of, and then only to prevent legacy OSes from interfering with actual GPT partitioning schemes.
In other words, you've misled the boot loader into trying to seek out an *actual* GPT partition table, which doesn't exist, giving the symptom you saw.
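One way to spot that state from a rescue shell is something like this (output trimmed to the relevant part):

fdisk -l /dev/sda
# a partition whose Id shows as "ee" (GPT) is the protective-MBR entry;
# the boot loader then goes looking for a real GPT that doesn't exist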
I’ve never used ZFS with MBR partitions. Normally I feed it whole disks, in which case the ZoL zpool implementation will create GPT partition tables and give the first partition code BF01. That means type BF *might* be the correct value on MBR.
I suspect you could just as well use type 83 (Linux generic) for this, since that doesn't refer to any specific file system. Properly written utilities probe the metadata to figure out what tools to use with a partition, so putting ZFS on a type 83 MBR partition should be harmless, since only ZFS tools will admit to being able to do anything with it.
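If you want to switch those MBR partitions over, something like the following should do it (interactive fdisk; bf is the Solaris type Benjamin guessed, 83 the generic Linux one):

fdisk /dev/sda
#   t    change a partition's type
#   2    select partition 2
#   83   Linux (or bf for Solaris)
#   w    write the table and exit
# repeat for /dev/sdb, then re-check with:
fdisk -l /dev/sda /dev/sdb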