Environment:
  CentOS 6.2 amd64 (minimal server install)
  2 virtual hard disks of 10 GB each
  Linux KVM
Following the instructions on the CentOS Wiki http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1 I installed a minimal server in a Linux KVM setup (launch script shown below):
<script>
#!/bin/bash

nic_mac_addr0=00:07:43:53:2b:bb

kvm \
    -vga std \
    -m 1024 \
    -cpu core2duo \
    -smp 2,cores=2 \
    -drive file=/home/arunk/KVM/vdisks/centos62.raid1.disk1.img \
    -drive file=/home/arunk/KVM/vdisks/centos62.raid1.disk2.img \
    -net nic,vlan=1,model=e1000,macaddr=${nic_mac_addr0} \
    -net tap,vlan=1,ifname=tap0,script=no,downscript=no
</script>
The system boots fine when both disks are available. When I remove either disk (by deleting its -drive file= line), the system boots to the point where the GRUB menu is displayed and the boot progress bar advances; when the white bar reaches about the halfway point, it panics with:
Kernel panic - not syncing: Attempted to kill init!
The purpose of the above exercise was to test whether the system keeps working despite losing one of the disks, which is what RAID1 is supposed to provide (I am emulating the failure by removing the disk image line from the VM definition).
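For example, the "failed disk1" case is launched with the same script minus that drive (a sketch, reusing the paths from above):

kvm \
    -vga std \
    -m 1024 \
    -cpu core2duo \
    -smp 2,cores=2 \
    -drive file=/home/arunk/KVM/vdisks/centos62.raid1.disk2.img \
    -net nic,vlan=1,model=e1000,macaddr=${nic_mac_addr0} \
    -net tap,vlan=1,ifname=tap0,script=no,downscript=no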
Has anyone tested CentOS 6.2 on a partitionable mdadm RAID1 setup on bare metal hardware?
Any suggestions/ideas as to what I may be doing incorrectly?
FWIW, outputs from "fdisk -l" and "df -hT"
<fdisk -l>
root@centos62-raid1 ~ > # fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e8353

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e8353

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sdb2             523        1045     4194304   83  Linux
/dev/sdb3            1045        1176     1048576   82  Linux swap / Solaris

Disk /dev/md_d0: 10.7 GB, 10737352704 bytes
2 heads, 4 sectors/track, 2621424 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e8353

        Device Boot      Start         End      Blocks   Id  System
/dev/md_d0p1   *           257     1048832     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/md_d0p2           1048833     2097408     4194304   83  Linux
Partition 2 does not end on cylinder boundary.
/dev/md_d0p3           2097409     2359552     1048576   82  Linux swap / Solaris
Partition 3 does not end on cylinder boundary.
</fdisk -l>
<df -hT>
# df -hT
Filesystem    Type   Size  Used Avail Use% Mounted on
/dev/md_d0p1  ext4   4.0G  1.9G  1.9G  50% /
tmpfs         tmpfs  499M     0  499M   0% /dev/shm
/dev/md_d0p2  ext4   4.0G  136M  3.7G   4% /home
</df -hT>
-- Arun Khan
Arun Khan wrote: <snip>
Following the instructions on the CentOS Wiki http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1 I installed a minimal server in a Linux KVM setup (launch script shown below)
<snip>
The system boots fine when both disks are available. When I remove either disk (by deleting its -drive file= line), the system boots to the point where the GRUB menu is displayed and the boot progress bar advances; when the white bar reaches about the halfway point, it panics with:
Kernel panic - not syncing: Attempted to kill init!
<snip>
<fdisk -l>
root@centos62-raid1 ~ > # fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e8353

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
<snip> Ok, I see that it's hardware 512b blocks, so you're not running into issues with 4k hardware blocks. I trust you installed grub on /dev/md0, which I assume is /dev/sda1 and /dev/sdb1?
mark
On Wed, Jun 20, 2012 at 12:11 AM, m.roth@5-cent.us wrote:
Arun Khan wrote:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
<snip> Ok, I see that it's hardware 512b blocks, so you're not running into issues with 4k hardware blocks. I trust you installed grub on /dev/md0, which I assume is /dev/sda1 and /dev/sdb1?
From the wiki instructions, there is no re-installation of GRUB, only a couple of changes to the /boot/grub/grub.conf file installed by the regular installation on /dev/sda. During the RAID1 creation process the GRUB from /dev/sda would be mirrored into the RAID1 device and appear in the MBR of both disks.
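(For reference only: the wiki does not call for it, but if the second disk's MBR ever turned out not to be bootable, the usual GRUB legacy way to put a boot sector on both disks by hand would be something along these lines -- a sketch, not something done in this setup:)

# grub
grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> quit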
As I said in the OP, I do see the GRUB menu with either of the disks "unplugged" i.e. missing. The kernel does boot and the white progress bar gets to about 50% when the kernel panic occurs. I will turn off the splash and see what comes up on the console. Gut feeling -- I suspect the problem is with the initrd image created with "dracut".
-- Arun Khan
Arun Khan wrote:
On Wed, Jun 20, 2012 at 12:11 AM, m.roth@5-cent.us wrote:
Arun Khan wrote:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
<snip> Ok, I see that it's hardware 512b blocks, so you're not running into issues with 4k hardware blocks. I trust you installed grub on /dev/md0, which I assume is /dev/sda1 and /dev/sdb1?
From the wiki instructions, there is no re-installation of GRUB, only a couple of changes to the /boot/grub/grub.conf file installed by the regular installation on /dev/sda. During the RAID1 creation process the GRUB from /dev/sda would be mirrored into the RAID1 device and appear in the MBR of both disks.
As I said in the OP, I do see the GRUB menu with either of the disks "unplugged" i.e. missing. The kernel does boot and the white progress bar gets to about 50% when the kernel panic occurs. I will turn off the splash and see what comes up on the console. Gut feeling -- I suspect the problem is with the initrd image created with "dracut".
For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet, so you can actually see what's happening. Sounds to me as though it's trying to switch root from the virtual drive of the ramfs to a real drive, and it's not working. One thing you *might* also try is, before you boot, edit the kernel line in grub and add rdshell at the end, so you drop into dracut's rudimentary rescue shell if/when it fails, and you can look around and find what it's seeing.
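(Concretely, the kernel line in /boot/grub/grub.conf would change roughly as follows; the kernel version shown is only an example, not taken from this system:)

# before
kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/md_d0p1 rhgb quiet
# after: rhgb/quiet removed, rdshell added
kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/md_d0p1 rdshell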
mark
On Wed, Jun 20, 2012 at 1:00 AM, m.roth@5-cent.us wrote:
.... snip ....
For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet, so you can actually see what's happening. Sounds to me as though it's trying to switch root from the virtual drive of the ramfs to a real drive, and it's not working. One thing you *might* also try is, before you boot, edit the kernel line in grub and add rdshell at the end, so you drop into dracut's rudimentary rescue shell if/when it fails, and you can look around and find what it's seeing.
Will try your suggestion and report back.
-- Arun Khan
On Wed, Jun 20, 2012 at 10:06 AM, Arun Khan knura9@gmail.com wrote:
On Wed, Jun 20, 2012 at 1:00 AM, m.roth@5-cent.us wrote:
.... snip ....
For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet, so you can actually see what's happening. Sounds to me as though it's trying to switch root from the virtual drive of the ramfs to a real drive, and it's not working. One thing you *might* also try is, before you boot, edit the kernel line in grub and add rdshell at the end, so you drop into dracut's rudimentary rescue shell if/when it fails, and you can look around and find what it's seeing.
Will try your suggestion and report back.
As mentioned already, there are no issues with both disks connected. In this scenario, I have changed the "Partition Id" of the partitionable RAID1 partitions /dev/md_d0p1 and /dev/md_d0p2 to 'fd' and then rebooted the system (recall that earlier these partitions had Id=83).
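(For anyone repeating this: changing the partition type is a plain fdisk operation against the md device, something like the session below; the interactive prompts are paraphrased:)

# fdisk /dev/md_d0
Command: t              (change a partition's system id)
Partition number: 1
Hex code: fd            (Linux raid autodetect)
Command: t
Partition number: 2
Hex code: fd
Command: w              (write table and exit)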
I also made the changes to /boot/grub/grub.conf suggested by Mark.
Rebooted the system with both disks connected - system boots fine. Messages are displayed including the md driver binding /dev/sda and /dev/sdb. The "root" device /dev/md_d0p1 is detected and it is mounted on / and life is hunky dory.
Rebooted the system with disk1 removed: the kernel boots, and the 'md' driver tries to bind sda. At this point the system seems to hang for a few seconds and then 'dracut' reports that it cannot find /dev/md_d0p1 (the root partition):
dracut Warning: No root device "block:/dev/md_d0p1" found
Console image pasted here http://imagebin.org/217229
In the "rdshell" environment I can see that /etc/mdadm.conf is defined but beyond this I don't know what to look for.
Changing the Partition Id for the RAID1 partitions to 'fd' does not help.
Any further suggestions and/or comments?
-- Arun Khan
Arun Khan wrote:
On Wed, Jun 20, 2012 at 10:06 AM, Arun Khan knura9@gmail.com wrote:
On Wed, Jun 20, 2012 at 1:00 AM, m.roth@5-cent.us wrote:
.... snip ....
For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet,
<snip>
edit the kernel line in grub, and add rdshell at the end, so you drop into dracut's rudimentary rescue shell if/when it fails, and you can look around and find what it's seeing.
Will try your suggestion and report back.
<snip>
Rebooted the system with disk1 removed: the kernel boots, and the 'md' driver tries to bind sda. At this point the system seems to hang for a few seconds and then 'dracut' reports that it cannot find /dev/md_d0p1 (the root partition):
dracut Warning: No root device "block:/dev/md_d0p1" found
Console image pasted here http://imagebin.org/217229
At this point, I'm starting to wonder if the initrd.img has the drivers for software RAID. You *might* need to rebuild that.
In the "rdshell" environment I can see that /etc/mdadm.conf is defined but beyond this I don't know what to look for.
Changing the Partition Id for the RAID1 partitions to 'fd' does not help.
Any further suggestions and/or comments?
What devices are there in /dev/? /dev/sd? /dev/md?
mark
On Wed, Jun 20, 2012 at 10:57 PM, m.roth@5-cent.us wrote:
Arun Khan wrote:
Rebooted the system with disk1 removed: the kernel boots, and the 'md' driver tries to bind sda. At this point the system seems to hang for a few seconds and then 'dracut' reports that it cannot find /dev/md_d0p1 (the root partition):
dracut Warning: No root device "block:/dev/md_d0p1" found
Console image pasted here http://imagebin.org/217229
At this point, I'm starting to wonder if the initrd.img has the drivers for software RAID. You *might* need to rebuild that.
Using 'dracut' I did create a new initramfs file per the instructions in the wiki.
Nonetheless, assuming that the md module were missing from the new initramfs, one would expect the boot to fail with /dev/sda and /dev/sdb both connected to the system. The fact that the system boots in this case shows that the md driver is present.
See screenshot here http://imagebin.org/217246
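(One way to confirm that the md/raid bits really are in the image is to list the initramfs contents; lsinitrd ships with the dracut package on CentOS 6, and the file name below is only illustrative:)

# lsinitrd /boot/initramfs-$(uname -r).img | grep -Ei 'raid|mdadm'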
In the "rdshell" environment I can see that /etc/mdadm.conf is defined but beyond this I don't know what to look for.
Changing the Partition Id for the RAID1 partitions to 'fd' does not help.
Any further suggestions and/or comments?
What devices are there in /dev/? /dev/sd? /dev/md?
/dev/md_d0
/dev/md/md-device-map
Please see screenshot http://imagebin.org/217263
-- Arun Khan
On 06/21/2012 04:11 AM, Arun Khan wrote:
On Wed, Jun 20, 2012 at 10:06 AM, Arun Khan knura9@gmail.com wrote:
On Wed, Jun 20, 2012 at 1:00 AM, m.roth@5-cent.us wrote:
.... snip ....
For one thing, edit grub.conf and get *rid* of that idiot rhgb and quiet, so you can actually see what's happening. Sounds to me as though it's trying to switch root from the virtual drive of the ramfs to a real drive, and it's not working. One thing you *might* also try is, before you boot, edit the kernel line in grub and add rdshell at the end, so you drop into dracut's rudimentary rescue shell if/when it fails, and you can look around and find what it's seeing.
Will try your suggestion and report back.
As mentioned already there are no issues with both disks connected. In this scenario, I have changed the "Partition ID" of the partitionable RAID1 partitions /dev/md_d0p1 and /dev/md_d0p2 to 'fd' and then rebooted the system (recall earlier these partitions had Id=83).
I also made the changes to /boot/grub/grub.conf suggested by Mark.
Rebooted the system with both disks connected - system boots fine. Messages are displayed including the md driver binding /dev/sda and /dev/sdb. The "root" device /dev/md_d0p1 is detected and it is mounted on / and life is hunky dory.
Rebooted the system with disk1 removed: the kernel boots, and the 'md' driver tries to bind sda. At this point the system seems to hang for a few seconds and then 'dracut' reports that it cannot find /dev/md_d0p1 (the root partition):
dracut Warning: No root device "block:/dev/md_d0p1" found
sounds like the mirror is not in synch - when it is running with both drives, what does
cat /proc/mdstat
show??
Console image pasted here http://imagebin.org/217229
In the "rdshell" environment I can see that /etc/mdadm.conf is defined but beyond this I don't know what to look for.
Changing the Partition Id for the RAID1 partitions to 'fd' does not help.
Any further suggestions and/or comments?
-- Arun Khan
On Thu, Jun 21, 2012 at 10:09 AM, Rob Kampen rkampen@reaching-clients.com wrote: .... snip ....
sounds like the mirror is not in synch - when it is running with both drives, what does
cat /proc/mdstat
System boots up fully functional with both disks.
<copy-paste>
root@centos62-raid1 ~ > # cat /proc/mdstat
Personalities : [raid1]
md_d0 : active raid1 sda[0] sdb[1]
      10485696 blocks [2/2] [UU]
unused devices: <none>
</copy-paste>
Both disks are in sync.
Anyway, even if they were out of sync, the system should boot from the disk that is in the 'U' state, but it does not.
System boots up in rdshell (failed mode) with one of the disks disconnected.
<cat /proc/mdstat>
# cat /proc/mdstat
Personalities :
md_d0 : inactive sda[0](S)
      10485696 blocks
</cat /proc/mdstat>
I do not know the internal workings of "dracut" but the problem seems to be within it (gut feeling).
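(The '(S)' spare / 'inactive' state above is the telling part: the array was found but never started, because its second member is missing. From the dracut shell such an array can be started by hand, which is essentially what the fix later in this thread does:)

# mdadm --run /dev/md_d0       # force-start the array with the members it has
# cat /proc/mdstat             # should now report md_d0 as active, degraded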
-- Arun Khan
on 6/20/2012 11:34 PM Arun Khan spake the following:
On Thu, Jun 21, 2012 at 10:09 AM, Rob Kampen rkampen@reaching-clients.com wrote: .... snip ....
sounds like the mirror is not in synch - when it is running with both drives, what does
cat /proc/mdstat
System boots up fully functional with both disks
<copy-paste>
root@centos62-raid1 ~ > # cat /proc/mdstat
Personalities : [raid1]
md_d0 : active raid1 sda[0] sdb[1]
      10485696 blocks [2/2] [UU]
unused devices: <none>
</copy-paste>
Both disks are in sync.
Anyways, even if they were out of sync the system should boot with the "disk" that is in "U" state but it does not.
System boots up in rdshell (failed mode) with one of the disks disconnected.
<cat /proc/mdstat>
# cat /proc/mdstat
Personalities :
md_d0 : inactive sda[0](S)
      10485696 blocks
</cat /proc/mdstat>
I do not know the internal workings of "dracut" but the problem seems to be within it (gut feeling).
-- Arun Khan
Just a shot in the dark... Do all the fstab entries call out md devices?
On Thu, Jun 21, 2012 at 11:40 PM, Scott Silva ssilva@sgvwater.com wrote:
on 6/20/2012 11:34 PM Arun Khan spake the following:
On Thu, Jun 21, 2012 at 10:09 AM, Rob Kampen
Just a shot in the dark... DO all the fstab entries call out md devices?
Yes, /etc/fstab contains /dev/md_d0p1 for the / partition.
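(For completeness, the relevant entries look roughly like the sketch below, based on the df/fdisk output earlier in the thread; the mount options are assumed defaults, not copied from the actual system:)

/dev/md_d0p1    /        ext4    defaults    1 1
/dev/md_d0p2    /home    ext4    defaults    1 2
/dev/md_d0p3    swap     swap    defaults    0 0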
I have been doing some digging in the initramfs and the dracut script.
The initramfs does contain all the md-related stuff like the drivers, the device nodes for md_d0 and the /etc/mdadm.conf. To the best of my knowledge these should be sufficient to bring up /dev/md_d0p1 (/).
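(For reference, a CentOS 6 initramfs can be unpacked and poked at like this; the image name is only an example:)

# mkdir /tmp/initramfs && cd /tmp/initramfs
# zcat /boot/initramfs-2.6.32-220.el6.x86_64.img | cpio -idmv
# ls lib/modules/*/kernel/drivers/md/    # raid1.ko and friends, if dracut included them
# cat etc/mdadm.conf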
I have not had a thorough look at the dracut script though.
I will post back if I find anything relevant that I don't quite understand.
-- Arun Khan
__ SOLVED __
On Sat, Jun 23, 2012 at 10:14 AM, Arun Khan knura9@gmail.com wrote:
I have not had a thorough look at the dracut script though.
I also posted this problem on the mdadm mailing list but could not get it resolved there.
So I did some searching on the suspect candidate, 'dracut'.
After some more searching I found these two bug reports:
CentOS 6.2 http://bugs.centos.org/view.php?id=5400
CentOS 6.3 http://bugs.centos.org/view.php?id=5970
Using "System Rescue CD" and mounting the disk image files, I appended 'rdshell' to the kernel line in grub.conf.
With 'rdshell' one can at least do the following to get the system operational.
Booted the system with a disk failure
At the rdshell prompt:
# mdadm --run /dev/md_d0 (replace device name with your device name)
# cat /proc/mdstat (make sure your raid device is active with one member failure)
# CTRL-D (exit the rdshell)
The system will boot with md_d0 in degraded mode.
Log in to the system.
# yum update dracut (dependency dracut-kernel is pulled in)
As of this writing it is dracut-004-284.el6_3.1.noarch
# cd /boot
# dracut <initramfs file name> <kernel version>
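(With the placeholders filled in, that is something like the line below; the kernel version is just an example, use the one actually installed, and --force is needed to overwrite an existing image:)

# dracut --force initramfs-2.6.32-279.el6.x86_64.img 2.6.32-279.el6.x86_64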
Update grub and reboot.
The system now boots when either disk has failed.
-- Arun Khan
On 06/20/2012 07:23 AM, Arun Khan wrote:
On Wed, Jun 20, 2012 at 12:11 AM, m.roth@5-cent.us wrote:
Arun Khan wrote:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
raid needs Id of fd rather than 83 to auto detect??
<snip> Ok, I see that it's hardware 512b blocks, so you're not running into issues with 4k hardware blocks. I trust you installed grub on /dev/md0, which I assume is /dev/sda1 and /dev/sdb1?
From the wiki instructions, there is no re-installation of GRUB, only a couple of changes to the /boot/grub/grub.conf file installed by the regular installation on /dev/sda. During the RAID1 creation process the GRUB from /dev/sda would be mirrored into the RAID1 device and appear in the MBR of both disks.
As I said in the OP, I do see the GRUB menu with either of the disks "unplugged" i.e. missing. The kernel does boot and the white progress bar gets to about 50% when the kernel panic occurs. I will turn off the splash and see what comes up on the console. Gut feeling -- I suspect the problem is with the initrd image created with "dracut".
-- Arun Khan
Rob Kampen wrote:
On 06/20/2012 07:23 AM, Arun Khan wrote:
On Wed, Jun 20, 2012 at 12:11 AM, m.roth@5-cent.us wrote:
Arun Khan wrote:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
raid needs Id of fd rather than 83 to auto detect??
Good catch. A quick google got me a page on filesystem types, which had this line:
   fd  Linux raid partition with autodetect using persistent superblock
<snip>
mark
On Wed, Jun 20, 2012 at 2:18 AM, m.roth@5-cent.us wrote:
Rob Kampen wrote:
On 06/20/2012 07:23 AM, Arun Khan wrote:
On Wed, Jun 20, 2012 at 12:11 AM, m.roth@5-cent.us wrote:
Arun Khan wrote:
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         523     4194304   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2             523        1045     4194304   83  Linux
/dev/sda3            1045        1176     1048576   82  Linux swap / Solaris
raid needs Id of fd rather than 83 to auto detect??
Good catch. A quick google got me a page on filesystem types, which had this line:
   fd  Linux raid partition with autodetect using persistent superblock
But this is supposed to be RAID1 on the *entire* disks and not on the individual partitions.
The instructions on the wiki clearly state: do a "regular" install on the first disk (I did leave a few blocks at the end of the first disk as per the instructions) and then create a "partitionable" RAID1 md_d0.
http://wiki.centos.org/HowTos/Install_On_Partitionable_RAID1
<wiki quote>
..............
Why would you want to have a system installed on a partitionable software RAID1?
If you are installing a system on a partitionable RAID you can use the whole hard drive as a RAID component device, and since RAID1 is a mirror, you will be able to boot your system from any of the drives in case of failure without any additional tricks required to preserve bootloader configuration, etc. And when you need to repair a failed RAID volume with the whole hard drive as a RAID component, all you have to do is to insert a new hard drive and run mdadm --add; no partitioning or anything else required.
...........
Steps for both CentOS 5 & 6
1. Install CentOS using standard installer on the first hard disk, /dev/sda. Select manual partitioning during the installation, and leave at least 1 unit at the very end of the disk unpartitioned. You will be able to redeem most of this space back later. You need to reserve this space for mdadm, which stores its metadata in the last chunk of a raid volume.
2. Boot from the CentOS installation disk in the Rescue mode. The installer will ask you if you wish to mount an existing CentOS installation, you must refuse.
3. Build the software RAID1 using mdadm in degraded mode, with /dev/sda as the only drive:
   mdadm --create --metadata=0.90 --level=1 --raid-devices=2 /dev/md_d0 /dev/sda missing
4. Add the mirror drive /dev/sdb into the raid and check /proc/mdstat to see that the raid started building:
   mdadm --add /dev/md_d0 /dev/sdb
   cat /proc/mdstat
...........
</wiki quote>
-- Arun Khan