I have an 8-core SuperMicro Xeon server with CentOS 6.3. The OS is installed on a 120 GB SSD connected via SATA; the machine also contains an Areca SAS controller with 24 drives attached. The motherboard is a SuperMicro X9DA7.
When I installed the OS, I used the default options, which creates an LVM volume group to contain / and /home, and keeps /boot and /boot/efi outside the volume group.
The machine is a couple of months old, and has been stable. While installing some new hardware, I decided to also clean up the cabling in the box, since it was a bit messy. In doing this, I probably moved the boot SSD to another port on the motherboard (it has several: 2 SATA 6 Gb/s and 6 SATA 3 Gb/s).
When I booted the box after this, I got a kernel panic, the typical "Can't find root device".
I read some docs, and first tried to boot from a rescue disc and reinstall GRUB, but that didn't change anything. Further googling turned up the rdshell kernel parameter, which dropped me to a shell when it failed to find the root device.
Reading https://fedoraproject.org/wiki/How_to_debug_Dracut_problems , I did the following:
# lvm vgscan
# lvm vgchange -ay
And then
# ln -s /dev/mapper/<volumegroup>-<root_volume> /dev/root
# exit
After this, the box boots up normally, and everything works as it should. However, when I reboot, it again fails to find the root device.
So, after all this, my question is, how do I make Dracut (I'm assuming) understand that this LVM volume is my root device and pick it up automatically?
And, is there a way to avoid this problem in the future, if I move drives around? Surely it can't be normal for this to happen just because I connect a drive to another port?
What does your kernel line in grub look like?
Barry
Grub (in the menu) has the following commands:
root (hd0,1)
kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/vg_resolve02-lv_root rd_NO_LUKS LANG=en.US.UTF-8 rd_NO_MD crashkernel=128M rd_LVM_LV=vg_resolve02/lv_root SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet pcie_aspm=off
When I successfully booted manually, I removed "rhgb quiet" and added "rdshell" to that line.
To the best of my memory, that line is stock; I don't recall ever changing it permanently.
The names of the volume group and logical volume in that line correspond to my actual root device.
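One way to double-check that root= and rd_LVM_LV= agree is a small shell function. This is a hypothetical helper, not part of Dracut. It assumes device-mapper's usual naming, where the node is /dev/mapper/<vg>-<lv> and any hyphen inside a VG or LV name is doubled.

```sh
#!/bin/sh
# Hypothetical helper (not part of Dracut): check that root= and rd_LVM_LV=
# on a kernel command line name the same logical volume.
# Returns 0 if they agree, 1 if they differ, 2 if either argument is missing.
check_root_args() {
    cmdline=$1
    root=''
    lvm=''
    for arg in $cmdline; do
        case $arg in
            root=*)      root=${arg#root=} ;;
            rd_LVM_LV=*) lvm=${arg#rd_LVM_LV=} ;;   # last one wins if repeated
        esac
    done
    if [ -z "$root" ] || [ -z "$lvm" ]; then
        return 2
    fi
    vg=${lvm%%/*}
    lv=${lvm#*/}
    # device-mapper doubles any '-' that appears inside a VG or LV name
    vg=$(printf '%s' "$vg" | sed 's/-/--/g')
    lv=$(printf '%s' "$lv" | sed 's/-/--/g')
    expected="/dev/mapper/$vg-$lv"
    echo "root=$root expected=$expected"
    [ "$root" = "$expected" ]
}
```

On the running system this could be called as `check_root_args "$(cat /proc/cmdline)"`. It only checks the two names against each other; it says nothing about whether the volume is actually activated at boot.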
Oh, and the exact Dracut error I get is:
dracut Warning: No root device "block:/dev/mapper/vg_resolve02-lv_root" found
dracut Warning: LVM vg_resolve02/lv_root not found
But then:
# lvm vgscan
Found volume group "vg_resolve02" using metadata type lvm2
# lvm vgchange -ay
1 logical volume(s) in volume group "vg_resolve02" now active
# ls /dev/mapper
control vg_resolve02-lv_root
# ln -s /dev/mapper/vg_resolve02-lv_root /dev/root
ln: creating symbolic link "/dev/root": File exists
# ls -l /dev/root
/dev/root -> dm-0
# rm /dev/root
# ln -s /dev/mapper/vg_resolve02-lv_root /dev/root
# exit
And everything boots normally.
Apologies if there are minor mistakes or omissions in this text. Since I can't copy/paste, I've transcribed it, excluding some parts, like the permissions of the symlink in the ls output. I have, however, double checked the important parts, like the names of devices and files.
On 03/22/2013 08:27 PM, Joakim Ziegler wrote:
So, after all this, my question is, how do I make Dracut (I'm assuming) understand that this LVM volume is my root device and pick it up automatically?
I've looked through Dracut trying to spot circumstances that might cause the problem that you've described, but came up with nothing. udev should be scanning block devices as they become available, and setting up any logical volumes on all of the available block devices.
It may be useful to capture some information in the debugging shell, before running vgscan.
As suggested in the fedora debugging document, capture the output of the following commands to get a better idea of what the kernel knows about block devices before you manually start the volumes, and maybe that'll lead us to some conclusion about why the devices aren't found.
lvm pvdisplay
lvm vgdisplay
lvm lvdisplay
blkid
dmesg
Thank you, I will do this tomorrow. It'll take me a little time, since I need to transcribe everything manually, but I'll get it done. It's just a very weird problem all in all.
On 03/24/2013 10:38 PM, Joakim Ziegler wrote:
Thank you, I will do this tomorrow. It'll take me a little time, since I need to transcribe everything manually, but I'll get it done. It's just a very weird problem all in all.
You should be able to pipe the output into a file and copy it to the actual root filesystem after vgchange & mount. Unless there's absolutely no writable space in the rescue shell?
I haven't actually tried writing anywhere in the rescue shell before vgchange and mount. I'll give it a try, that would simplify things.
On 03/24/2013 10:43 PM, Joakim Ziegler wrote:
I haven't actually tried writing anywhere in the rescue shell before vgchange and mount. I'll give it a try, that would simplify things.
If nothing else, you probably can fit most or all of that in the shell's environment. In case it's ever useful:
# debug=$(lvm pvdisplay; lvm vgdisplay; lvm lvdisplay; blkid; dmesg)
# vgchange -a y
# mount ...
# echo "$debug" > /mnt/sysroot/root/debug
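A variation on the same idea, sketched as a hypothetical helper: each command's output gets a labelled header, and a failing command doesn't stop the rest from running, which makes a long capture easier to read afterwards.

```sh
#!/bin/sh
# Hypothetical helper: run each argument as a simple command line, print a
# header before its output, and keep going if a command fails.
capture_debug() {
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd"
        # $cmd is left unquoted on purpose so "lvm pvdisplay" splits into words
        $cmd 2>&1 || printf '(exit status %s)\n' "$?"
    done
}
```

In the rescue shell this would be used as `debug=$(capture_debug "lvm pvdisplay" "lvm vgdisplay" "lvm lvdisplay" blkid dmesg)`, followed by the same vgchange/mount/echo steps as above.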
Hi, Gordon, this was indeed a very good idea. I'm attaching that file here, as it's a bit large. Thanks in advance for help and ideas.
On 25/03/13 19:26, Joakim Ziegler wrote:
Hi, Gordon, this was indeed a very good idea. I'm attaching that file here, as it's a bit large. Thanks in advance for help and ideas.
Hm, it seems the list strips attachments, and just pasting it makes the mail too big to go through, so, pastebin to the rescue:
That's the output of, like you suggested:
lvm pvdisplay; lvm vgdisplay; lvm lvdisplay; blkid; dmesg
In that order.
On 03/25/2013 06:35 PM, Joakim Ziegler wrote:
That's the output of, like you suggested:
And you ran that before you ran "vgchange -a y"? That doesn't make any sense. The commands show the volume group active. I can't see any reason why the system wouldn't boot.
I hate for you to keep rebooting your server, but do the device nodes look correct in both /dev/mapper and /dev/vg_resolve02 at that point?
Yes, I ran that immediately after getting dropped to the shell. I can take a look at the device nodes tomorrow, but if I remember correctly, /dev/mapper contained only the file "control" before running vgchange -ay, that is, there was no "vg_resolve02-lv_root" device there. That device only shows up after I run vgchange -ay.
I did not check whether /dev/vg_resolve02 exists, I can do that tomorrow.
On Tue, Mar 26, 2013 at 1:35 AM, Joakim Ziegler joakim@terminalmx.com wrote:
-- Joakim Ziegler - Supervisor de postproducción - Terminal joakim@terminalmx.com - 044 55 2971 8514 - 5264 0864
On 25/03/13 23:26, Gordon Messmer wrote:
Apologies if someone has mentioned this already (I don't have the whole thread in my mailbox), but whenever I've had to rename a root LVM volume, I've also had to recreate the initrd. I haven't done it on 6.x, but I assume the same applies to the initramfs. The notes in my corporate wiki link back to this Red Hat Bugzilla entry: https://bugzilla.redhat.com/show_bug.cgi?id=230190 Maybe try that?
Patrick
I haven't actually renamed the root LVM volume, it's had the same name since install. I just moved some drives around on the SATA ports. Is it still worth recreating initrd?
On 03/26/2013 01:52 PM, Joakim Ziegler wrote:
I haven't actually renamed the root LVM volume, it's had the same name since install. I just moved some drives around on the SATA ports. Is it still worth recreating initrd?
I wouldn't expect it to make a difference, but it probably wouldn't hurt anything. Copy or rename your existing initrd to a path in /boot, so that you can revert if anything goes wrong. After that, create a new one. If that fixes the problem, I'd be curious to know why. We can compare the content of the two if that changes anything, and I'll learn something. As far as I know, the path to the devices isn't included in the initrd.
# mkinitrd /boot/initramfs-$(uname -r).img $(uname -r)
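The backup step can be scripted as well. This is a minimal sketch with a hypothetical function name: it copies the current image to a .bak file next to it and refuses to overwrite an earlier backup.

```sh
#!/bin/sh
# Hypothetical helper: keep a copy of the current initramfs before rebuilding,
# so the old image can be restored if the new one fails to boot.
backup_initramfs() {
    bootdir=${1:-/boot}
    kver=${2:-$(uname -r)}
    img="$bootdir/initramfs-$kver.img"
    bak="$img.bak"
    if [ ! -f "$img" ]; then
        echo "no such image: $img" >&2
        return 1
    fi
    if [ -e "$bak" ]; then
        echo "$bak already exists, not overwriting it" >&2
        return 1
    fi
    cp -p -- "$img" "$bak" && echo "$bak"
}
```

Because this copies rather than renames the original, the rebuild then has to overwrite the existing image; dracut's -f option does that, e.g. `dracut -f /boot/initramfs-$(uname -r).img $(uname -r)`.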
Thanks, will try.
Recreating initrd made no difference.
Immediately after getting dropped to rdshell, I looked around in /dev, which brought me a few surprises...
/dev/mapper contains only "control", that is, "vg_resolve02-lv_root" is missing.
/dev/root is a symlink to /dev/dm-0
Which is a bit surprising. When I run lvm vgscan and lvm vgchange -ay, /dev/mapper/vg_resolve02-lv_root appears, but I only just noticed that it is itself a symlink to /dev/dm-0. So when I symlink /dev/root to /dev/mapper/vg_resolve02-lv_root, I'm just recreating the link that was already there, with one more level of indirection.
That means /dev/root already is correct, so the only thing I'm actually changing to make the system boot is to scan for volume groups and activate them.
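Whether two names end at the same device node can be checked without reading ls -l output by eye. A minimal sketch, using a hypothetical function name and assuming a readlink that supports -f (GNU coreutils does):

```sh
#!/bin/sh
# Hypothetical helper: resolve two paths through all symlinks and report
# whether they end at the same existing file (e.g. the same /dev/dm-N node).
same_node() {
    a=$(readlink -f -- "$1") || a=''
    b=$(readlink -f -- "$2") || b=''
    echo "$1 -> $a"
    echo "$2 -> $b"
    [ -n "$a" ] && [ -e "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_node /dev/root /dev/mapper/vg_resolve02-lv_root` should print /dev/dm-0 for both names and return 0 once the volume group is active.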
The big question then becomes: Why do I have to do this manually? How do I make Dracut (I assume this is Dracut's job) make this automatically?
On 03/29/2013 01:23 AM, Joakim Ziegler wrote:
Immediately after getting dropped to rdshell, I looked around in /dev, which brought me a few surprises...
/dev/mapper contains only "control", that is, "vg_resolve02-lv_root" is missing.
Did you get to look at or for /dev/vg_resolve02 as well?
/dev/root is a symlink to /dev/dm-0
Does /dev/dm-0 exist?
Does the system boot if you just "exit" from the rdshell? What about if you "vgchange -a y" without changing the symlink?
That means /dev/root already is correct, so the only thing I'm actually changing to make the system boot is to scan for volume groups and activate them.
The big question then becomes: Why do I have to do this manually? How do I make Dracut (I assume this is Dracut's job) make this automatically?
udev should be doing this. And... I was just looking at this again, because the last time I came up with nothing useful. Look at /usr/share/dracut/modules.d/90lvm/64-lvm.rules. If I'm reading this correctly, udev will look for dm-0 in /sys and will not run lvm_scan if it's found. I wonder if it's possible that the /sys nodes are getting set up, but device-mapper isn't setting up the nodes in /dev?
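The question of whether the kernel knows about a dm device that has no node in /dev can be checked directly from the rdshell. A minimal sketch with a hypothetical function name: it compares dm-* entries under /sys/block with nodes in /dev. The directory arguments exist only so it can be tried against a scratch tree.

```sh
#!/bin/sh
# Hypothetical helper: list device-mapper devices the kernel knows about
# (entries under <sysfs>/block/dm-*) that have no matching node under <devdir>.
# Defaults to /sys and /dev.
missing_dm_nodes() {
    sysdir=${1:-/sys}
    devdir=${2:-/dev}
    for entry in "$sysdir"/block/dm-*; do
        [ -e "$entry" ] || continue    # the glob matched nothing
        name=${entry##*/}
        if [ ! -e "$devdir/$name" ]; then
            echo "$name"
        fi
    done
}
```

Empty output has two possible meanings: either every dm device has its node, or there are no dm devices in /sys at all. Running `ls /sys/block` alongside it tells the two apart.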
I'm really at a loss... it seems like a much simpler explanation is simply that the devices take so long to detect that init gives up. When you run vgchange, they've had the time they need. That idea is inconsistent with the fact that your dmesg output shows what I assume is the correct devices and partition tables.
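The timing theory can be tested from the rdshell with a simple polling loop. This is a hypothetical helper, not something Dracut provides: it waits for a path to appear, giving up after a timeout in seconds.

```sh
#!/bin/sh
# Hypothetical helper: poll once a second until a path exists, giving up after
# a timeout (in seconds). Returns 0 if the path appeared, 1 on timeout.
wait_for_node() {
    node=$1
    timeout=${2:-30}
    waited=0
    while [ ! -e "$node" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, `wait_for_node /dev/mapper/vg_resolve02-lv_root 120 && echo appeared`. If the node never shows up without a manual vgchange, no matter how long you wait, slow device detection is unlikely to be the explanation.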
You could try adding "rdinitdebug rdudevdebug" to your kernel command line, but you're going to see a LOT of output, and it's only really going to be meaningful if you've read the /init script that Dracut creates, and understand more or less what it's doing, particularly in the "main_loop" section.
On 29/03/13 10:38, Gordon Messmer wrote:
On 03/29/2013 01:23 AM, Joakim Ziegler wrote:
Immediately after getting dropped to rdshell, I looked around in /dev, which brought me a few surprises...
/dev/mapper contains only "control", that is, "vg_resolve02-lv_root" is missing.
Did you get to look at or for /dev/vg_resolve02 as well?
/dev/root is a symlink to /dev/dm-0
Does /dev/dm-0 exist?
Does the system boot if you just "exit" from the rdshell? What about if you "vgchange -a y" without changing the symlink?
I checked this a bit more thoroughly. The status is as follows:
When I boot up and get dropped to rdshell, none of /dev/root, /dev/vg_resolve02, or /dev/dm-0 exists. Just exiting at this point drops me back into rdshell. Waiting a few minutes makes no difference.
Doing lvm vgscan finds the volume group, but creates no device nodes. Just exiting at this point drops me back into rdshell as well.
When I do lvm vgchange -ay, /dev/dm-0 is created, /dev/root is created as a symlink to it, as well as /dev/vg_resolve02/ with lv_root inside it, and /dev/mapper/vg_resolve02-lv_root. I don't need to change the symlink or do anything else, if I exit after doing lvm vgchange -ay, everything is ok.
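Since activation appears to be the only missing step, it may help to confirm from the rdshell which logical volumes LVM can see but has not activated. A minimal sketch with a hypothetical function name; it relies on the fifth character of lvs's lv_attr field being the state flag, where 'a' means active.

```sh
#!/bin/sh
# Hypothetical helper: read "vg lv attr" lines (as printed by
# lvm lvs --noheadings -o vg_name,lv_name,lv_attr) on stdin and print
# vg/lv for every logical volume whose state flag is not 'a' (active).
inactive_lvs() {
    while read -r vg lv attr; do
        [ -n "$attr" ] || continue
        # the 5th character of lv_attr is the state; 'a' means active
        state=$(printf '%s' "$attr" | cut -c5)
        if [ "$state" != "a" ]; then
            echo "$vg/$lv"
        fi
    done
}
```

Usage: `lvm lvs --noheadings -o vg_name,lv_name,lv_attr | inactive_lvs`. Before the manual vgchange this would be expected to list vg_resolve02/lv_root, and nothing afterwards.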
That means /dev/root already is correct, so the only thing I'm actually changing to make the system boot is to scan for volume groups and activate them.
The big question then becomes: Why do I have to do this manually? How do I make Dracut (I assume this is Dracut's job) make this automatically?
udev should be doing this. And... I was just looking at this again, because the last time I came up with nothing useful. Look at /usr/share/dracut/modules.d/90lvm/64-lvm.rules. If I'm reading this correctly, udev will look for dm-0 in /sys and will not run lvm_scan if it's found. I wonder if it's possible that the /sys nodes are getting set up, but device-mapper isn't setting up the nodes in /dev?
It turns out I was wrong about dm-0 already existing, it's created on vgchange -ay. I'm looking at the file you mention, but I'm afraid I don't know LVM well enough to make that much sense of it. From what I can tell, it calls lvm_scan for each device, and there's an lvm_scan.sh in there that looks like it should be doing lvchange -ay, but if dm-0 doesn't already exist, I don't think this will do anything, am I wrong?
I'm really at a loss... it seems like a much simpler explanation is simply that the devices take so long to detect that init gives up. When you run vgchange, they've had the time they need. That idea is inconsistent with the fact that your dmesg output shows what I assume is the correct devices and partition tables.
You could try adding "rdinitdebug rdudevdebug" to your kernel command line, but you're going to see a LOT of output, and it's only really going to be meaningful if you've read the /init script that Dracut creates, and understand more or less what it's doing, particularly in the "main_loop" section.
I can try this, but it might be a bit beyond my area of expertise, I'm afraid.
If I were to just try a brute force approach, what RPM packages should I reinstall/update to get all this stuff reinstalled as it was the first time I installed the system?
On 30/03/13 7:18, Joakim Ziegler wrote:
Just bumping this up, any ideas about this? It's a little annoying not having this box boot by itself...
On 01/04/13 16:53, Joakim Ziegler wrote:
And bumping this again... Any ideas? Anyone?