On Sun, 10 Mar 2013 12:14:14 +0100 Reindl Harald wrote:
Use "screen" if you update over WAN connections. Yes, I know it is too late, but that's the way to go.
I was doing it through VNC, thinking that would be more-or-less equivalent to screen, which it apparently isn't. Somehow my vnc session (desktop) just disappeared in the middle of the job, while I was running "yum update" on the remote host machine and two other computers. Perhaps the "yum update" that was running on the remote host machine killed VNC -- in hindsight perhaps I shouldn't have done that.
My google searching leads me to suspect that initramfs may be missing on those computers. If that is the case (which I will verify later this afternoon) then I'm thinking that perhaps chrooting to the hard drive followed by a simple yum remove kernel-2.6.32-358.0.1 and yum install kernel-2.6.32-358.0.1 will fix it.
It's funny that all three of them died in the same way, though I guess they were all at about the same stage in the update process when my VNC session disappeared.
Running "yum-complete-transaction", followed by "package-cleanup --cleandupes", followed by "yum update" seems to have put everything back the way that it should be, with the exception of whatever it is that prevents the machine from booting.
On 03/10/2013 12:12 PM, Frank Cox wrote:
I was doing it through VNC, thinking that would be more-or-less equivalent to screen, which it apparently isn't. Somehow my vnc session (desktop) just disappeared in the middle of the job, while I was running "yum update" on the remote host machine and two other computers. Perhaps the "yum update" that was running on the remote host machine killed VNC -- in hindsight perhaps I shouldn't have done that.
What most likely happened:
The "yum update" that was running in your lost VNC session was in all likelihood still running.
If you had done a 'ps -ef | grep yum' you would probably have seen that yum update was still running.
And then it looks like you logged back in to a new session and began running other yum commands before the original "yum update" had completed.
So now you have a mess that may not be easy to untangle.
It may be easier to restore from backup and then attempt to do the update again.
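Besides 'ps -ef | grep yum', the lock file gives a second way to tell whether an earlier yum is still alive before starting another one. The following is a minimal sketch, assuming yum records its PID in /var/run/yum.pid while a transaction runs; the function name yum_lock_holder is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the PID recorded in yum's lock file is still alive.
# Assumption: yum writes its PID to /var/run/yum.pid during a transaction.
# yum_lock_holder is a hypothetical helper name, not part of yum.

yum_lock_holder() (
    pidfile="${1:-/var/run/yum.pid}"
    [ -r "$pidfile" ] || { echo "no lock file"; exit 1; }
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "unreadable pid"; exit 1 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process exists.
    # Run as root so that processes owned by other users can be probed.
    if kill -0 "$pid" 2>/dev/null; then
        echo "yum still running as pid $pid"
    else
        echo "stale lock, pid $pid is gone"
        exit 1
    fi
)
```

Calling yum_lock_holder with no argument checks the default lock file; a zero exit status means it is better to wait than to start yum-complete-transaction or another update.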
On Sun, 10 Mar 2013 12:26:51 -0400 Gerry Reno wrote:
The "yum update" that was running in your lost VNC session was in all likelihood still running.
If yum was indeed still running, it wasn't using any significant CPU. I did run top in my login terminal to see if anything significant was going on and yum didn't show up on the list.
When I attempted to re-connect to vncserver after that, I was told "connection refused", and "service vncserver start" cranked up another session for me without any errors.
I think vncserver just altogether crashed for some reason, probably related to the yum update that I was running on that machine at the time. I suppose the lesson learned here is to always update the host machine from a screen session running in a plain terminal, not through a vnc session.
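For reference, running the update under screen could look roughly like the sketch below. The session name yum-update and the use of "yum -y" (so that a detached session does not sit waiting at the confirmation prompt) are assumptions, not something taken from this thread. DRY_RUN defaults to 1, so the command is only printed.

```shell
#!/bin/sh
# Sketch: start "yum update" inside a detached screen session so that it
# survives a dropped VNC or SSH connection.
# DRY_RUN defaults to 1: commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_update_in_screen() {
    # -d -m: start detached; -S: name the session (hypothetical name).
    # -y answers yes to yum's prompt; drop it to confirm by hand after
    # reattaching.
    run screen -d -m -S yum-update yum -y update
}

start_update_in_screen
# Reattach later with:  screen -r yum-update
# List sessions with:   screen -ls
```

Set DRY_RUN=0 to actually launch it. If the connection drops, the session keeps running on the host and can be reattached from any new login.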
It may be easier to restore from backup and then attempt to do the update again.
Perhaps, but since everything seems to still be in place on those hard drives, and since my last "yum update" completed without any errors being reported, I suspect (hope?) that everything is still ok with the exception of whatever is causing the machines to fail to boot.
On 03/10/2013 01:04 PM, Frank Cox wrote:
If yum was indeed still running, it wasn't using any significant CPU. I did run top in my login terminal to see if anything significant was going on and yum didn't show up on the list.
When I attempted to re-connect to vncserver after that, I was told "connection refused", and "service vncserver start" cranked up another session for me without any errors.
I think vncserver just altogether crashed for some reason, probably related to the yum update that I was running on that machine at the time. I suppose the lesson learned here is to always update the host machine from a screen session running in a plain terminal, not through a vnc session.
The reason I said yum update was still running was because I've had this exact scenario occur before.
VNC died during yum update and when I got back in I could see that yum update was still running.
I just waited until it finished.
It may be easier to restore from backup and then attempt to do the update again.
Perhaps, but since everything seems to still be in place on those hard drives, and since my last "yum update" completed without any errors being reported, I suspect (hope?) that everything is still ok with the exception of whatever is causing the machines to fail to boot.
I hope it is only your initramfs. If that isn't it, for me I would just restore and rerun the update. Much less time involved.
On Sun, 10 Mar 2013 11:04:37 -0600 Frank Cox wrote:
It may be easier to restore from backup and then attempt to do the update again.
Perhaps, but since everything seems to still be in place on those hard drives, and since my last "yum update" completed without any errors being reported, I suspect (hope?) that everything is still ok with the exception of whatever is causing the machines to fail to boot.
It's looking more and more like a full nuke-and-pave is going to be the answer here.
As I suspected, initramfs-2.6.32-358.0.1 was missing in /boot. Unfortunately, none of the other installed kernels boot either -- everything gives me a kernel panic.
I did a yum remove kernel-2.6.32-358.0.1 and yum install kernel-2.6.32-358.0.1 and the whole transaction appeared to be successful.
That got me initramfs-2.6.32-358.0.1 back in /boot, but I still get a kernel panic when I reboot the machine. The initial rhgb screen comes up and the little circle thing cranks for a minute or so, but then I get "kernel panic: attempted to kill init!". Booting without rhgb gives me a cursor in the top left corner for a minute, followed by "kernel panic: attempted to kill init!". The last time /var/log/boot.log was written to was the last time the machine was rebooted prior to this whole episode (i.e. a few weeks ago) so there is absolutely no error message or log information available other than the kernel panic message on the screen.
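A quick way to spot a kernel that lost its initramfs is to pair up the files in /boot. This is only a sketch; it assumes the usual vmlinuz-VERSION / initramfs-VERSION.img naming seen in this thread, and missing_initramfs is a made-up helper name.

```shell
#!/bin/sh
# Sketch: list every kernel image in a boot directory that has no
# matching, non-empty initramfs next to it.
# Assumes the vmlinuz-<version> / initramfs-<version>.img naming scheme.

missing_initramfs() (
    bootdir="${1:-/boot}"
    status=0
    for k in "$bootdir"/vmlinuz-*; do
        [ -e "$k" ] || continue          # glob matched nothing
        kver=${k##*/vmlinuz-}            # strip path and prefix
        if [ ! -s "$bootdir/initramfs-$kver.img" ]; then
            echo "missing: initramfs-$kver.img"
            status=1
        fi
    done
    exit "$status"
)
```

From a rescue CD the call would be missing_initramfs /mnt/sysimage/boot. A clean result only says the files exist, not that their contents are usable.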
Damn, I hate the idea of having to set all of these machines up again from scratch. Two of them aren't much to re-do, but the third one is the office workhorse machine that does everything from dhcp server to nfs server to print server to you-name-it.
On 03/10/2013 07:00 PM, Frank Cox wrote:
It's looking more and more like a full nuke-and-pave is going to be the answer here.
As I suspected, initramfs-2.6.32-358.0.1 was missing in /boot. Unfortunately, none of the other installed kernels boot either -- everything gives me a kernel panic.
I did a yum remove kernel-2.6.32-358.0.1 and yum install kernel-2.6.32-358.0.1 and the whole transaction appeared to be successful.
That got me initramfs-2.6.32-358.0.1 back in /boot, but I still get a kernel panic when I reboot the machine. The initial rhgb screen comes up and the little circle thing cranks for a minute or so, but then I get "kernel panic: attempted to kill init!". Booting without rhgb gives me a cursor in the top left corner for a minute, followed by "kernel panic: attempted to kill init!". The last time /var/log/boot.log was written to was the last time the machine was rebooted prior to this whole episode (i.e. a few weeks ago) so there is absolutely no error message or log information available other than the kernel panic message on the screen.
Damn, I hate the idea of having to set all of these machines up again from scratch. Two of them aren't much to re-do, but the third one is the office workhorse machine that does everything from dhcp server to nfs server to print server to you-name-it.
Did you try booting a rescue disk and reinstalling the bootloader?
On 03/10/2013 07:29 PM, Gerry Reno wrote:
If you have a good full backup just reinstall the base OS and overlay your backup.
On Sun, 10 Mar 2013 19:29:56 -0400 Gerry Reno wrote:
Did you try booting a rescue disk and reinstalling the bootloader?
I booted the "Centos 6.4 minimal iso", told it to "upgrade an existing installation", and to install the bootloader. About all that it appeared to do was install the bootloader. Unfortunately, the machine still didn't boot.
The bootloader seems to be fine -- grub itself boots up. I get a kernel panic after that, when you normally see the messages about unpacking vmlinuz and so on. I just get a blank black screen with a flashing cursor (or the rhgb screen with the spinning doodad, depending on the grub setting) and then a kernel panic.
On Sun, 10 Mar 2013 17:40:39 -0600 Frank Cox wrote:
The bootloader seems to be fine -- grub itself boots up. I get a kernel panic after that,
I just had a thought: Is it possible to just reformat and reinstall the /boot partition? I wonder if that would solve the problem....
On 03/10/2013 07:40 PM, Frank Cox wrote:
I booted the "Centos 6.4 minimal iso", told it to "upgrade an existing installation", and to install the bootloader. About all that it appeared to do was install the bootloader. Unfortunately, the machine still didn't boot.
The bootloader seems to be fine -- grub itself boots up. I get a kernel panic after that, when you normally see the messages about unpacking vmlinuz and so on. I just get a blank black screen with a flashing cursor (or the rhgb screen with the spinning doodad, depending on the grub setting) and then a kernel panic.
It seems like maybe it cannot find the root filesystem.
Kernel panics just like this when it cannot find it.
On Sun, 10 Mar 2013 20:24:55 -0400 Gerry Reno wrote:
It seems like maybe it cannot find the root filesystem.
Kernel panics just like this when it cannot find it.
Interesting. How can I check that? I have another almost-identical system that's still working and I compared grub.conf between the two of them and didn't notice any significant differences. Nothing that immediately jumped up and down and screamed "problem here!" at least.
What should I be looking for?
On 03/10/2013 11:09 PM, Frank Cox wrote:
Interesting. How can I check that? I have another almost-identical system that's still working and I compared grub.conf between the two of them and didn't notice any significant differences. Nothing that immediately jumped up and down and screamed "problem here!" at least.
What should I be looking for?
Boot to rescue mode and see if you can mount the device containing the root filesystem readonly and see all the files on it.
Then check that the kernel root option is looking at the same device.
On Sun, 10 Mar 2013 23:16:10 -0400 Gerry Reno wrote:
Boot to rescue mode and see if you can mount the device containing the root filesystem readonly and see all the files on it.
Then check that the kernel root option is looking at the same device.
I can indeed see all of the files on that computer, including the boot directory and everything under /
I don't know what to do from that point, though.
Here is the grub.conf from the working system, which is pretty much identical to one of the non-working systems. I assume that you mean I need to do something to change and/or fix the root= portion of the kernel commandline, but how do I find out what to change it to?
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-358.0.1.el6.i686)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.0.1.el6.i686 ro root=/dev/mapper/vg_ws195-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_ws195/lv_swap rhgb crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_LVM_LV=vg_ws195/lv_root rd_NO_DM
        initrd /initramfs-2.6.32-358.0.1.el6.i686.img
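To compare what grub.conf asks for against what actually exists on disk, the root= and rd_LVM_LV= arguments can be pulled out of the kernel line. A minimal sketch follows; the default path assumes the rescue CD has mounted the system under /mnt/sysimage, and grub_root_args is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the root= and rd_LVM_LV= arguments from every "kernel"
# line of a GRUB legacy grub.conf, in order of appearance.
# The "root (hd0,0)" lines are skipped because their first word is "root",
# not "kernel".

grub_root_args() (
    conf="${1:-/mnt/sysimage/boot/grub/grub.conf}"
    awk '$1 == "kernel" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^(root|rd_LVM_LV)=/) print $i
         }' "$conf"
)
```

Each value printed should name a volume group and logical volume that lvscan also reports on the same machine.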
On 03/10/2013 11:23 PM, Frank Cox wrote:
I can indeed see all of the files on that computer, including the boot directory and everything under /
I don't know what to do from that point, though.
Here is the grub.conf from the working system, which is pretty much identical to one of the non-working systems. I assume that you mean I need to do something to change and/or fix the root= portion of the kernel commandline, but how do I find out what to change it to?
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-358.0.1.el6.i686)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.0.1.el6.i686 ro root=/dev/mapper/vg_ws195-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_ws195/lv_swap rhgb crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_LVM_LV=vg_ws195/lv_root rd_NO_DM
        initrd /initramfs-2.6.32-358.0.1.el6.i686.img
Do you know if this grub file was rewritten?
Can you check it against a backup copy?
Other than that I've given you my best suggestions.
On Sun, 10 Mar 2013 23:27:25 -0400 Gerry Reno wrote:
Do you know if this grub file was rewritten?
Can you check it against a backup copy?
I don't have a backup copy of the grub.conf file since it's always been automatically managed and updated by grub and friends and I've never really had to pay much attention to it.
I did compare it between the non-working and the working machines and didn't see anything that struck me as a significant difference.
The most maddening part of this is that all of the files and the filesystems appear to be present -- I can boot off of a rescue CD and mount the whole works under /mnt/sysimage and browse to my heart's content. I just can't boot the damn thing.
How is a name like /dev/mapper/vg_ws195-lv_root rd_NO_LUKS determined? If I knew how to read or find out what the actual name of the root directory was on the problem machines, I could compare it to what's in the grub.conf file.
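On how such a name is put together: the path is the device-mapper name for an LVM logical volume, built from the volume group name and the logical volume name joined by a single hyphen, with any hyphen inside either name doubled. rd_NO_LUKS is a separate kernel option rather than part of the name. The sketch below just reproduces that naming rule; dm_name is a made-up helper.

```shell
#!/bin/sh
# Sketch: build the /dev/mapper path that corresponds to a VG/LV pair.
# Rule: hyphens inside a VG or LV name are doubled, and a single hyphen
# joins the two names.

dm_name() {
    printf '/dev/mapper/%s-%s\n' \
        "$(printf '%s' "$1" | sed 's/-/--/g')" \
        "$(printf '%s' "$2" | sed 's/-/--/g')"
}

dm_name vg_ws194 lv_root
# prints /dev/mapper/vg_ws194-lv_root
```

So the names to compare against grub.conf are the VG and LV names that lvscan or lvdisplay report on the problem machine.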
On Sun, Mar 10, 2013 at 10:45 PM, Frank Cox theatre@melvilletheatre.com wrote:
The most maddening part of this is that all of the files and the filesystems appear to be present -- I can boot off of a rescue CD and mount the whole works under /mnt/sysimage and browse to my heart's content. I just can't boot the damn thing.
How is a name like /dev/mapper/vg_ws195-lv_root rd_NO_LUKS determined? If I knew how to read or find out what the actual name of the root directory was on the problem machines, I could compare it to what's in the grub.conf file.
I don't have any idea how to debug LVM stuff. But if you can boot in rescue mode just on general principles I would chroot into /mnt/sysimage, rebuild the initrd and reinstall grub.
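The rescue-mode steps suggested here might look like the following sketch. The kernel version is the one from the grub.conf quoted earlier in the thread; /dev/sda as the boot disk is purely an assumption. DRY_RUN defaults to 1, so nothing is changed unless it is set to 0.

```shell
#!/bin/sh
# Sketch: from a rescue CD with the system mounted at /mnt/sysimage,
# rebuild the initramfs and reinstall GRUB legacy.
# DRY_RUN defaults to 1: commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

KVER="${KVER:-2.6.32-358.0.1.el6.i686}"   # version string from grub.conf
BOOT_DISK="${BOOT_DISK:-/dev/sda}"        # assumption: disk holding GRUB

rescue_rebuild() {
    # --force overwrites the existing image of the same name.
    run chroot /mnt/sysimage dracut --force "/boot/initramfs-$KVER.img" "$KVER"
    run chroot /mnt/sysimage grub-install "$BOOT_DISK"
}

rescue_rebuild
```

The version has to be passed explicitly because inside the chroot the running kernel is the rescue CD's, not the installed one.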
Les Mikesell wrote:
On Sun, Mar 10, 2013 at 10:45 PM, Frank Cox theatre@melvilletheatre.com wrote:
The most maddening part of this is that all of the files and the filesystems appear to be present -- I can boot off of a rescue CD and mount the whole works under /mnt/sysimage and browse to my heart's content. I just can't boot the damn thing.
How is a name like /dev/mapper/vg_ws195-lv_root rd_NO_LUKS determined? If I knew how to read or find out what the actual name of the root directory was on the problem machines, I could compare it to what's in the grub.conf file.
I don't have any idea how to debug LVM stuff. But if you can boot in rescue mode just on general principles I would chroot into /mnt/sysimage, rebuild the initrd and reinstall grub.
rd_NO_LUKS says that there are no encrypted filesystems. We *strongly* prefer to label our filesystems.
Finally, if you can see it running via linux rescue, I'd go with Les' thought: boot that way, chroot to /mnt/sysimage, and first do a grub-install. If that doesn't solve it, then try the rebuild of initrd.
Oh, and check /mnt/sysimage/etc/fstab
mark
On Mon, Mar 11, 2013 at 1:22 PM, m.roth@5-cent.us wrote:
Finally, if you can see it running via linux rescue, I'd go with Les' thought: boot that way, chroot to /mnt/sysimage, and first do a grub-install. If that doesn't solve it, then try the rebuild of initrd.
Is there a simple way to tell yum to re-install the current kernel? If you can do that from the rescue chroot the rpm scripts should rebuild the initrd for you - and maybe that step was interrupted in the earlier update attempt.
Les Mikesell wrote:
On Mon, Mar 11, 2013 at 1:22 PM, m.roth@5-cent.us wrote:
Finally, if you can see it running via linux rescue, I'd go with Les' thought: boot that way, chroot to /mnt/sysimage, and first do a grub-install. If that doesn't solve it, then try the rebuild of initrd.
Is there a simple way to tell yum to re-install the current kernel? If you can do that from the rescue chroot the rpm scripts should rebuild the initrd for you - and maybe that step was interrupted in the earlier update attempt.
Won't yum reinstall kernel work?
mark
On Sunday 10 March 2013, Frank Cox theatre@melvilletheatre.com wrote:
Interesting. How can I check that? I have another almost-identical system that's still working and I compared grub.conf between the two of them and didn't notice any significant differences. Nothing that immediately jumped up and down and screamed "problem here!" at
Also check that /etc/fstab is correct.
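One rough way to check fstab from the rescue environment is to confirm that every /dev/... entry in the first column exists. This is a sketch only: it ignores UUID= and LABEL= entries, and check_fstab_devices is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report fstab entries whose device path does not exist.
# Only entries that start with /dev/ are checked; comments, UUID=, LABEL=
# and pseudo filesystems such as tmpfs are skipped.

check_fstab_devices() (
    fstab="${1:-/mnt/sysimage/etc/fstab}"
    status=0
    while read -r dev mnt rest; do
        case "$dev" in
            /dev/*)
                if [ ! -e "$dev" ]; then
                    echo "not found: $dev (mounted on $mnt)"
                    status=1
                fi ;;
        esac
    done < "$fstab"
    exit "$status"
)
```

The LVM volumes have to be active when this runs, otherwise their /dev/mapper entries will be reported as missing even if fstab is fine.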
On Sun, 10 Mar 2013 23:36:55 -0400 Yves Bellefeuille wrote:
Also check that /etc/fstab is correct.
I've finally figured out how to get an error message, but I have no idea of how to fix it.
By removing "rhgb" and "quiet" from the grub commandline, I see a whole bunch of output scrolling by, then there is a several-second pause, and then this:
dracut warning: no root device "block:/dev/mapper/vg_ws194-lv_root" found
After that I get the kernel panic message and that's the end of the line.
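For anyone who wants the verbose boot to persist, the two words can be dropped from the kernel lines of grub.conf. The sketch below prints the edited file to stdout rather than changing it in place; strip_quiet_boot is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print a grub.conf with the standalone words "rhgb" and "quiet"
# removed from kernel lines. Other lines are passed through untouched.

strip_quiet_boot() {
    # The substitution is repeated because two adjacent words share the
    # whitespace between them, so one pass can only remove the first.
    sed -E '/^[[:space:]]*kernel[[:space:]]/{
        s/[[:space:]]+(rhgb|quiet)([[:space:]]|$)/ /g
        s/[[:space:]]+(rhgb|quiet)([[:space:]]|$)/ /g
        s/[[:space:]]+$//
    }' "$1"
}
```

Review the output before copying it over the real file. In GRUB legacy the same change can usually be made for a single boot by editing the kernel line from the boot menu.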
By booting the machine from the rescue cd, I can find /mnt/sysimage/dev/mapper/vg_ws194-lv_root, which is a symbolic link to ../dm-0
/mnt/sysimage/dev/dm-0 exists.
lvscan tells me this: ACTIVE '/dev/vg_ws194/lv_root' [50.00 GiB] inherit
lvdisplay shows a bunch of stats about /dev/vg_ws194/lv_root that look normal to me.
In summary, it looks very much like vg_ws194-lv_root is indeed present and accounted for, but dracut didn't find it for some reason.
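The lvscan check can be scripted so that it is easy to repeat on all three machines. A small sketch, assuming lvscan output in the form quoted above (state, quoted path, size, policy); lv_state is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: read lvscan output on stdin and print the state (for example
# ACTIVE) of one logical volume. Exits non-zero if the volume is not listed.
# $1: LV path as lvscan prints it, e.g. /dev/vg_ws194/lv_root

lv_state() {
    awk -v lv="'$1'" '$2 == lv { print $1; found = 1 }
                      END { exit !found }'
}

# Usage on the rescue system:
#   lvscan | lv_state /dev/vg_ws194/lv_root
```

Note that this only shows what the rescue environment can see. The dracut warning comes from the initramfs, which has to find and activate the same volume on its own.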
On Sun, 10 Mar 2013 23:57:30 -0600 Frank Cox wrote:
dracut warning: no root device "block:/dev/mapper/vg_ws194-lv_root" found
After that I get the kernel panic message and that's the end of the line.
Following the instructions here:
https://ask.fedoraproject.org/question/10041/how-to-repair-unbootable-fedora...
I did this:
mv /boot/initramfs-2.6.32-358.0.1.el6.i386.img /boot/initramfs-2.6.32-358.0.1.el6.i386-nouveau.img
dracut /boot/initramfs-2.6.32-358.0.1.el6.i386.img 2.6.32-358.0.1.el6.i386
Interestingly enough, the new initramfs that I got from this command is slightly smaller than the one that I already had in /boot.
Sadly, this made no difference. When I booted the machine, I still got the same dracut warning and kernel panic.
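One thing worth confirming before rerunning dracut by hand is the exact kernel version string, since it has to match a directory under /lib/modules and the file names that grub.conf points at. The commands above say el6.i386 while the grub.conf quoted earlier says el6.i686; whether that is just a transcription slip cannot be told from the thread. A sketch for listing the candidates (installed_kernel_versions is a made-up name):

```shell
#!/bin/sh
# Sketch: list the kernel version strings that have a module directory.
# Each name printed is a valid second argument for dracut.

installed_kernel_versions() (
    moddir="${1:-/lib/modules}"
    for d in "$moddir"/*/; do
        [ -d "$d" ] || continue
        d=${d%/}              # drop trailing slash
        echo "${d##*/}"       # keep only the directory name
    done
)
```

From the rescue CD, either run it inside the chroot or pass /mnt/sysimage/lib/modules as the argument.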
mv /boot/initramfs-2.6.32-358.0.1.el6.i386.img /boot/initramfs-2.6.32-358.0.1.el6.i386-nouveau.img
dracut /boot/initramfs-2.6.32-358.0.1.el6.i386.img 2.6.32-358.0.1.el6.i386
Interestingly enough, the new initramfs that I got from this command is slightly smaller than the one that I already had in /boot.
Sadly, this made no difference. When I booted the machine, I still got the same dracut warning and kernel panic.
To me it looks like the initramfs does not contain all the needed "pieces" to boot the machine.
Try investigating the dracut options for including more modules, filesystems, etc., starting with --lvmconf and --mdadmconf.
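As a starting point, a rebuild with those two options followed by a look inside the image might be sketched as below. The kernel version is taken from the grub.conf quoted earlier and should be adjusted to whatever is really installed. DRY_RUN defaults to 1, so the commands are only printed.

```shell
#!/bin/sh
# Sketch: rebuild the initramfs with the local LVM and mdadm configuration
# included, then list its contents. Meant to be run inside the chroot.
# DRY_RUN defaults to 1: commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

KVER="${KVER:-2.6.32-358.0.1.el6.i686}"   # assumption: adjust as needed

rebuild_with_lvm() {
    run dracut --force --lvmconf --mdadmconf "/boot/initramfs-$KVER.img" "$KVER"
    run lsinitrd "/boot/initramfs-$KVER.img"
}

rebuild_with_lvm
```

Piping the lsinitrd listing through "grep -i lvm" gives a quick indication of whether the LVM tools and configuration made it into the image.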