Earlier this evening, I had the interesting experience of shutting down my machine (because of that lp out-of-sync problem, discussed elsewhere), and watching it not come back up. I admit that I changed out one of my DVD writers for another one, but I don't understand how that could have had any effect on this:
When I restarted the machine, it came up to the point where it normally shows: "GRUB Loading Stage 2..."
Except, it stopped at the "GRUB" and hung - no stage 2, no disk activity, nothing, just hung.
One of the changes I made a little while ago was to modify the boot order of my drives. I had been booting off of the hda drive, but now I'm booting from the sda drive. I went through the "grub-install /dev/sda" back then, then fixed the grub.conf because the disk id for the root was wrong (it was showing hd2, but in order to boot, this had to be set to hd0 because grub couldn't find anything on hd2 - weird, 'cuz that's where the /boot and / partitions both live, but I digress).
I thought I'd be able to recover from this by booting from hda again, but that didn't work. So I put in my Live CD, booted from that and tried to fix it via grub-install from there - no go. Grub couldn't find /dev/mapper/livecd-rw (which wasn't there, although /dev/mapper/live-rw was - is that a bug in the live cd?)
Then I went to my installation DVD, booted up into linux rescue mode (which took a few tries because I forgot to boot with linux rescue noapic), chroot'd to the right /dev and THEN I could finally run grub-install, and finally, the system rebooted from /dev/sda.
I'm wondering:
1) What would cause the system not to get to the "Loading stage 2" part but be able to load GRUB? This one is really the key - if I can understand this, I can prevent it from happening again.
2) What's with the live CD not being able to run grub-install properly? (It also would not install using grub - I don't recall the exact error message, but it was something about there not being a stage 1 area on the drive, which makes no sense because it _does_ boot from that drive).
3) Is this the only way to recover from such an error (GRUB, no stage 2, not the lp error), or was there an easier way I missed?
My configuration is:
OS : CentOS 5.2 (2.6.18-92.1.10 Linux kernel) x86_64 with latest updates
Hardware: AMD Athlon 64 x2 4200+ (2 x 2.0GHz), ECS NFORCE4M-A
  4GB OCZ DDR2 800MHz (PC6400)
  /dev/hda Maxtor 160MB PATA UDMA-133
  /dev/hdb Maxtor 120MB PATA UDMA-133
  /dev/sda Seagate 300GB SATA-150/300
  /dev/sdb WD 320GB SATA-150/300
  /dev/hdc Pioneer 1810 (DVR-112D) 18x DVD+/-RW/DL
  /dev/hdd Pioneer 1810 (DVR-112D) 18x DVD+/-RW/DL
Thanks.
mhr
on 9-22-2008 1:53 AM Mark Hull-Richter spake the following:
- What would cause the system not to get to the "Loading stage 2" part but be able to load GRUB? This one is really the key - if I can understand this, I can prevent it from happening again.
Grub's stage 1 boot code is in the MBR, so it loads. If it can't find stage 2, it usually dies quietly. I believe it has to load stage 1.5 or stage 2 to have enough code to actually give error messages; the MBR is just too small to hold all the code. So changing drives also changed the BIOS disk order on your system, and grub got confused.
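To make the "MBR is too small" point concrete: a classic MBR is one 512-byte sector, of which 446 bytes are boot code, 64 bytes are the partition table, and the last 2 bytes are the 0x55 0xAA signature. The following is a minimal Python sketch, not a GRUB tool; the function name `describe_mbr` and the synthetic sector are made up for illustration, and looking for the text "GRUB" in the boot code is only a rough heuristic.

```python
MBR_SIZE = 512
BOOT_CODE_SIZE = 446        # space available to the boot loader's first stage
PARTITION_TABLE_SIZE = 64   # four 16-byte partition entries
SIGNATURE = b"\x55\xaa"     # boot signature in the last two bytes


def describe_mbr(sector):
    """Summarize a 512-byte boot sector (hypothetical helper)."""
    if len(sector) != MBR_SIZE:
        raise ValueError("expected exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_SIZE]
    return {
        "valid_signature": sector[510:512] == SIGNATURE,
        # Heuristic only: checks whether the banner text appears in the code area.
        "mentions_grub": b"GRUB" in boot_code,
        "boot_code_bytes": len(boot_code),
    }


# Synthetic sector for demonstration; not read from a real disk.
fake_sector = (
    b"GRUB \x00".ljust(BOOT_CODE_SIZE, b"\x00")
    + bytes(PARTITION_TABLE_SIZE)
    + SIGNATURE
)
print(describe_mbr(fake_sector))
```

On a real machine you would feed it the first 512 bytes of the boot disk, read as root, instead of the synthetic sector.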
- What's with the live CD not being able to run grub-install properly? (It also would not install using grub - I don't recall the exact error message, but it was something about there not being a stage 1 area on the drive, which makes no sense because it _does_ boot from that drive).
The live CD isn't really a proper rescue disk for grub, but it is adequate to copy/move data off. I wish it had the rescue cd code from an install disk at least as a boot option.
- Is this the only way to recover from such an error (GRUB, no stage 2, not the lp error), or was there an easier way I missed?
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Mon, Sep 22, 2008 at 10:08 AM, Scott Silva ssilva@sgvwater.com wrote:
Grub's stage 1 boot code is in the MBR, so it loads. If it can't find stage 2, it usually dies quietly. I believe it has to load stage 1.5 or stage 2 to have enough code to actually give error messages; the MBR is just too small to hold all the code. So changing drives also changed the BIOS disk order on your system, and grub got confused.
I sympathize (I'm confused, too).
I can't swear to it, but I'm pretty sure I had rebooted several times after changing the boot drive and the boot drive order, all without a hitch. Then this happened.
I can swear (now) that I have rebooted several times since recovering, so if I muddled through this correctly, you're saying that it shouldn't happen again as long as I don't change the drive order again, right?
One of the things that I found rather irritating in all this was the utter lack of clarity provided in both the man pages for grub and grub-install, and the info pages (which are supposed to be more in detail but are not, really). How do I know which disk is which from grub's p.o.v.? There is no command to list the drives, and I wound up using the geometry command and my personal knowledge of what those were supposed to be to figure out which one grub thought was which, and even that made no sense because what grub saw as hd0 was my /dev/hda drive (which is not the boot drive) and hd2 was my /dev/sda, which _is_ the boot drive. Or do the drive designations change once the system is up? (I.e., in my grub.conf, the boot drive is hd0, but when the system comes up, it's hd2.)
I've looked through the documentation for grub at http://www.gnu.org/software/grub/manual/html_node/index.html and this particular idiosyncrasy is not clear.
Thanks, all.
PS: My apologies for the earlier html post - I sent that via Evolution from home, and apparently it is not configured for text-only by default (which I completely forgot).
mhr
on 9-22-2008 11:06 AM MHR spake the following:
One of the things that I found rather irritating in all this was the utter lack of clarity provided in both the man pages for grub and grub-install, and the info pages (which are supposed to be more in detail but are not, really). How do I know which disk is which from grub's p.o.v.? There is no command to list the drives, and I wound up using the geometry command and my personal knowledge of what those were supposed to be to figure out which one grub thought was which, and even that made no sense because what grub saw as hd0 was my /dev/hda drive (which is not the boot drive) and hd2 was my /dev/sda, which _is_ the boot drive. Or do the drive designations change once the system is up? (I.e., in my grub.conf, the boot drive is hd0, but when the system comes up, it's hd2.)
It is more common on systems that have both PATA and SATA interfaces. The BIOS first starts its INT 13h code and maps drives in a certain order; then, when Linux starts, its drivers load and everything gets re-mapped. If you have hard drives on both, it is a crapshoot sometimes. The newer kernels have supposedly moved the old IDE code into the base SATA drivers, so someday all the drives will show up as sd?.
http://linuxgazette.net/141/anonymous.html
http://linux.knightnet.org.uk/2008/01/more-on-grub-bug-with-mixed-pata-and.h...
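On the "which disk is which from grub's p.o.v." question: GRUB legacy keeps its guess at the BIOS-to-Linux mapping in /boot/grub/device.map, one "(hdN) /dev/xxx" pair per line. Below is a small Python sketch that parses such a file. The sample contents are hypothetical, chosen only to match the hd0 = /dev/hda, hd2 = /dev/sda ordering described above, and `parse_device_map` is an illustrative name, not part of GRUB.

```python
import re

# Hypothetical device.map contents, for illustration only.
SAMPLE_DEVICE_MAP = """\
# sample device map (not from a real system)
(hd0)     /dev/hda
(hd1)     /dev/hdb
(hd2)     /dev/sda
(hd3)     /dev/sdb
"""

# One entry per line: "(name)" followed by whitespace and a device path.
LINE_RE = re.compile(r"^\((\w+)\)\s+(\S+)$")


def parse_device_map(text):
    """Return a dict such as {'hd0': '/dev/hda', ...} from device.map text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


for grub_name, device in sorted(parse_device_map(SAMPLE_DEVICE_MAP).items()):
    print(grub_name, "->", device)
```

Keep in mind the file is only what grub assumed when it was written; if the BIOS boot order has changed since, the real mapping at boot time can differ. As far as I know, running grub-install with --recheck makes it probe the devices again and rewrite the map.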
On Mon, 2008-09-22 at 11:06 -0700, MHR wrote:
One of the things that I found rather irritating in all this was the utter lack of clarity provided in both the man pages for grub and grub-install, and the info pages (which are supposed to be more in detail but are not, really). How do I know which disk is which from grub's p.o.v.? There is no command to list the drives, and I wound up
The BIOS assigns the boot disk a hex "drive ID" of 0x80. From the BIOS POV, the next disks are 0x81, 0x82, ...
Now, say you set the BIOS to boot from hdd. It becomes 0x80, hda becomes 0x81, ...
This is the code passed to GRUB, which interprets hd{0,1,2,3} relative to 0x80, 0x81, ...
Note that with hdd selected as boot, we have a "shuffle up":
hda->hdb, hdb->hdc, hdc->hdd and, of course, hdd->hda.
OTOH (IIRC - be careful now I'm recalling deep dark stuff) if you select hdb as the boot device, it becomes 0x80, ... NEVER MIND!
Read this instead (from 2003)
http://www.linuxfromscratch.org/hints/downloads/files/boot_any_hd.txt
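The shuffle described above can be sketched in a few lines of Python. This is only a toy model under one assumption: the selected boot disk gets 0x80 and the remaining disks keep their original relative order. Real BIOSes may number the other disks differently (which is exactly the uncertain part), and `bios_drive_map` is a made-up name for illustration.

```python
def bios_drive_map(disks, boot_disk):
    """Model which disk GRUB sees as hd0, hd1, ... for a given boot disk.

    Assumption: the boot disk becomes 0x80 and the other disks follow
    in their original order. Actual BIOS behaviour may differ.
    """
    if boot_disk not in disks:
        raise ValueError("boot disk is not in the disk list")
    order = [boot_disk] + [d for d in disks if d != boot_disk]
    return {"hd%d" % i: (0x80 + i, disk) for i, disk in enumerate(order)}


# Hard disks only; optical drives do not get 0x80-style IDs in this model.
disks = ["hda", "hdb", "sda", "sdb"]

for name, (bios_id, disk) in sorted(bios_drive_map(disks, "sda").items()):
    print("%s = BIOS 0x%02x = /dev/%s" % (name, bios_id, disk))
```

With sda selected as the boot disk, the model puts /dev/sda at hd0, which fits the grub.conf having to say hd0 even though the running system lists that disk after the two PATA drives.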
<snip>
HTH