Hi,
I hope someone can help.
Just started to play with software RAID on CentOS 3.5. I was trying to simulate a faulty drive by using the -f switch on mdadm to mark a partition (drive) as faulty; I then removed and re-added the drive, which quite happily rebuilt according to /proc/mdstat and the output of mdadm's --detail switch. After all this mucking around I shut down the system and tried to restart it in the morning, but now it won't boot: it gets as far as "GRUB loading" and just stops, and I have to hard reset it to reboot.
I have rebooted using the install CD in rescue mode and I can see that all the arrays are set up quite happily (all RAID 1s, if it matters), and mdstat reports the status as "dirty,no-errors". Is that status normal? All the how-tos I've seen about software RAID show this as the status, so I have kind of assumed it is okay.
Given that all the data is readable from the arrays, I'm guessing that I should have done something to GRUB after rebuilding the arrays and before shutting down the system.
Can anyone help fill in the gap in my knowledge as to what I should have done?
Thanks for any help.
Regards
Lee
On Sat, 25 Jun 2005, Lee W wrote:
Just started to play with software RAID on CentOS 3.5 [snip] now it won't boot: it gets as far as "GRUB loading" and just stops.
Can anyone help fill in the gap in my knowledge as to what I should have done?
You may want to look at this:
http://forums.fedoraforum.org/showthread.php?t=26912
Not sure if it is the same cause, but worth checking.
Kind regards, -- dag wieers, dag@wieers.com, http://dag.wieers.com/ -- [all I want is a warm bed and a kind word and unlimited power]
On Sat, 2005-06-25 at 09:48, Lee W wrote:
[snip] Given that all the data is readable from the arrays, I'm guessing that I should have done something to GRUB after rebuilding the arrays and before shutting down the system.
Grub doesn't actually boot from a RAID - it just happens to work because RAID1 looks the same on the underlying single partitions. Boot with the CD and point the grub config to one or the other of the hd partitions holding /boot. If you get an error, that one might be corrupt and you can try the other. If you get to the point where the kernel is loaded and you can't mount root, then the problem could be with RAID, but that part will probably work.
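From the rescue environment, something along these lines usually puts the stage1 boot sector back (a sketch only - it assumes /boot is the first partition of the first disk, so adjust (hd0,0) to match your layout):

chroot /mnt/sysimage
grub
grub> root (hd0,0)
grub> setup (hd0)
grub> quit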
Grub doesn't actually boot from a RAID - it just happens to work because RAID1 looks the same on the underlying single partitions. [snip]
I always use lilo for mirrored servers. It will boot from either drive when one fails.
On Sat, 2005-06-25 at 14:00, Timothy Sandel wrote:
Grub doesn't actually boot from a RAID - it just happens to work because RAID1 looks the same on the underlying single partitions. [snip]
I always use lilo for mirrored servers. It will boot from either drive when one fails.
Grub will do that too if you manually install it on the other drive. The details of how you do that will vary depending on how your bios sees the other drive if the primary one fails. Grub doesn't have to be re-installed at every config file update though, so it isn't that bad to do it once.
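If the BIOS still presents the surviving disk as the second drive, something like this from the grub shell should do it (again assuming /boot is the first partition on that disk):

grub> root (hd1,0)
grub> setup (hd1)
grub> quit

If the BIOS instead renumbers the survivor as the first drive, the stage1 you just wrote may expect the wrong BIOS disk; the grub shell's device command can remap that before you run setup.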
[snip] Grub will do that too if you manually install it on the other drive. The details of how you do that will vary depending on how your bios sees the other drive if the primary one fails.
Here's a good paper on this:
"Configuring and Managing Software RAID with Red Hat Enterprise Linux 3"
Les Mikesell wrote:
Grub doesn't actually boot from a RAID - it just happens to work because RAID1 looks the same on the underlying single partitions. [snip]
Thanks Les, that has sorted it.
Is it just a limitation of the installer that it doesn't install grub on all the devices in the array?
I was guessing that it wouldn't be that difficult for the installer to do this, perhaps when you select /boot as the mount point for the raid device.
Thanks again
Regards
Lee
On Sun, 2005-06-26 at 01:04 +0100, Lee W wrote:
Thanks Les, that has sorted it. [snip] Is it just a limitation of the installer that it doesn't install grub on all the devices in the array?
The problem with this is the fact that BIOS device addresses _change_ when you remove a drive. So do you set up GRUB on the 2nd+ device as BIOS fixed disk 81h (the first device has failed but has not been removed) or as BIOS fixed disk 80h (the first device has failed and has been removed)?
Even though the Linux device name might not change, GRUB needs to know which BIOS fixed-disk assignment the disk has, and that _could_ change on the 2nd+ redundant boot devices.
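One way to handle the 80h case (failed drive pulled, survivor becomes the first BIOS disk) is to remap devices in the userspace grub shell before running setup, so stage1 gets written expecting to be BIOS fixed disk 80h. A sketch only - /dev/hdb and the partition layout here are assumptions:

grub> device (hd0) /dev/hdb
grub> root (hd0,0)
grub> setup (hd0)
grub> quit

(The device command only exists in the userspace grub shell, not in the boot-time CLI.)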
Quoting "Bryan J. Smith" b.j.smith@ieee.org:
The problem with this is the fact that BIOS device addresses _change_ when you remove a drive. [snip]
Hmmm... I think I read somewhere that if the "root" directive is omitted, grub will default to the first partition of whatever disk it was booted from. Or something like that. Basically, remove all root lines and feed this to grub (each install command should be on one line, in case your mail reader breaks them):
install --stage2=/boot/grub/stage2 (hd0,0)/boot/grub/stage1 (hd0) (hd0,0)/boot/grub/stage2 p (hd0,0)/boot/grub/grub.conf
install --stage2=/boot/grub/stage2 (hd1,0)/boot/grub/stage1 (hd1) (hd1,0)/boot/grub/stage2 p (hd1,0)/boot/grub/grub.conf
quit
Also, if you keep your boot drives as the first two drives in the system (or if they are the only two drives), then configuring Grub to load its files from the first drive is usually a safe bet (as long as the failed drive fails completely and is no longer visible to the BIOS).
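If I remember right, you can also feed those lines to the grub shell non-interactively, e.g.:

grub --batch < /tmp/grub-cmds

where /tmp/grub-cmds is just a file holding the two install lines plus quit (the file name is made up, obviously).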
On Sat, 2005-06-25 at 19:04, Lee W wrote:
Grub doesn't actually boot from a RAID
Thanks Les, that has sorted it.
Is it just a limitation of the installer that it doesn't install grub on all the devices in the array?
Yes.
I was guessing that it wouldn't be that difficult for the installer to do this, perhaps when you select /boot as the mount point for the raid device.
Lilo figures it out and installs on both drives, but I'm not sure it is correct for all the variations of how bios sees the drives if the primary fails.
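For what it's worth, a minimal lilo.conf sketch for a mirrored setup looks something like this - the raid-extra-boot line is the knob controlling where the boot records get written, and the device names and kernel version here are only examples:

boot=/dev/md0
raid-extra-boot=mbr-only
prompt
timeout=50
image=/boot/vmlinuz-2.4.21-32.EL
        label=linux
        root=/dev/md1
        initrd=/boot/initrd-2.4.21-32.EL.img
        read-only

Then run /sbin/lilo after every change, as usual.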
On Sat, 2005-06-25 at 22:45 -0500, Les Mikesell wrote:
Lilo figures it out and installs on both drives, but I'm not sure it is correct for all the variations of how bios sees the drives if the primary fails.
Yep, that's the problem with the PC BIOS. One that any next-gen firmware should solve.
Hopefully Apple will show the PC world how it should be done. Even if Apple is proprietary, a commodity implementation will follow.
I don't put much hope in Intel's supposed "open" efforts right now, given that most everyone has said the "open" aspect is a farce.
God I miss the Intel of the '80s, back when they used to pay AMD to fab. They did some great stuff back then, like proposing the concept of a commodity 32-bit processor with an optional MMU.
Quoting "Bryan J. Smith" b.j.smith@ieee.org:
Yep, that's the problem with the PC BIOS. One that any next-gen firmware should solve.
Hopefully Apple will show the PC world how it should be done. Even if Apple is proprietary, a commodity implementation will follow.
Sun did it a long time ago in its OpenBoot PROM. Was it based on some IEEE standard or something similar? I don't remember anymore...
The only thing is, I guess the OpenBoot CLI would be too much even for "advanced" Windows admins ;-)
On Mon, 2005-06-27 at 15:08 -0500, alex@milivojevic.org wrote:
Sun did it a long time ago in its OpenBoot PROM. Was it based on some IEEE standard or something similar? I don't remember anymore... The only thing is, I guess the OpenBoot CLI would be too much even for "advanced" Windows admins ;-)
Well, ultimately, there are a number of people to blame.
What I'm hoping is that people will use the new Mac PCs and go "WTF, why can't other PCs do that?" That will force most PCs to adopt firmware that is just as capable as the Mac's.
What us "UNIX weenies" have been used to with Sun OpenBoot, Digital SRM, etc. for years.
On Tuesday 28 June 2005 00:41, Bryan J. Smith wrote:
What us "UNIX weenies" have been used to with Sun OpenBoot, Digital SRM, etc. for years.
Even with Linux, OpenBoot and its Forth extensions make more sense. And SRM is one interesting piece of work, with processes and filesystem support (after a fashion) at boot. There is a LinuxBIOS project already out there, though.
On Tue, Jun 28, 2005 at 11:06:09AM -0400, Lamar Owen wrote:
(after a fashion) at boot. There is a LinuxBIOS project already out there, though.
I just looked at that a few weeks ago - it doesn't seem to have reached critical mass. Would sure be nice.
heck, i'd settle for getting serial-bios capability out of commodity motherboards. "server" systems (and maybe high-end "workstation" motherboards) have it but things like the via c3 boxes i have posted about before generally don't.
danno -- dan pritts - systems administrator - internet2 734/352-4953 office 734/834-7224 mobile
Quoting Lee W centos-list@unassemble.co.uk:
[snip] Is it just a limitation of the installer that it doesn't install grub on all the devices in the array?
I believe that was filed as a bug against the installer a long time ago. I read somewhere that it would be fixed in FC4 (maybe even FC3?), but I never bothered to check. LILO worked fine for me on servers, so there was really never a good incentive to switch to Grub.
To be fair, Grub offers some nice features. For example, Grub doesn't require reinstalling after a config file update. The question is how often you change the config file. If we exclude kernel installs, the answer is probably close to never. And if you use kernels from RPM packages, the post-install script is going to take care of that anyhow.
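For illustration, the stanza a kernel RPM's %post script ends up adding to grub.conf looks roughly like this (version and device names made up):

title CentOS (2.4.21-32.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-32.EL ro root=/dev/md1
        initrd /initrd-2.4.21-32.EL.img

Grub reads the file fresh at boot time, so nothing has to be rewritten to the boot sector.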
Another nice feature is that if Grub fails to load its config file (or to boot the OS), you still get a CLI where you can attempt to boot by hand (LILO just gets stuck). After I reinstalled my laptop, this CLI is the only way I can boot it (I've already spent too much time attempting to fix it; the only thing I haven't done is reinstall LILO). On the other hand, the previous installation used LILO, which booted the laptop every time. Somehow I'd rather have the boot loader simply do its job than offer me a CLI to do it manually ;-)
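In case it helps anyone else stuck at that CLI, booting by hand goes roughly like this (paths assume a separate /boot partition at (hd0,0); the kernel version is just an example):

grub> root (hd0,0)
grub> kernel /vmlinuz-2.4.21-32.EL ro root=/dev/md1
grub> initrd /initrd-2.4.21-32.EL.img
grub> boot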