I've been going along not noticing things happening right under my nose. So imagine my surprise when I discovered last night that my CentOS 5 box has installed multiple new kernels over the last few months, as updates come out, and IT IS NOT BOOTING THE NEWEST ONE.
grub.conf says to boot kernel 0, and 0 is the newest one, but the one it actually boots is 6 or 8 entries down the list (clearly I've not been keeping things cleaned up, either).
Below is some info that shows the problem. Can anyone here provide helpful suggestions on (1) why it is doing this, and more importantly (2) how I can make it stop?
Thanks!
uname reports: 2.6.18-164.15.1.el5PAE #1 SMP Wed Mar 17 12:14:29 EDT 2010 i686 athlon i386 GNU/Linux
while /etc/grub.conf contains:
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/md0
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-194.11.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.11.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.11.1.el5PAE.img
title CentOS (2.6.18-194.11.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.11.1.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.11.1.el5.img
title CentOS (2.6.18-194.8.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.8.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.8.1.el5PAE.img
title CentOS (2.6.18-194.8.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.8.1.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.8.1.el5.img
title CentOS (2.6.18-194.3.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.3.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.3.1.el5PAE.img
title CentOS (2.6.18-194.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.3.1.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.3.1.el5.img
title CentOS (2.6.18-164.15.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.15.1.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-164.15.1.el5.img
title CentOS (2.6.18-164.15.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.15.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-164.15.1.el5PAE.img
title CentOS (2.6.18-164.11.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.11.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-164.11.1.el5PAE.img
title CentOS (2.6.18-164.11.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.11.1.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-164.11.1.el5.img
On Mon, Aug 30, 2010 at 8:18 PM, fred smith fredex@fcshome.stoneham.ma.us wrote:
Below is some info that shows the problem. Can anyone here provide helpful suggestions on (1) why it is doing this, and more importantly (2) how I can make it stop?
Is there a chance /boot is full (read: are all the menu'd kernels actually present in /boot)?
(IIRC something similar happened because the /boot partition was set at the recommended size of 100 MB).
Kind regards/ldv
On Mon, Aug 30, 2010 at 08:41:31PM -0500, Larry Vaden wrote:
On Mon, Aug 30, 2010 at 8:18 PM, fred smith fredex@fcshome.stoneham.ma.us wrote:
Below is some info that shows the problem. Can anyone here provide helpful suggestions on (1) why it is doing this, and more importantly (2) how I can make it stop?
Is there a chance /boot is full (read: are all the menu'd kernels actually present in /boot)?
(IIRC something similar happened because the /boot partition was set at the recommended size of 100 MB).
/boot doesn't appear to be full: there appear to be 25.2 megabytes free, with 20.1 available.
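(For reference, the sort of check behind those numbers -- the exact invocation is a guess, not taken from the original mail:

    df -h /boot
    ls /boot/vmlinuz-*

The first shows free space on the /boot filesystem, the second lists which kernel images are actually present there.)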
Another curious thing I just noticed: the list of kernels available at boot time (in the actual grub menu shown at boot) IS NOT THE SAME LIST THAT APPEARS IN GRUB.CONF. In the boot-time menu, the kernel it boots is the most recent one shown, and there are older ones that do not appear in grub.conf, while grub.conf lists several newer ones that do not appear in the boot-time grub menu.
most strange.
BTW, this is a raid-1 array using linux software raid, with two matching drives. Is there possibly some way the two drives could have gotten out of sync such that whichever one is the actual boot device has invalid info in /boot?
And while thinking along those lines, I see a number of mails in root's mailbox from "md" notifying us of a degraded array. These all appear to have happened, AFAICT, at system boot, over the last several months.
Also, /var/log/messages contains a bunch of stuff like the below, also apparently at system boot, and I don't really know what it means, though the lines mentioning a device being "kicked out" seem ominous:
Aug 30 22:09:08 fcshome kernel: device-mapper: uevent: version 1.0.3
Aug 30 22:09:08 fcshome kernel: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel@redhat.com
Aug 30 22:09:08 fcshome kernel: device-mapper: dm-raid45: initialized v0.2594l
Aug 30 22:09:08 fcshome kernel: md: Autodetecting RAID arrays.
Aug 30 22:09:08 fcshome kernel: md: autorun ...
Aug 30 22:09:08 fcshome kernel: md: considering sdb2 ...
Aug 30 22:09:08 fcshome kernel: md: adding sdb2 ...
Aug 30 22:09:08 fcshome kernel: md: sdb1 has different UUID to sdb2
Aug 30 22:09:08 fcshome kernel: md: adding sda2 ...
Aug 30 22:09:08 fcshome kernel: md: sda1 has different UUID to sdb2
Aug 30 22:09:08 fcshome kernel: md: created md1
Aug 30 22:09:08 fcshome kernel: md: bind<sda2>
Aug 30 22:09:08 fcshome kernel: md: bind<sdb2>
Aug 30 22:09:08 fcshome kernel: md: running: <sdb2><sda2>
Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda2 from array!
Aug 30 22:09:08 fcshome kernel: md: unbind<sda2>
Aug 30 22:09:08 fcshome kernel: md: export_rdev(sda2)
Aug 30 22:09:08 fcshome kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Aug 30 22:09:08 fcshome kernel: md: considering sdb1 ...
Aug 30 22:09:08 fcshome kernel: md: adding sdb1 ...
Aug 30 22:09:08 fcshome kernel: md: adding sda1 ...
Aug 30 22:09:08 fcshome kernel: md: created md0
Aug 30 22:09:08 fcshome kernel: md: bind<sda1>
Aug 30 22:09:08 fcshome kernel: md: bind<sdb1>
Aug 30 22:09:08 fcshome kernel: md: running: <sdb1><sda1>
Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda1 from array!
Aug 30 22:09:08 fcshome kernel: md: unbind<sda1>
Aug 30 22:09:08 fcshome kernel: md: export_rdev(sda1)
Aug 30 22:09:08 fcshome kernel: raid1: raid set md0 active with 1 out of 2 mirrors
Aug 30 22:09:08 fcshome kernel: md: ... autorun DONE.
another curious thing I just noticed is this: the list of kernels available at boot time (in the actual grub menu shown at boot) IS NOT THE SAME LIST THAT APPEARS IN GRUB.CONF. in the boot-time menu, the kernel it boots is the most recent one shown, and there are other older ones that do not appear in grub.conf. while in grub.conf there are several newer ones that do not appear on the boot-time grub menu.
Has Grub been installed from another partition, in which it's looking for the grub.conf file?
On Mon, Aug 30, 2010 at 11:05:50PM -0400, Yves Bellefeuille wrote:
another curious thing I just noticed is this: the list of kernels available at boot time (in the actual grub menu shown at boot) IS NOT THE SAME LIST THAT APPEARS IN GRUB.CONF. in the boot-time menu, the kernel it boots is the most recent one shown, and there are other older ones that do not appear in grub.conf. while in grub.conf there are several newer ones that do not appear on the boot-time grub menu.
Has Grub been installed from another partition, in which it's looking for the grub.conf file?
don't think so, doesn't look like it:
ls -l `locate grub.conf`
-rw------- 1 root root 2404 Aug 30 21:39 /boot/grub/grub.conf
lrwxrwxrwx 1 root root   22 Jul 24  2009 /etc/grub.conf -> ../boot/grub/grub.conf
On 08/30/2010 09:24 PM, fred smith informed us:
<snip>another curious thing I just noticed is this: the list of kernels available
at boot time (in the actual grub menu shown at boot) IS NOT THE SAME LIST THAT APPEARS IN GRUB.CONF. in the boot-time menu, the kernel it boots is the most recent one shown, and there are other older ones that do not appear in grub.conf. while in grub.conf there are several newer ones that do not appear on the boot-time grub menu.
most strange.
BTW, this is a raid-1 array using linux software raid, with two matching drives. Is there possibly some way the two drives could have gotten out of sync such that whichever one is the actual boot device has invalid info in /boot?
and while thinking along those lines, I see a number of mails in root's mailbox from "md" notifying us of a degraded array. these all appear to have happened, AFAICT, at system boot, over the last several months.
also, /var/log/messages contains a bunch of stuff like the below, also apparently at system boot, and I don't really know what it means, though
<snip>
This is not the magic solution that you quite understandably would prefer. I hope someone can pinpoint your trouble. UNTIL THEN, I think you would be 'way ahead to make a full backup (or 2) to an external drive, disconnect that baby and start troubleshooting, confident that you won't lose all your data.
I'll bet that #cat /proc/mdstat looks really scary. Mine looks like this:

[root@madeleine grub]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      409536 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      3903680 blocks [2/2] [UU]

md3 : active raid1 sdb4[1] sda4[0]
      108502912 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      375567488 blocks [2/2] [UU]

unused devices: <none>
[root@madeleine grub]#
Other than that, the system boots from /boot/grub/grub.conf and that should be what you see during the boot process. The other two, /etc/grub.conf and /boot/grub/menu.lst, are symlinks to the real deal. It might be interesting to have a look at /etc/fstab, then issue a mount command with no arguments to see if anything is mounted on /boot.
You might find valuable RAID 1 information at: http://www.howtoforge.com/how-to-set-up-software-raid1-on-a-running-system-i...
HTH
On Mon, Aug 30, 2010 at 10:46:26PM -0500, Robert wrote:
On 08/30/2010 09:24 PM, fred smith informed us:
<snip>
This is not the magic solution that you quite understandably would prefer. I hope someone can pinpoint your trouble. UNTIL THEN, I think you would be 'way ahead to make a full backup (or 2) to an external drive, disconnect that baby and start troubleshooting, confident that you won't lose all your data.
I'll bet that #cat /proc/mdstat looks really scary. Mine looks like this:
<snip -- the clean [2/2] [UU] mdstat output shown above>
here's mine (indented for readability):
cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

md1 : active raid1 sdb2[1]
      312464128 blocks [2/1] [_U]

unused devices: <none>
Other than that, the system boots from /boot/grub/grub.conf and that should be what you see during the boot process. The other two, /etc/grub.conf and /boot/grub/menu.lst, are symlinks to the real deal.
yes, they're all symlinked correctly.
It might be interesting to have a look at /etc/fstab then issue a mount command with no arguments to see if anything is mounted on /boot
hmmmm.... I find this in /etc/fstab:
/dev/md0 /boot ext3 defaults 1 2
and this in the output of a bare mount command:
/dev/md0 on /boot type ext3 (rw)
so those look OK.
You might find valuable RAID 1 information at: http://www.howtoforge.com/how-to-set-up-software-raid1-on-a-running-system-i...
I'll take a look at that link. thanks.
I'll also dig for the HOWTO I used when setting it up. As I look at this I recall that I had to tweak the scripts that create the initrd. So, if one of the updates since has reinstalled those, I may no longer be getting the desired initrd built. Sounds ominous...
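(If the initrd ever does need rebuilding by hand on CentOS 5, the usual tool is mkinitrd; a sketch, using the newest kernel version from the grub.conf above purely as an example:

    mkinitrd -f /boot/initrd-2.6.18-194.11.1.el5PAE.img 2.6.18-194.11.1.el5PAE

The -f flag overwrites an existing image of the same name.)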
Thanks for the info!
On 08/30/2010 07:24 PM, fred smith wrote:
Aug 30 22:09:08 fcshome kernel: md: created md1
...
Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda2 from array!
...
Aug 30 22:09:08 fcshome kernel: md: created md0
...
Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda1 from array!
Yep, your arrays are broken. mdmonitor should have emailed you about this. Make sure that you receive and read mail to the root user.
/sbin/mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
/sbin/mdadm /dev/md1 --add /dev/sda2

/sbin/mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
/sbin/mdadm /dev/md0 --add /dev/sda1
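(Not part of Gordon's mail, but a common way to keep an eye on the resync those commands kick off -- watch(1) simply reruns the command every couple of seconds:

    watch cat /proc/mdstat

The mdstat output shows a recovery progress line while the re-added partitions resync.)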
On Mon, Aug 30, 2010 at 09:13:07PM -0700, Gordon Messmer wrote:
On 08/30/2010 07:24 PM, fred smith wrote:
<snip -- quoted log excerpts showing sda2 and sda1 kicked from the arrays as non-fresh>
Yep, your arrays are broken. mdmonitor should have emailed you about this. Make sure that you receive and read mail to the root user.
/sbin/mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
/sbin/mdadm /dev/md1 --add /dev/sda2

/sbin/mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
/sbin/mdadm /dev/md0 --add /dev/sda1
There are only two drives in this mirrored array, sda and sdb. If I need to re-add them both (which, if I understand mdadm correctly, is what your suggestion above would do), how does it know which one is the correct one to re-sync the array with?
thanks for the info!
At Tue, 31 Aug 2010 09:44:53 -0400 CentOS mailing list centos@centos.org wrote:
On Mon, Aug 30, 2010 at 09:13:07PM -0700, Gordon Messmer wrote:
On 08/30/2010 07:24 PM, fred smith wrote:
<snip -- quoted log excerpts showing sda2 and sda1 kicked from the arrays as non-fresh>
Yep, your arrays are broken. mdmonitor should have emailed you about this. Make sure that you receive and read mail to the root user.
/sbin/mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
/sbin/mdadm /dev/md1 --add /dev/sda2

/sbin/mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
/sbin/mdadm /dev/md0 --add /dev/sda1
there are only two drives in this mirrored array, sda and sdb. if I need to re-add them both (which, if I understand mdadm correctly, is what your suggestion above would do) how does it know which one is the correct one to re-sync the array with?
The one to resync with is the one that is now an active part of the array(s), in this case /dev/sdb1 for md0 and /dev/sdb2 for md1. The '--add' option tells mdadm that the specified disk is to be 'added' to the existing RAID set, and since this is a mirrored array, the contents of the RAID set are copied to the *added* disk.
The '--fail' and '--remove' options tell mdadm to
A) consider that the specified member has failed and take it offline, and B) remove the specified member from being considered part of the raid set.
This effectively makes the raid sets consist of single disks (normally a silly thing to do, but perfectly possible).
When you then --add the disks back, mdadm treats these disks as *new* disks with no valid data on them. The new disks become spares and the raid subsystem proceeds to rebuild/resync the raid array.
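(Not from the original thread, just a standard mdadm query that shows the before/after state of what Robert describes:

    mdadm --detail /dev/md0
    mdadm --detail /dev/md1

The "State :" line and the device table at the bottom show which members are active, which are faulty/removed, and the rebuild progress once the partitions are re-added.)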
thanks for the info!
On 08/31/2010 06:44 AM, fred smith wrote:
there are only two drives in this mirrored array, sda and sdb. if I need to re-add them both (which, if I understand mdadm correctly, is what your suggestion above would do) how does it know which one is the correct one to re-sync the array with?
None of the commands I suggested mentioned sdb, so I'm not sure why you'd get that impression.
Your kernel logged that it was removing sda2 and sda1 from the array because they are not in sync (non-fresh). /proc/mdstat indicates that sdb1 and sdb2 are components of arrays, but sda1 and sda2 are not.
Most importantly, the "mdmonitor" service should have emailed the root user to notify you that your arrays are broken. You should find out why you aren't getting that email.
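(A sketch of how one might check that -- these are ordinary mdadm/CentOS commands, not something Gordon spelled out: confirm /etc/mdadm.conf has a MAILADDR line, confirm the mdmonitor service is running, and have mdadm send a test alert:

    grep MAILADDR /etc/mdadm.conf
    service mdmonitor status
    mdadm --monitor --scan --test --oneshot

The --test option generates a TestMessage alert for each array, so you can verify the mail actually reaches whoever MAILADDR points at.)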
On Tue, Aug 31, 2010 at 08:55:47AM -0700, Gordon Messmer wrote:
On 08/31/2010 06:44 AM, fred smith wrote:
there are only two drives in this mirrored array, sda and sdb. if I need to re-add them both (which, if I understand mdadm correctly, is what your suggestion above would do) how does it know which one is the correct one to re-sync the array with?
None of the commands I suggested mentioned sdb, so I'm not sure why you'd get that impression.
Maybe due to lack of caffeine...
Your kernel logged that it was removing sda2 and sda1 from the array because they are not in sync (non-fresh). /proc/mdstat indicates that sdb1 and sdb2 are components of arrays, but sda1 and sda2 are not.
Most importantly, the "mdmonitor" service should have emailed the root user to notify you that your arrays are broken. You should find out why you aren't getting that email.
Actually, root got it, but I'm rather lax about checking root's email. I've changed /etc/mdadm.conf to send it to user fredex in the future.
thanks for your assistance!
On 8/31/2010 11:24 AM, fred smith wrote:
Most importantly, the "mdmonitor" service should have emailed the root user to notify you that your arrays are broken. You should find out why you aren't getting that email.
actually, root got it, but I'm rather lax about checking root's email. I've changed /etc/mdadm.conf to send it to user fredex in the future.
Might be better to add a mail alias to make all of root's email go somewhere useful. Mdmonitor may not be the only thing with important warnings.
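(A sketch of what Les describes, assuming a stock sendmail/postfix setup: add a line like this to /etc/aliases

    root:   fredex

and then run

    newaliases

so that everything addressed to root -- not just the mdmonitor alerts -- lands in fredex's mailbox.)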
On Tue, Aug 31, 2010 at 11:37:45AM -0500, Les Mikesell wrote:
On 8/31/2010 11:24 AM, fred smith wrote:
Most importantly, the "mdmonitor" service should have emailed the root user to notify you that your arrays are broken. You should find out why you aren't getting that email.
actually, root got it, but I'm rather lax about checking root's email. I've changed /etc/mdadm.conf to send it to user fredex in the future.
Might be better to add a mail alias to make all of root's email go somewhere useful. Mdmonitor may not be the only thing with important warnings.
along with some suitable Procmail recipe to segregate it into its own mailbox (like I already do with list email) so it won't clutter up my main/home/default mailbox.
thanks for the hint, I'll look into that.
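(A minimal procmail sketch for that; the subject pattern is a guess based on the usual mdadm alert subjects such as "DegradedArray event" and "Fail event", so adjust to taste. In ~/.procmailrc:

    :0:
    * ^Subject:.*(DegradedArray|Fail|TestMessage) event
    md-alerts

which files matching messages into a separate md-alerts mbox instead of the default inbox.)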
On Mon, Aug 30, 2010 at 09:13:07PM -0700, Gordon Messmer wrote:
On 08/30/2010 07:24 PM, fred smith wrote:
<snip -- quoted log excerpts showing sda2 and sda1 kicked from the arrays as non-fresh>
Yep, your arrays are broken. mdmonitor should have emailed you about this. Make sure that you receive and read mail to the root user.
/sbin/mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
/sbin/mdadm /dev/md1 --add /dev/sda2

/sbin/mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
/sbin/mdadm /dev/md0 --add /dev/sda1
Thanks to Gordon, Robert, and all the others who contributed to my learning experience!
The problem was that /dev/sda had dropped out of the raid array while sdb remained. Since sdb is the one that was being updated by yum, but sda is the one grub was actually booting from, the result was the out-of-sync files/kernels/etc.
following the instructions above has solved the problem, and the array is now rebuilding.
I found some references online in several places (Newegg comments on the specific WD drive I have, as well as other places) to a drive "feature" called TLER that allows setting a timeout for slow reads (?? maybe read errors??? I'd have to go back and re-read 'cause my memory is gone), and the default setting lets it delay for very long periods, causing the raid controller to think the drive has died and to drop it from the array. Apparently Linux software raid is subject to the same issue.
These online sources go on to mention that older versions of the specific drive model can have this setting changed (with the WDTLER.exe utility), but that WD, in its infinite wisdom, has removed that capability from "newer" drives. I tried the utility on my system and it reports "no drives found" so mine, even though they're over a year old, must be of the "newer" category. I may end up replacing them if they continue to do this to me. After all, the purpose and intent in building a desktop system with RAID-1 was redundancy, NOT HASSLE.
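(For what it's worth: versions of smartmontools newer than the 5.39 shown later in this thread added an option to query and set that same error-recovery timeout (SCT ERC, i.e. TLER) from Linux on drives that still permit it, which can save the DOS-floppy exercise. Offered as a hint only -- many desktop drives refuse the command, and the setting is usually lost at power-off:

    smartctl -l scterc /dev/sda          # query the current setting
    smartctl -l scterc,70,70 /dev/sda    # set read/write recovery to 7.0 seconds

The values are in units of 100 milliseconds.)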
Thanks again to all who advised.
Fred
At Mon, 30 Aug 2010 22:24:19 -0400 CentOS mailing list centos@centos.org wrote:
<snip -- Fred's reply quoted in full, including the /var/log/messages excerpt above where sda2 and sda1 are kicked from the arrays as non-fresh>
It looks like there is something wrong with sda... Your BIOS is booting grub from sda, grub is loading its conf, etc. from sda, but sda is no longer part of the raid sets of your running system. Your newer kernels are landing on sdb...
I *think* you can fix this by using mdadm to add (mdadm --add ...) sda and make it rebuild sda1 and sda2 from sdb1 and sdb2. You may have to --fail and --remove it first.
On Tue, Aug 31, 2010 at 08:18:26AM -0400, Robert Heller wrote:
<snip -- earlier discussion quoted above>
It looks like there is something wrong with sda... Your BIOS is booting grub from sda, grub is loading its conf, etc. from sda, but sda is not part of your raid sets of your running system. Your newer kernels are landing on sdb...
yeah, that sounds like a possibility.
I *think* you can fix this by using mdadm to add (mdadm --add ...) sda and make it rebuild sda1 and sda2 from sdb1 and sdb2. You may have to --fail and --remove it first.
I think you may be right. I'll give that a whirl at first opportunity.
After posting this last night I did further digging and found that the particular drives I'm using in this raid array are known to have long timeouts, causing raid controllers (though I don't know if that includes Linux's software RAID or not) to become confused and fail the mirror/drive when the timeout gets too long. There's apparently a WD utility (these are WD drives) to change a setting for that (the utility is wdtler.exe and the drive property is called TLER) which allegedly solves the problem. Other posters have pointed out that the newer drives OF THE SAME MODEL no longer let you set that. I haven't yet had the chance to find out if my drives allow it to be changed or not, but since they're somewhat over a year old I'm hopeful. Soon, I hope. Looks like I need to find a way to make a DOS bootable floppy (then add a floppy drive to the machine) so I can boot it up and give it a try.
Poking around with smartctl indicates NO drive errors on either drive, so I'm hopeful that the problem is "simply" as described above.
If I can't change the setting I may have to replace the drives. :( The entire REASON for buying two drives was so I would have some safety. Doggone drive manufacturers!
fred smith wrote:
On Tue, Aug 31, 2010 at 08:18:26AM -0400, Robert Heller wrote:
At Mon, 30 Aug 2010 22:24:19 -0400 CentOS mailing list centos@centos.org wrote:
On Mon, Aug 30, 2010 at 08:41:31PM -0500, Larry Vaden wrote:
On Mon, Aug 30, 2010 at 8:18 PM, fred smith fredex@fcshome.stoneham.ma.us wrote:
<snip>
drive when the timeout gets too long. There's apparently a WD utility (these are WD drives) to change a setting for that (the utility is wdtler.exe and the drive property is called TLER) which allegedly solves the problem. Other posters have pointed out that the newer drives OF THE SAME MODEL no longer let you set that. I haven't yet had the chance to find out if my drives allow it to be changed or not, but since they're somewhat over a year old I'm hopeful. Soon, I hope. Looks like I need to find a way to make a DOS bootable floppy (then add a floppy drive to the machine) so I can boot it up and give it a try.
<snip>
a) Try contacting WD, and ask them what they have for Linux for this purpose, and
b) FreeDOS.
mark
On Tue, 31 Aug 2010, fred smith wrote:
<snip -- earlier discussion quoted above>
After posting this last night I did further digging and found that the particular drives I'm using in this raid array are known to have long timeouts, causing raid controllers (though I don't know if that includes Linux's software RAID or not) to become confused and fail the mirror/ drive when the timeout gets too long. There's apparently a WD utility (these are WD drives) to change a setting for that (the utility is wdtler.exe and the drive property is called TLER) which allegedly solves the problem. Other posters have pointed out that the newer drives OF THE SAME MODEL no longer let you set that. I haven't yet had the chance to find out if my drives allow it to be changed or not, but since they're somewhat over a year old I'm hopeful. Soon, I hope. Looks like I need to find a way to make a DOS bootable floppy (then add a floppy drive to the machine) so I can boot it up and give it a try.
Poking around with smartctl indicates NO drive errors on either drive, so I'm hopeful that the problem is "simply" as described above.
If I can't change the setting I may have to replace the drives. :( the entire REASON for buying two drives was so I would have some safety. doggone drive manufacturers!
Hi Fred. Somewhat OT but maybe of interest to you.
I had to replace some WD drives after 3 years use.
One kept giving out SMART messages which I ignored, till the drive went AWOL. The other had no SMART error messages whatsoever.
That went down as well!
So I'm on Hitachi HDD now.
Reason being I had, and still have a Hitachi 2.5" drive in one of my laptops. The SMART test report looks really bad.
The drive makes ominous clunking sounds when in use. I have been expecting it to fail for some time, but it just keeps going!
Hitachi provide a DOS test program for their hard drives.
http://www.hitachigst.com/support/downloads/#DFT
The specs for the Deskstar looked good. Plus they have a 3 year warranty.
[root@karsites ~]# smartctl -a /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar P7K500 series
Device Model:     Hitachi HDP725050GLAT80
Serial Number:    xxxxxxxxxxxxx
Firmware Version: GM4OA42A
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Aug 31 15:29:30 2010 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Keith
On Mon, 30 Aug 2010, fred smith wrote:
To: centos@centos.org
From: fred smith fredex@fcshome.stoneham.ma.us
Subject: [CentOS] Centos 5.5, not booting latest kernel but older one instead
<snip -- original problem description quoted above>
while /etc/grub.conf contains:
/etc/grub.conf ??
don't you mean /boot/grub/grub.conf ?
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/md0
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-194.11.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.11.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.11.1.el5PAE.img
title CentOS (2.6.18-194.11.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.11.1.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.11.1.el5.img
I would add a dummy entry to the end of grub.conf, something like this:
title Booting from "wherever I think GRUB and grub.conf is"
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.11.1.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
        initrd /initrd-2.6.18-194.11.1.el5PAE.img
Hit the down arrow at boot time, and if you don't see that entry above, then grub is using a different /boot/grub/grub.conf file.
I don't use RAID so cannot comment specifically on that. If you re-install GRUB to a separate boot partition, when there are any kernel updates, yum will not know where to look to find and edit the grub.conf file. This means you have total control over which kernel gets booted, even after a kernel upgrade.
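(For reference, a rough sketch of how GRUB legacy is normally re-installed from the grub shell -- not something Keith spells out here, and the device names are only an example; with a mirrored /boot the same steps are often repeated against the second disk so that either drive can boot:

    grub> device (hd0) /dev/sda
    grub> root (hd0,0)
    grub> setup (hd0)
    grub> quit

setup (hd0) writes stage1 to that disk's MBR and points it at the stage2/grub.conf on (hd0,0).)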
HTH
Keith Roberts
---- Fred Smith -- fredex@fcshome.stoneham.ma.us -----------------------------
        I can do all things through Christ who strengthens me.
------------------------------ Philippians 4:13 -------------------------------
So can I !!!
On Tue, 2010-08-31 at 07:23 +0100, Keith Roberts wrote:
On Mon, 30 Aug 2010, fred smith wrote:
<snip -- original problem description quoted above>
while /etc/grub.conf contains:
/etc/grub.conf ??
don't you mean /boot/grub/grub.conf ?
Actually /etc/grub.conf should be a link to /boot/grub/grub.conf, so yes, OP correctly pasted the content of /etc/grub.conf.
ls -las /etc/grub.conf
4 lrwxrwxrwx 1 root root 22 Apr  2  2009 /etc/grub.conf -> ../boot/grub/grub.conf
I suggest setting a higher timeout (e.g. 15 sec) and disabling hiddenmenu, then trying to manually select the latest kernel from the list. And oh, first of all, you should check whether /etc/grub.conf really links to /boot/grub/grub.conf :)
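(Concretely, that means editing the top of /boot/grub/grub.conf along these lines:

    default=0
    timeout=15
    #hiddenmenu

i.e. raise the timeout and comment out hiddenmenu so the menu stays on screen long enough to pick an entry by hand.)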
HTH
On Tue, 31 Aug 2010, kalinix wrote:
On Tue, 2010-08-31 at 07:23 +0100, Keith Roberts wrote:
On Mon, 30 Aug 2010, fred smith wrote:
<snip -- original problem description quoted above>
while /etc/grub.conf contains:
/etc/grub.conf ??
don't you mean /boot/grub/grub.conf ?
Actually /etc/grub.conf should be a link to /boot/grub/grub.conf, so yes, OP correctly pasted the content of /etc/grub.conf.
Yes, thanks for that.
ls -las /etc/grub.conf 4 lrwxrwxrwx 1 root root 22 Apr 2 2009 /etc/grub.conf -> ../boot/grub/grub.conf
I suggest setting an higher timeout (eg 15 sec) and disable hiddenmenu. Then try to manually select last kernel out of the list. And oh, first of all, you should check whether the /boot/grub/grub.conf links to /etc/grub.conf :)
It is possible that grub is booting from the md0 array, and there may be a grub.conf on there as well, that ls did not find?
The /boot partition does not have to be mounted for GRUB to boot from it.
Maybe installing Gparted and looking at your partitions would give us a clue?
HTH
Keith
On Tue, 31 Aug 2010, Keith Roberts wrote:
On Mon, 30 Aug 2010, fred smith wrote:
snip
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/md0
So anaconda is reporting there is a separate boot partition, on the md0 RAID array?
It's possible that GRUB is set up to boot from this separate boot partition that is unmounted, and is reading the grub.conf from there?
While you somehow also have a /boot/grub directory with GRUB and grub.conf installed there as well?
I would mount the /boot partition on md0 and have a poke round there - see if you have another GRUB and grub.conf installed there that is being used?
If so, add another dummy entry to that, telling you where that grub.conf file is, for diagnostics at boot time.
As mentioned by someone else, you really need hiddenmenu to be commented out, so you can see where GRUB is getting its grub.conf file from.
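(One concrete way to test Keith's theory, offered as a sketch rather than something from the thread: because sda1 has already been kicked out of md0 it is no longer held busy by md, so it can be mounted read-only on its own and compared against the live /boot, which is md0 currently running on sdb1 alone. Don't try this with a partition that is still part of a running array.

    mkdir -p /mnt/sda1
    mount -o ro /dev/sda1 /mnt/sda1
    diff /mnt/sda1/grub/grub.conf /boot/grub/grub.conf
    ls /mnt/sda1
    umount /mnt/sda1

If the copy on sda1 turns out to be stale, that would explain why the boot-time menu doesn't match the grub.conf the running system sees.)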
Kind Regards,
Keith