[CentOS] Centos 5.5, not booting latest kernel but older one instead

Tue Aug 31 13:57:21 UTC 2010
fred smith <fredex at fcshome.stoneham.ma.us>

On Tue, Aug 31, 2010 at 08:18:26AM -0400, Robert Heller wrote:
> At Mon, 30 Aug 2010 22:24:19 -0400 CentOS mailing list <centos at centos.org> wrote:
> 
> > 
> > On Mon, Aug 30, 2010 at 08:41:31PM -0500, Larry Vaden wrote:
> > > On Mon, Aug 30, 2010 at 8:18 PM, fred smith
> > > <fredex at fcshome.stoneham.ma.us> wrote:
> > > >
> > > > Below is some info that shows the problem. Can anyone here provide
> > > > helpful suggestions on (1) why it is doing this, and more importantly (2)
> > > > how I can make it stop?
> > > 
> > > Is there a chance /boot is full (read: are all the menu'd kernels
> > > actually present in /boot)?
> > > 
> > > (IIRC something similar happened because the /boot partition was set
> > > at the recommended size of 100 MB).
> > 
> > /boot doesn't appear to be full, there appear to be 25.2 megabytes free with 20.1 available.
> > 
> > another curious thing I just noticed is this: the list of kernels available 
> > at boot time (in the actual grub menu shown at boot) IS NOT THE SAME LIST
> > THAT APPEARS IN GRUB.CONF. in the boot-time menu, the kernel it boots is
> > the most recent one shown, and there are other older ones that do not
> > appear in grub.conf. while in grub.conf there are several newer ones that
> > do not appear on the boot-time grub menu.
> > 
> > most strange.
> > 
> > BTW, this is a raid-1 array using linux software raid, with two matching
> > drives. Is there possibly some way the two drives could have gotten out
> > of sync such that whichever one is the actual boot device has invalid
> > info in /boot?
> > 
> > and while thinking along those lines, I see a number of mails in root's
> > mailbox from "md" notifying us of a degraded array. these all appear to have
> > happened, AFAICT, at system boot, over the last several months.
> > 
> > also, /var/log/messages contains a bunch of stuff like the below, also 
> > apparently at system boot, and I don't really know what it means, though
> > the lines mentioning a device being "kicked out" seem ominous:
> > 
> > Aug 30 22:09:08 fcshome kernel: device-mapper: uevent: version 1.0.3
> > Aug 30 22:09:08 fcshome kernel: device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel at redhat.com
> > Aug 30 22:09:08 fcshome kernel: device-mapper: dm-raid45: initialized v0.2594l
> > Aug 30 22:09:08 fcshome kernel: md: Autodetecting RAID arrays.
> > Aug 30 22:09:08 fcshome kernel: md: autorun ...
> > Aug 30 22:09:08 fcshome kernel: md: considering sdb2 ...
> > Aug 30 22:09:08 fcshome kernel: md:  adding sdb2 ...
> > Aug 30 22:09:08 fcshome kernel: md: sdb1 has different UUID to sdb2
> > Aug 30 22:09:08 fcshome kernel: md:  adding sda2 ...
> > Aug 30 22:09:08 fcshome kernel: md: sda1 has different UUID to sdb2
> > Aug 30 22:09:08 fcshome kernel: md: created md1
> > Aug 30 22:09:08 fcshome kernel: md: bind<sda2>
> > Aug 30 22:09:08 fcshome kernel: md: bind<sdb2>
> > Aug 30 22:09:08 fcshome kernel: md: running: <sdb2><sda2>
> > Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda2 from array!
> > Aug 30 22:09:08 fcshome kernel: md: unbind<sda2>
> > Aug 30 22:09:08 fcshome kernel: md: export_rdev(sda2)
> > Aug 30 22:09:08 fcshome kernel: raid1: raid set md1 active with 1 out of 2 mirrors
> > Aug 30 22:09:08 fcshome kernel: md: considering sdb1 ...
> > Aug 30 22:09:08 fcshome kernel: md:  adding sdb1 ...
> > Aug 30 22:09:08 fcshome kernel: md:  adding sda1 ...
> > Aug 30 22:09:08 fcshome kernel: md: created md0
> > Aug 30 22:09:08 fcshome kernel: md: bind<sda1>
> > Aug 30 22:09:08 fcshome kernel: md: bind<sdb1>
> > Aug 30 22:09:08 fcshome kernel: md: running: <sdb1><sda1>
> > Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda1 from array!
> > Aug 30 22:09:08 fcshome kernel: md: unbind<sda1>
> > Aug 30 22:09:08 fcshome kernel: md: export_rdev(sda1)
> > Aug 30 22:09:08 fcshome kernel: raid1: raid set md0 active with 1 out of 2 mirrors
> > Aug 30 22:09:08 fcshome kernel: md: ... autorun DONE.
> 
> It looks like there is something wrong with sda... Your BIOS is booting
> grub from sda, grub is loading its conf, etc. from sda, but sda is not
> part of your raid sets of your running system.  Your newer kernels are
> landing on sdb...

yeah, that sounds like a possibility.
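
For anyone who wants to check their own logs for the same symptom, here is
a minimal sketch (not a polished tool) that scans md autorun messages like
the ones quoted above and reports which members were kicked from which
array. The function name kicked_members and the LOG sample are made up for
illustration; the sample lines are copied from the excerpt above, and the
sketch assumes each "kicking non-fresh" line follows the "created mdN"
line of the array it belongs to, as it does in that excerpt.

```python
import re
from collections import defaultdict

# Sample lines copied from the /var/log/messages excerpt in this thread.
LOG = """\
Aug 30 22:09:08 fcshome kernel: md: created md1
Aug 30 22:09:08 fcshome kernel: md: bind<sda2>
Aug 30 22:09:08 fcshome kernel: md: bind<sdb2>
Aug 30 22:09:08 fcshome kernel: md: running: <sdb2><sda2>
Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda2 from array!
Aug 30 22:09:08 fcshome kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Aug 30 22:09:08 fcshome kernel: md: created md0
Aug 30 22:09:08 fcshome kernel: md: bind<sda1>
Aug 30 22:09:08 fcshome kernel: md: bind<sdb1>
Aug 30 22:09:08 fcshome kernel: md: running: <sdb1><sda1>
Aug 30 22:09:08 fcshome kernel: md: kicking non-fresh sda1 from array!
Aug 30 22:09:08 fcshome kernel: raid1: raid set md0 active with 1 out of 2 mirrors
"""

CREATED = re.compile(r"md: created (md\d+)")
KICKED = re.compile(r"md: kicking non-fresh (\w+) from array")


def kicked_members(log_text):
    """Map each array name to the members the kernel kicked as non-fresh.

    Assumption: a "kicking" line belongs to the most recent "created" line.
    """
    kicked = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        match = CREATED.search(line)
        if match:
            current = match.group(1)
            continue
        match = KICKED.search(line)
        if match and current is not None:
            kicked[current].append(match.group(1))
    return dict(kicked)


if __name__ == "__main__":
    for array, members in sorted(kicked_members(LOG).items()):
        print(array, "is missing", ", ".join(members))
```

On the sample this reports sda1 missing from md0 and sda2 missing from
md1, which matches the reading that both halves on sda have dropped out.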
> 
> I *think* you can fix this by using mdadm to add (mdadm --add ...)  sda
> and make it rebuild sda1 and sda2 from sdb1 and sdb2. You may have to
> --fail and --remove it first.

I think you may be right. I'll give that a whirl at first opportunity.
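
To keep the steps straight, here is a rough sketch that only prints the
mdadm commands that suggestion implies; it does not run anything. The
helper name readd_commands and the /dev/... paths are my assumptions,
inferred from the md0/sda1 and md1/sda2 pairing in the log, so check them
against /proc/mdstat before typing any of it as root.

```python
def readd_commands(array, member, fail_first=False):
    """Build (but do not run) mdadm commands to put a member back in an array.

    fail_first covers the "you may have to --fail and --remove it first" case.
    """
    commands = []
    if fail_first:
        commands.append(["mdadm", array, "--fail", member])
        commands.append(["mdadm", array, "--remove", member])
    commands.append(["mdadm", array, "--add", member])
    return commands


# Assumed pairing, taken from the kernel messages: sda1 -> md0, sda2 -> md1.
PAIRS = [("/dev/md0", "/dev/sda1"), ("/dev/md1", "/dev/sda2")]

if __name__ == "__main__":
    for array, member in PAIRS:
        for command in readd_commands(array, member):
            print(" ".join(command))
```

Printing rather than executing is deliberate: adding the wrong partition
to the wrong array would start a resync in the wrong direction, so the
output is meant to be read and checked by hand first.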

After posting this last night I did further digging and found that the
particular drives I'm using in this raid array are known to have long
timeouts, causing raid controllers (though I don't know if that includes
Linux's software RAID or not) to become confused and fail the mirror/
drive when the timeout gets too long. There's apparently a WD utility
(these are WD drives) to change a setting for that (the utility is
wdtler.exe and the drive property is called TLER) which allegedly solves
the problem. Other posters have pointed out that the newer drives OF THE
SAME MODEL no longer let you set that. I haven't yet had the chance to
find out if my drives allow it to be changed or not, but since they're
somewhat over a year old I'm hopeful. Soon, I hope.  Looks like I need
to find a way to make a DOS bootable floppy (then add a floppy drive to
the machine) so I can boot it up and give it a try.

Poking around with smartctl indicates NO drive errors on either drive,
so I'm hopeful that the problem is "simply" as described above.

If I can't change the setting I may have to replace the drives. :(
The entire REASON for buying two drives was so I would have some safety.
Doggone drive manufacturers!
-- 
---- Fred Smith -- fredex at fcshome.stoneham.ma.us -----------------------------
                      The eyes of the Lord are everywhere, 
                    keeping watch on the wicked and the good.
----------------------------- Proverbs 15:3 (niv) -----------------------------