Hi all,
I'm having a strange problem in which a certain box won't boot any kernel newer than 2.6.18-53.
I have a kickstart setup that installs a CentOS 5.1 base (which comes with kernel 2.6.18-53), and then I do a "yum update" to 5.3.
However, when 2.6.18-164 gets installed, the box is rebooted, and it dumps me in a grub prompt. If I manually enter root, kernel, initrd and boot commands and point it at the -164 kernel, I am given "Error 13: invalid or unsupported executable format."
But I can manually enter the -53 kernel and that boots fine.
I tried backing down to the -128 kernel and hit the same problem.
(all our updates come from our own mirrored repos and we only sync i386 directories)
I enabled the menu and selected -164, and it gave me Error 13 the first time, and subsequent selects of the menu item resulted in Error 15: File not found.
Now all of the above was with a separate /boot partition.
Box is a dual core Xeon (E8500) with hardware SATA RAID on board.
I'm pretty sure the old boxes were the same spec (but I don't have them onsite so can't verify that) and they successfully booted the -128 kernel.
I've since rebuilt it with a single / partition. I then did all the updates, and updated my grub.conf to the following:
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
#hiddenmenu
title CentOS (2.6.18-53.el5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-53.el5 ro root=LABEL=/
        initrd /boot/initrd-2.6.18-53.el5.img
title CentOS (2.6.18-164.el5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-164.el5 ro root=LABEL=/
        initrd /boot/initrd-2.6.18-164.el5.img
The menu came up but only showed the -53 option.
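One quick sanity check at this point is to confirm that every kernel and initrd path named in grub.conf actually exists on the filesystem the running system sees. This is only a minimal sketch: check_grub_files is a hypothetical helper name (not part of GRUB), and it assumes the paths in grub.conf are relative to the root of the filesystem named by "root (hd0,0)", which holds for the single / layout above.

```shell
#!/bin/sh
# check_grub_files CONF [PREFIX]
# Print each kernel/initrd path referenced in a GRUB legacy config and
# whether it exists under PREFIX (default: empty, i.e. relative to /).
# Returns 1 if any referenced file is missing, 0 otherwise.
check_grub_files() {
    conf=$1
    prefix=${2:-}
    missing=0
    # On a "kernel" or "initrd" line, the second field is the file path.
    for f in $(awk '$1 == "kernel" || $1 == "initrd" { print $2 }' "$conf"); do
        if [ -f "$prefix$f" ]; then
            echo "ok      $f"
        else
            echo "MISSING $f"
            missing=1
        fi
    done
    return $missing
}
```

Typical use would be "check_grub_files /boot/grub/grub.conf". With a separate /boot partition the paths in grub.conf are relative to that partition, so the prefix would be /boot instead. If everything is reported as ok here but GRUB still cannot find the files, that points at GRUB reading different on-disk data than the running kernel does.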
The most common explanation I can find for Grub's Error 13 is that a kernel for a different architecture was installed, but they're all the same, i.e. the i686 kernel from the i386 repo.
[root@dhcp-248 boot]# file vmlinuz-2.6.18-53.el5
vmlinuz-2.6.18-53.el5: ELF 32-bit LSB shared object, Intel 80386, version 1, stripped
[root@dhcp-248 boot]# file vmlinuz-2.6.18-164.el5
vmlinuz-2.6.18-164.el5: ELF 32-bit LSB shared object, Intel 80386, version 1, stripped
current running kernel is:
Linux dhcp-248.off.knossos.net.nz 2.6.18-53.el5 #1 SMP Mon Nov 12 02:22:48 EST 2007 i686 i686 i386 GNU/Linux
[root@dhcp-248 ~]# rpm -qa | grep kernel
kernel-2.6.18-53.el5
kernel-2.6.18-164.el5
[root@dhcp-248 ~]# yum list kernel*
Loaded plugins: fastestmirror
Determining fastest mirrors
knl-base                                  2508/2508
knl-gen                                     42/42
knl-updates                                528/528
Installed Packages
kernel.i686                2.6.18-53.el5     installed
kernel.i686                2.6.18-164.el5    installed
Available Packages
kernel-PAE.i686            2.6.18-164.el5    knl-updates
kernel-PAE-devel.i686      2.6.18-164.el5    knl-updates
kernel-debug.i686          2.6.18-164.el5    knl-updates
kernel-debug-devel.i686    2.6.18-164.el5    knl-updates
kernel-devel.i686          2.6.18-164.el5    knl-updates
kernel-doc.noarch          2.6.18-164.el5    knl-updates
kernel-headers.i386        2.6.18-164.el5    knl-updates
kernel-xen.i686            2.6.18-164.el5    knl-updates
kernel-xen-devel.i686      2.6.18-164.el5    knl-updates
[root@dhcp-248 ~]# cat /etc/fstab
LABEL=/                 /          ext3    defaults        1 1
tmpfs                   /dev/shm   tmpfs   defaults        0 0
devpts                  /dev/pts   devpts  gid=5,mode=620  0 0
sysfs                   /sys       sysfs   defaults        0 0
proc                    /proc      proc    defaults        0 0
LABEL=SWAP-isw_dgcfec   swap       swap    defaults        0 0
[root@dhcp-248 ~]# fdisk -l

Disk /dev/sda: 251.0 GB, 251059544064 bytes
255 heads, 63 sectors/track, 30522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       30391   244115676   83  Linux
/dev/sda2           30392       30522     1052257+  82  Linux swap / Solaris

Disk /dev/sdb: 251.0 GB, 251059544064 bytes
255 heads, 63 sectors/track, 30522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       30391   244115676   83  Linux
/dev/sdb2           30392       30522     1052257+  82  Linux swap / Solaris
[root@dhcp-248 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             226G  1.3G  213G   1% /
tmpfs                 504M     0  504M   0% /dev/shm
Spiro Harvey wrote:
Box is a dual core Xeon (E8500) with hardware SATA RAID on board.
The E8500 is a desktop Core2Duo CPU, I thought? What sort of hardware SATA RAID? Do you mean Intel Matrix RAID? That's not actually hardware; it's BIOS/driver-implemented fake RAID, and frankly, you'd be better off using native Linux mdraid.
Or did you mean the older E8500 server chipset, for the Xeon MP 70x0 or 71x0 series (P4 Prescott based) CPUs, in a server with proper onboard RAID such as an HP SmartArray or Dell PERC?
If the latter, never mind what I said above... needless to say, these part numbers can be very confusing.
John R Pierce pierce@hogranch.com wrote:
The E8500 is a desktop Core2Duo CPU, I thought? 'what sort of
Yes, my mistake. It's a Core 2 Duo.
I don't know where I saw the Xeon sticker. I saw the E8500 on /proc/cpuinfo but didn't RTFS properly. :/ This was further confused when I googled E8500 and one of the hits mentioned Xeon...
The raid is indeed an Intel Matrix RAID. The BIOS is configured so that the sata controller is in RAID mode, and the "OPROM" is set to Matrix Raid.
The raid is indeed an Intel Matrix RAID. The BIOS is configured so that the sata controller is in RAID mode, and the "OPROM" is set to Matrix Raid.
I would back up ALL your file systems off that disk, perhaps using a Linux rescue CD, then configure the controller in the BIOS for JBOD, use a rescue disk to build mdraid partitions, and restore your files from the backups. You may have to rebuild the /boot/initrd on the system to drop the fakeraid (dmraid) driver and enable the native Linux mdraid driver.
Fake RAID like Intel Matrix RAID is NOT recommended for Linux/Unix systems: http://thebs413.blogspot.com/2005/09/fake-raid-fraid-sucks-even-more-at.html
Someone's procedure for undoing a fakeraid: http://www.brandonchecketts.com/archives/disabling-dmraid-fakeraid-on-centos...
I would backup ALL your file systems off that disk, perhaps using a
This is a fresh install, so that's not an issue.
Linux rescue CD, then configure the controller in the BIOS for JBOD, use a rescue disk to build mdraid partitions, and restore your files from the backups. you may have to rebuild the /boot/initrd on the system to dump the fakeraid (dmraid) driver and enable the mdraid native linux raid driver
I'm interested in knowing why the machine isn't booting some kernels, but will happily boot another. I figure if it's a hardware issue, then it should be an all-or-nothing issue? I'm positive this is the same spec as the last servers built for this same purpose, but the others are now on the other side of the country, so I can't access them to verify.
So assuming the hardware is exactly the same, and assuming there's something in the -164 kernel that doesn't like that particular fake raid card, then I still can't see why I can't boot the -128 kernel as that's what the other boxes have running. :/
On Nov 12, 2009, at 7:53 PM, Spiro Harvey spiro@knossos.net.nz wrote:
I'm interested in knowing why the machine isn't booting some kernels, but will happily boot another. I figure if it's a hardware issue, then it should be an all-or-nothing issue? I'm positive this is the same spec as the last servers built for this same purpose, but the others are now on the other side of the country, so I can't access them to verify.
So assuming the hardware is exactly the same, and assuming there's something in the -164 kernel that doesn't like that particular fake raid card, then I still can't see why I can't boot the -128 kernel as that's what the other boxes have running. :/
You might have installed a driver for the fake raid before which added it to /etc/modprobe.conf and did a mkinitrd to add it to the initrd during boot, but at some point removed it and from that point on newer kernels didn't get the driver in their initrd images?
Just an idea.
-Ross
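Ross's theory can be checked by listing the kernel modules packed into each initrd and looking for ones present in the working image but absent from the failing one. A rough sketch follows, assuming the CentOS 5 initrd format (a gzip-compressed cpio archive); initrd_modules and modules_missing_from_new are hypothetical helper names. Note that the Error 13 and Error 15 messages in this thread appear on GRUB's kernel line, before any initrd is used, so this tests the initrd idea rather than explaining those errors by itself.

```shell
#!/bin/sh
# initrd_modules IMG
# List the basenames of kernel modules (*.ko) inside a gzip+cpio initrd,
# one per line, sorted and de-duplicated.
initrd_modules() {
    zcat "$1" | cpio -it 2>/dev/null | sed -n 's|.*/\([^/]*\.ko\)$|\1|p' | sort -u
}

# modules_missing_from_new OLD_LIST NEW_LIST
# Given two sorted module lists (one name per line), print the names that
# appear only in OLD_LIST, i.e. modules the newer initrd no longer carries.
modules_missing_from_new() {
    comm -23 "$1" "$2"
}
```

For the two images in this thread that might look like:

    initrd_modules /boot/initrd-2.6.18-53.el5.img  > /tmp/old.list
    initrd_modules /boot/initrd-2.6.18-164.el5.img > /tmp/new.list
    modules_missing_from_new /tmp/old.list /tmp/new.list

An empty result would suggest both images carry the same set of drivers.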
On Thu, Nov 12, 2009 at 9:22 PM, Ross Walker rswwalker@gmail.com wrote:
You might have installed a driver for the fake raid before which added it to /etc/modprobe.conf and did a mkinitrd to add it to the initrd during boot, but at some point removed it and from that point on newer kernels didn't get the driver in their initrd images?
If the /boot is also part of the raid and it is a soft raid (fake raid is the same) then maybe only one of the mirror is being updated and grub is looking at the other mirror and not finding the files needed.
Larry Brigman larry.brigman@gmail.com wrote:
If the /boot is also part of the raid and it is a soft raid (fake raid is the same) then maybe only one of the mirror is being updated and grub is looking at the other mirror and not finding the files needed.
I think you're on the right track here. I dropped the raid set, and rebuilt the box, and this time took note of the syncing.. dmraid -s kept telling me the mirror was "ok" so I'm guessing it synced correctly.
I installed the update again, then set -53 to boot first in the grub order, but it dumped me at the grub prompt.
So I typed "kernel 2.6.18-" and hit tab, and saw both files (-53 and -164). I hit tab again, and it completed -53. I went back and typed -164 and selected that. The first time I got Error 13 (invalid or unsupported executable format). I reran the kernel line and this time was told Error 15: File not found. I ran it again and got Error 13. It pretty much alternated.
So it looks like one side of the mirror isn't getting synced properly.
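One way to test the half-synced-mirror theory is to read the same file from each disk separately and compare the two copies byte for byte. This is a minimal sketch, assuming each mirror member has been mounted read-only at its own mount point from a rescue environment with dmraid not active; the mount points /mnt/a and /mnt/b and the helper name compare_copies are hypothetical.

```shell
#!/bin/sh
# compare_copies DIR_A DIR_B RELPATH
# Compare the file at RELPATH under two mount points.
# Returns 0 if the copies are identical, 1 if they differ,
# 2 if either copy is missing.
compare_copies() {
    a="$1/$3"
    b="$2/$3"
    if [ ! -f "$a" ] || [ ! -f "$b" ]; then
        echo "missing: $3"
        return 2
    fi
    # cmp -s is silent and exits non-zero on the first differing byte.
    if cmp -s "$a" "$b"; then
        echo "same: $3"
    else
        echo "DIFFERENT: $3"
        return 1
    fi
}
```

With the layout from this thread, that could be used as:

    mount -o ro /dev/sda1 /mnt/a
    mount -o ro /dev/sdb1 /mnt/b
    compare_copies /mnt/a /mnt/b boot/vmlinuz-2.6.18-164.el5

A "DIFFERENT" or "missing" result for the -164 kernel, with "same" for the -53 kernel, would fit the alternating Error 13 / Error 15 behaviour described above.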
There are 7 other boxen for which this has worked, so it's possible this one is just faulty.
I'm also going to try Rob's idea of "nodmraid" to see what happens there.
Appreciate all the help.
Spiro Harvey wrote:
I'm interested in knowing why the machine isn't booting some kernels, but will happily boot another. I figure if it's a hardware issue, then it should be an all-or-nothing issue? I'm positive this is the same spec as the last servers built for this same purpose, but the others are now on the other side of the country, so I can't access them to verify.
So assuming the hardware is exactly the same, and assuming there's something in the -164 kernel that doesn't like that particular fake raid card, then I still can't see why I can't boot the -128 kernel as that's what the other boxes have running. :/
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Spiro,
I had a similar problem with an Intel motherboard when moving from the -53 kernel to a newer one, and ended up adding "nodmraid" to the kernel line in grub so I could actually use the drives. For some reason, no BIOS setting would put the onboard fake RAID into a mode that the kernel could deal with. I suggest you do the backup and reinstall with mdraid; it has worked like a charm since I did this.
HTH
Rob
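For anyone wanting to try the "nodmraid" suggestion across every entry in grub.conf rather than editing each kernel line by hand, here is a minimal sketch. The helper name add_nodmraid is hypothetical; it only prints a modified copy and leaves the original file untouched, so the result can be reviewed before anything is replaced.

```shell
#!/bin/sh
# add_nodmraid CONF
# Print CONF with " nodmraid" appended to every "kernel" line that does
# not already contain it. The input file itself is not modified.
add_nodmraid() {
    sed '/^[[:space:]]*kernel[[:space:]]/{
/nodmraid/!s/$/ nodmraid/
}' "$1"
}
```

A cautious way to use it would be:

    add_nodmraid /boot/grub/grub.conf > /tmp/grub.conf.new
    diff /boot/grub/grub.conf /tmp/grub.conf.new

and only then copy the new file into place, keeping a backup of the old one.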