[CentOS] [SOLVED] Kernel updates do not boot - always boots oldest kernel

Sat Mar 18 05:37:12 UTC 2023
Rob Kampen <rkampen at kampensonline.com>

Thanks all for your comments and suggestions.

The main fix for the fault in the subject line was repairing the soft 
link to /boot/efi/EFI/centos/grubenv, which is the only grubenv location 
that UEFI boot uses.

It turns out that the update process for this file, when a new kernel is 
installed, uses /boot/grub2/grubenv.

In my case /boot/grub2/grubenv.rpmnew was an updated soft link pointing 
to the correct file in /boot/efi/EFI/centos/. The original(?) grubenv in 
/boot/grub2/ was being updated correctly, but UEFI booting doesn't use 
any files in this location. I fixed the soft link and the file UEFI 
reads now gets updated correctly. Thus I can use

GRUB_DEFAULT=saved
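
For anyone wanting to check the same link, here is a minimal sketch. 
The helper name check_grubenv_link is made up for illustration and is 
not part of grub2; the paths are the CentOS 7 ones mentioned above. It 
only reports and changes nothing.

```shell
#!/bin/sh
# check_grubenv_link LINK TARGET   (hypothetical helper, not part of grub2)
# Succeeds only if LINK is a soft link that resolves to the existing file TARGET.
check_grubenv_link() {
    link=$1
    target=$2
    if [ ! -e "$target" ]; then
        echo "MISSING TARGET: $target"
        return 1
    fi
    if [ ! -L "$link" ]; then
        echo "NOT A SOFT LINK: $link"
        return 1
    fi
    # Compare fully resolved paths so relative links are handled too.
    if [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]; then
        echo "OK: $link -> $target"
        return 0
    fi
    echo "WRONG TARGET: $link -> $(readlink "$link")"
    return 1
}

# Usage on the real system (paths as in this thread):
#   check_grubenv_link /boot/grub2/grubenv /boot/efi/EFI/centos/grubenv
```

If it reports anything other than OK, that matches the situation 
described above: kernel installs update a grubenv that UEFI boot never 
reads.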

However my booting problems were a little more obscure.

The grub.cfg file menuentry stanza for each kernel was correct. The set 
root='mduuid/<UUID>' points to the /boot UUID where the vmlinuz files live.

Also the linuxefi /vmlinuz-3.10.0-1180 ..... line includes both the 
'/boot' and '/' UUIDs.
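
A quick way to see which menuentry stanzas a grub.cfg holds, and in 
what order, is sketched below. The function name list_menuentries is 
hypothetical; the awk one-liner is modelled on the one in the RHEL 7 
GRUB 2 documentation and assumes titles are wrapped in single quotes, 
as grub2-mkconfig writes them.

```shell
#!/bin/sh
# list_menuentries (hypothetical helper): reads a grub.cfg on stdin and
# prints each top-level menuentry title with its zero-based index.
list_menuentries() {
    # Split on single quotes so field 2 is the quoted menuentry title.
    awk -F"'" '/^menuentry / { print (n++) ": " $2 }'
}

# Usage on the real system (path as in this thread):
#   list_menuentries < /boot/efi/EFI/centos/grub.cfg
```

With GRUB_DEFAULT=saved, the entry that should boot can be compared 
against the saved_entry value that grub2-editenv list prints.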

In my case, due to a manual migration from BIOS boot (MBR partition) to 
UEFI boot (GPT partition) on the server, plus a manual disc upgrade from 
a pair of RAID1 500GB HDD (MBR partitioned) to a pair of RAID1 3.4TB SSD 
(GPT partitioned), everything appeared to be working, BUT I left the old 
HDDs plugged in.

The old HDDs only had the 36.2 kernel installed. All the updated kernels 
were correctly installed onto the new SSDs. HOWEVER, due to the migration 
process I employed, the UUIDs of the partitions were the same. UEFI 
boot, before the OS is started by loading vmlinuz, only knows about the 
UUIDs visible on the partition tables, because MDRAID hasn't loaded yet. 
Thus in my case the hardware had four storage devices (2x RAID1) all 
with the same UUID for /boot [ blkid is your friend ]. Unfortunately I 
didn't realize this, and thus the UEFI simply looked at the first drive 
with that UUID, which was one of the original HDDs and not one of the 
SSDs that were being updated correctly.
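
The blkid check can be automated along these lines. This is only a 
sketch: find_dup_uuids is a made-up name, and it assumes the default 
blkid output format (device name, colon, then KEY="value" pairs). Note 
that the two members of one RAID1 array are expected to share a UUID; 
the warning sign is a UUID that also shows up on discs that should no 
longer be in use.

```shell
#!/bin/sh
# find_dup_uuids (hypothetical helper): reads `blkid` output on stdin and
# prints every UUID that appears on more than one device, with the devices.
find_dup_uuids() {
    awk '
    {
        dev = $1
        sub(/:$/, "", dev)                # "/dev/sda1:" -> "/dev/sda1"
        for (i = 2; i <= NF; i++) {
            # Match only the UUID= field, not UUID_SUB= or PARTUUID=
            if ($i ~ /^UUID="/) {
                uuid = $i
                gsub(/^UUID="|"$/, "", uuid)
                count[uuid]++
                devs[uuid] = devs[uuid] " " dev
            }
        }
    }
    END {
        for (u in count)
            if (count[u] > 1)
                print u ":" devs[u]
    }'
}

# Usage on the real system (run as root so blkid can read every device):
#   blkid | find_dup_uuids
```

In a setup like the one described here, the /boot UUID would be listed 
with four devices instead of the expected two.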

Removed the old drives and presto, UEFI now sees the new /boot and loads 
the later kernels.

Not sure if this will help anyone else, but I had to track this one down 
by walking through the UEFI boot process step by step and understanding 
how grub2 updates are applied.

Once again, thanks to those who made suggestions, most of which I have 
used and pursued until I understood each step.

Shalom
Rob


On 15/03/23 20:32, Gianluca Cecchi wrote:
>>
>>> I have only changed GRUB_DEFAULT from "saved" to "0"
>>>
>>> I have also run
>>>
>>> /usr/sbin/grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
>> I may be wrong here but IIRC, using grub2-mkconfig as described in the
>> Grub docs didn't work for me when I tried to use it years ago.
>>
>> I think you have to find out what is done when installing kernels and try
>> to find out where it goes wrong in your case. When you look at 'rpm -q
>> --scripts kernel' you can see that new kernels are registered with the
>> script '/usr/sbin/new-kernel-pkg'. I suggest to analyze what it does
>> exactly. I think it calls 'grubby' to do further work...
>>
>> Regards,
>> Simon
>>
>>
> If not already done, you can also go through the official documentation
> page for working with Grub 2 on RHEL 7 and the different commands it
> describes, for both BIOS and UEFI based systems:
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-working_with_the_grub_2_boot_loader
>
> You could also try out and get familiar with the commands first on
> another UEFI based system/vm that is more practical for you to use, as
> the target one is a remote system, as you wrote
> HIH,
> Gianluca