[CentOS-devel] Balancing the needs around the RHEL platform

Thu Dec 24 13:21:53 UTC 2020
Phil Perry <pperry at elrepo.org>

On 24/12/2020 00:14, Neal Gompa wrote:
> On Wed, Dec 23, 2020 at 4:38 PM Phil Perry <pperry at elrepo.org> wrote:
>>
>> On 23/12/2020 20:50, Matthew Miller wrote:
>>> On Wed, Dec 23, 2020 at 08:23:29PM +0000, Phil Perry wrote:
>>>> Take Wireguard VPN as an example. No sooner had upstream fixed the
>>>> breakage caused by -257 on Monday than -259 landed and broke it
>>>> again[2].
>>>
>>>
>>> It seems like Wireguard might be a good example of something for an
>>> alternate kernel maintained by a SIG. (Like the Xen SIG does.)
>>>
>>
>> Why would you do that? The method we use in Enterprise Linux to deliver
>> 3rd party out-of-tree drivers is the RHEL Driver Update Programme. It
>> has been this way for over a decade. It works really well. It just
>> doesn't work for Stream because the Stream kernel is not suitable for
>> end user (Enterprise) consumption - it is a development kernel for
>> developing the next RHEL point release.
>>
>> If Red Hat really wanted to fix this in (a) kernel, the solution would
>> have been to accept the repeated upstream requests to backport the
>> driver into the RHEL kernel, but that idea/request has been rejected.
>>
> 
> No. The correct fix here is to start blocking RHEL kernel updates
> against third-party Free Software kernel module packages to ensure
> compatibility isn't broken and the kernel ABI stops breaking on every
> kernel version series. The reason it keeps breaking is because there's
> no current mechanism in which these are tested together to validate
> them for release.
> 

Blocking Stream kernel updates you mean?

That would certainly be an option, and I have written a yum plugin (for 
el7) that does the reverse and masks kmod packages from the yum 
transaction where the required kernel is not available yet. But for such 
an approach to work, it is essential that the Stream repository contains 
all kernel releases, not just the latest as is the case at present.
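For illustration only, the core decision such a plugin makes can be 
sketched in plain Python. This is not the el7 plugin's code and does not 
use the yum plugin API. The `KmodPackage` type, the 
`mask_unsatisfiable_kmods()` function, the example package names, and the 
assumption that each kmod requires one exact kernel version-release are 
all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class KmodPackage:
    """A kmod package reduced to the one field this sketch cares about."""
    name: str
    # Hypothetical simplification: the exact kernel version-release the
    # module was built against.
    requires_kernel: str


def mask_unsatisfiable_kmods(
    kmods: Iterable[KmodPackage], available_kernels: Iterable[str]
) -> Tuple[List[KmodPackage], List[KmodPackage]]:
    """Split kmods into (kept, masked) by whether their kernel is available."""
    available = set(available_kernels)
    kept: List[KmodPackage] = []
    masked: List[KmodPackage] = []
    for pkg in kmods:
        if pkg.requires_kernel in available:
            kept.append(pkg)
        else:
            # The kernel this kmod needs is not in the repository,
            # so hide the kmod from the transaction.
            masked.append(pkg)
    return kept, masked


if __name__ == "__main__":
    # Illustrative repository that only carries the latest kernel.
    repo_kernels = ["4.18.0-259.el8"]
    kmods = [
        KmodPackage("kmod-example-a", "4.18.0-257.el8"),
        KmodPackage("kmod-example-b", "4.18.0-259.el8"),
    ]
    kept, masked = mask_unsatisfiable_kmods(kmods, repo_kernels)
    print("kept:", [p.name for p in kept])
    print("masked:", [p.name for p in masked])
```

In this toy model, a repository that carries only the latest kernel 
leaves every kmod built for an earlier release masked, which is the 
situation described above.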

Further, we have an issue with the Stream installation images, which are 
constantly being updated with each new compose and feature the latest 
Stream kernel. These images are unable to use Driver Update Disk images 
(DUDs), which are generally built around the point release GA kernel and 
are likely not compatible with newer Stream kernels.

> The LF/RH/SUSE kernel module packaging system (branded as the Driver
> Update Program by Red Hat) relies on one of two things happening to be
> reasonably successful:
> 
> * Gating to ensure kABI doesn't break (RHEL-style)
> * Continuous automatic rebuilds as the kABI changes (SUSE-style)
> 
> At work, we've internally implemented the SUSE-style strategy with our
> RHEL kernel module builds, but we're able to do that because our build
> system is designed to handle that. Within the CentOS Project with
> CKI/ARK and CentOS Stream, we should be implementing the RHEL-style
> strategy.
> 
> More than most, I get why you're upset about the kABI always breaking
> as kernel updates push out, but instead of just saying "it's not
> suitable", we should be building solutions to *make* it suitable for
> the Enterprise. It's *bad* that the RHEL kernel breaks its own
> promises so often (which is a relatively new thing, in my experience),
> and we should be implementing safeguards to stop it from happening
> going forward.
> 
> 

To be fair to Red Hat, they are not breaking their own promises (nor 
even the kABI by their own definition) as Red Hat only strive to retain 
kABI compatibility for symbols on their own defined whitelist.

What happens in reality (especially in the first 5 years, during the 
active development or Stream phase) is that Red Hat branch the RHEL 
kernel at point release time. The 8.3 kernel, for example, stays stable 
for 6 months with only important bug and security fixes but no new 
features, whilst the RHEL development kernel branch for 8.4, which is now 
being released to Stream, gets all the big backports that will be in the 
8.4 kernel. Those backports are what break symbols that are not on the 
kABI whitelist but are used in the real world by many/most 3rd party 
drivers.
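To make the distinction concrete, here is a rough Python sketch that 
compares the symbol CRCs a module depends on between two kernel builds 
and splits any changes into whitelisted symbols (a kABI break by Red 
Hat's definition) and non-whitelisted ones (breakage Red Hat does not 
promise to avoid). It assumes Module.symvers-style input in which the 
first two whitespace-separated fields of each line are the CRC and the 
symbol name. The function names, symbol names, CRC values and whitelist 
contents are all made up for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_symvers(text: str) -> Dict[str, str]:
    """Map symbol name -> CRC from Module.symvers-style text.

    Assumes the first two whitespace-separated fields of each line are
    the CRC and the symbol name; any further fields are ignored.
    """
    crcs: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            crcs[fields[1]] = fields[0]
    return crcs


def classify_breakage(
    used_symbols: Iterable[str],
    old_crcs: Dict[str, str],
    new_crcs: Dict[str, str],
    whitelist: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (whitelisted_breaks, unprotected_breaks) for one module."""
    whitelisted: List[str] = []
    unprotected: List[str] = []
    for sym in sorted(used_symbols):
        # A changed or removed CRC means the module's binary may no longer
        # match the new kernel's view of that symbol.
        if old_crcs.get(sym) != new_crcs.get(sym):
            if sym in whitelist:
                whitelisted.append(sym)
            else:
                unprotected.append(sym)
    return whitelisted, unprotected


if __name__ == "__main__":
    old = parse_symvers(
        "0x1111\tsymbol_a\tvmlinux\tEXPORT_SYMBOL\n"
        "0x2222\tsymbol_b\tvmlinux\tEXPORT_SYMBOL_GPL\n"
    )
    new = parse_symvers(
        "0x1111\tsymbol_a\tvmlinux\tEXPORT_SYMBOL\n"
        "0x3333\tsymbol_b\tvmlinux\tEXPORT_SYMBOL_GPL\n"
    )
    wl, unprot = classify_breakage({"symbol_a", "symbol_b"}, old, new, {"symbol_a"})
    print("whitelisted breaks:", wl)
    print("unprotected breaks:", unprot)
```

In the made-up example the whitelisted symbol is unchanged, so the kABI 
promise holds, yet the module still breaks because a non-whitelisted 
symbol it relies on has changed.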

It is really important that this process happens. If it didn't, we 
wouldn't, for example, get a new WiFi stack in RHEL 8.1 backported from 
kernel-5.2, or in RHEL 8.3 backported from kernel-5.7, and none of our 
fancy new WiFi adapters would work.

It's also really important that this process is being opened up if Red 
Hat want people (the community) outside of the Red Hat kernel 
development team to be able to contribute to it.

So I'm absolutely not against it, nor do I want to prevent or stop it 
from happening. Quite the opposite - I am really looking forward to the 
day I can contribute simple fixes to the RHEL kernel rather than having 
to file a bug and wait months/years to see a simple upstream fix 
incorporated, or having to open a support case and spend months dealing 
with people who do not understand the issue. But above all, I just want 
people to recognise that this is a *development* system and to stop 
trying to tell people that it is a drop-in replacement for CentOS Linux, 
because it is not.