On 24/12/2020 00:14, Neal Gompa wrote:
On Wed, Dec 23, 2020 at 4:38 PM Phil Perry pperry@elrepo.org wrote:
On 23/12/2020 20:50, Matthew Miller wrote:
On Wed, Dec 23, 2020 at 08:23:29PM +0000, Phil Perry wrote:
Take Wireguard VPN as an example. No sooner had upstream fixed the breakage caused by -257 on Monday than -259 landed and broke it again[2].
It seems like Wireguard might be a good example of something for an alternate kernel maintained by a SIG. (Like the Xen SIG does.)
Why would you do that? The method we use in Enterprise Linux to deliver 3rd party out-of-tree drivers is the RHEL Driver Update Programme. It has been this way for over a decade. It works really well. It just doesn't work for Stream because the Stream kernel is not suitable for end user (Enterprise) consumption - it is a development kernel for developing the next RHEL point release.
If Red Hat really wanted to fix this in (a) kernel, the solution would have been to accept the repeated upstream requests to backport the driver into the RHEL kernel, but that idea/request has been rejected.
No. The correct fix here is to start blocking RHEL kernel updates against third-party Free Software kernel module packages to ensure compatibility isn't broken and the kernel ABI stops breaking on every kernel version series. It keeps breaking because there is currently no mechanism for testing these together to validate them for release.
Blocking Stream kernel updates you mean?
That would certainly be an option, and I have written a yum plugin (for el7) that does the reverse and masks kmod packages from the yum transaction where the required kernel is not available yet. But for such an approach to work, it is essential that the Stream repository contains all kernel releases, not just the latest as is the case at present.
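To make the masking idea concrete, here is a minimal sketch of the filtering logic only. It is not the actual el7 plugin and does not use the yum plugin API; the `KmodPackage` type, the `mask_unusable_kmods` function, and the exact-match rule on the kernel version-release string are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KmodPackage:
    # Hypothetical stand-in for a kmod package in a transaction.
    name: str
    required_kernel: str  # kernel version-release the kmod needs (assumed exact match)


def mask_unusable_kmods(candidates, available_kernels):
    """Split candidate kmod updates into (keep, masked).

    A kmod is masked when the kernel it requires is neither installed
    nor offered by any enabled repository.
    """
    available = set(available_kernels)
    keep = [p for p in candidates if p.required_kernel in available]
    masked = [p for p in candidates if p.required_kernel not in available]
    return keep, masked


if __name__ == "__main__":
    # Made-up version strings, for illustration only.
    kernels = ["4.18.0-240.el8", "4.18.0-257.el8"]
    kmods = [
        KmodPackage("kmod-example", "4.18.0-257.el8"),
        KmodPackage("kmod-example", "4.18.0-259.el8"),
    ]
    keep, masked = mask_unusable_kmods(kmods, kernels)
    print("keep:", [p.required_kernel for p in keep])
    print("masked:", [p.required_kernel for p in masked])
```

The sketch also shows why the repository has to carry every kernel release: if the kernel a kmod was built for has already been dropped from the repository, the kmod is masked and can never be installed.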
Further, we have an issue with the Stream installation images, which are constantly being updated during the latest compose and feature the latest Stream kernel - these are unable to use Driver Update Disk images (DUDs), which are generally built around the point release GA kernel and are likely not compatible with newer Stream kernels.
The LF/RH/SUSE kernel module packaging system (branded as the Driver Update Program by Red Hat) relies on one of two things happening to be reasonably successful:
- Gating to ensure kABI doesn't break (RHEL-style)
- Continuous automatic rebuilds as the kABI changes (SUSE-style)
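As a rough sketch of what the gating option could check, the snippet below compares the exported-symbol CRCs of two kernels and reports which symbols used by a module have vanished or changed. It assumes Module.symvers-style input where the first two whitespace-separated fields on each line are the CRC and the symbol name; the function names and the sample data in the usage note are hypothetical, and a real gate would need considerably more than this.

```python
def parse_symvers(text):
    """Map exported symbol name -> CRC from Module.symvers-style text."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            crc, name = fields[0], fields[1]
            symbols[name] = crc
    return symbols


def broken_symbols(old, new, used):
    """Return the symbols in `used` that are missing from `new`
    or whose CRC differs from the one recorded in `old`."""
    return sorted(
        s for s in used
        if s not in new or new[s] != old.get(s)
    )


def gate_allows_update(old, new, used):
    """A candidate kernel passes the gate only if no used symbol broke."""
    return not broken_symbols(old, new, used)
```

For example, if a module uses `symbol_a` and `symbol_b`, and the candidate kernel changes the CRC of `symbol_b`, then `broken_symbols` returns `["symbol_b"]` and `gate_allows_update` returns `False`. In the rebuild-style approach the same result would trigger an automatic rebuild of the module rather than block the kernel.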
At work, we've internally implemented the SUSE-style strategy with our RHEL kernel module builds, but we're able to do that because our build system is designed to handle that. Within the CentOS Project with CKI/ARK and CentOS Stream, we should be implementing the RHEL-style strategy.
More than most, I get why you're upset about the kABI always breaking as kernel updates push out, but instead of just saying "it's not suitable", we should be building solutions to *make* it suitable for the Enterprise. It's *bad* that the RHEL kernel breaks its own promises so often (which is a relatively new thing, in my experience), and we should be implementing safeguards to stop it from happening going forward.
To be fair to Red Hat, they are not breaking their own promises (nor even the kABI by their own definition) as Red Hat only strive to retain kABI compatibility for symbols on their own defined whitelist.
What happens in reality (especially in the first 5 years during the active development phase or Stream phase) is that Red Hat branch the RHEL kernel at point release time. The 8.3 kernel, for example, stays stable for 6 months with only important bug fixes and security fixes, but no new features. Meanwhile the RHEL development kernel branch for 8.4, which is now being released to Stream, gets all the big backports that will be in the 8.4 kernel, and those backports are what cause breakage of symbols that are not on the kABI whitelist but are used in the real world by many/most 3rd party drivers.
It is really important that this process happens. If it didn't, we wouldn't for example get a new WiFi stack in RHEL8.1 backported from kernel-5.2 or in RHEL8.3 backported from kernel-5.7 and none of our fancy new WiFi adapters would work.
It's also really important that this process is being opened up if Red Hat want people (the community) outside of the Red Hat kernel development team to be able to contribute to it.
So I'm absolutely not against it, nor do I want to prevent or stop it from happening. Quite the opposite - I am really looking forward to the day I can contribute simple fixes to the RHEL kernel rather than having to file a bug and wait months/years to see the incorporation of a simple upstream fix, or having to open a support case and spend months dealing with people who do not understand the issue. But above all I just want people to recognise that this is a *development* system and stop trying to tell people that it is a drop-in replacement for CentOS Linux, because it is not.