On Wed, Dec 23, 2020 at 8:27 PM Neal Gompa <ngompa13@gmail.com> wrote:
On Wed, Dec 23, 2020 at 7:43 PM Mark Mielke <mark.mielke@gmail.com> wrote:
On Wed, Dec 23, 2020 at 7:15 PM Neal Gompa <ngompa13@gmail.com> wrote:
On Wed, Dec 23, 2020 at 4:38 PM Phil Perry <pperry@elrepo.org> wrote:
If Red Hat really wanted to fix this in (a) kernel, the solution would have been to accept the repeated upstream requests to backport the driver into the RHEL kernel, but that idea/request has been rejected.
No. The correct fix here is to start gating RHEL kernel updates on tests against third-party Free Software kernel module packages, so that compatibility isn't broken and the kernel ABI stops breaking on every kernel version series. It keeps breaking because there is currently no mechanism by which these are tested together to validate them for release.
I think you are correct. I also think there is a long-ish road to get there. :-) Overall, it would have the best long-term results. It requires everyone who has requirements to document them as automated tests.
I'm more optimistic. The RHEL kernel for RHEL 9 is already being developed in Fedora ELN[0] through the Always Ready Kernel project[1]. As for RHEL 8 and CentOS Stream 8, we can wire up validation testing using the Zuul instance that the project stood up as part of the Fedora CI work. That infrastructure integrates with the CentOS Pagure server and we can do all kinds of interesting things with it.
[0]: https://docs.fedoraproject.org/en-US/eln/
[1]: https://gitlab.com/cki-project/kernel-ark
All good information. Thank you. I don't know if we can all afford to invest here, but we should all be thinking about it as long as we play in the EL ecosystem.
But, it would put a damper on "new feature that needs large kernel ABI changes to cost effectively backport", such as the OverlayFS changes done in RHEL 7 as one of many such examples. The choice to use Linux 4.18 is particularly problematic, since it wasn't an LTS kernel. :-( 5 years is a long time to wait for new breaking features in the kernel.
I think we'd be in a better place aiming for it and merely reducing the number of times it breaks. Right now, kABI breaks pretty significantly on every single RHEL point release. And there have been *several* botched backports that have screwed up both the RHEL kernel API and kernel module builds. Even just cutting the number of times that these kinds of breakages happen in half would be a major win.
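To make the "tested together" idea a little more concrete, here is a minimal sketch of one check such a gate could run: compare the exported-symbol checksums of two kernel builds and flag any symbol a third-party module depends on that has changed or disappeared. It assumes Module.symvers-style input (CRC in the first column, symbol name in the second). The symbol names, CRC values, and function names below are made up for illustration and do not come from any real kernel or from an existing RHEL/CentOS tool.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample data in Module.symvers-style layout:
# CRC <tab> symbol <tab> module <tab> export type
OLD_SYMVERS = (
    "0x11111111\texample_alloc\tvmlinux\tEXPORT_SYMBOL\n"
    "0x22222222\texample_free\tvmlinux\tEXPORT_SYMBOL\n"
    "0x33333333\texample_ioctl\tvmlinux\tEXPORT_SYMBOL_GPL\n"
)
NEW_SYMVERS = (
    "0x11111111\texample_alloc\tvmlinux\tEXPORT_SYMBOL\n"
    "0x2222aaaa\texample_free\tvmlinux\tEXPORT_SYMBOL\n"
)


def parse_symvers(text: str) -> Dict[str, str]:
    """Map each exported symbol name to its CRC string."""
    symbols: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        crc, name = fields[0], fields[1]
        symbols[name] = crc.lower()
    return symbols


def kabi_breaks(
    old: Dict[str, str], new: Dict[str, str], needed: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (symbol, reason) for each needed symbol that would break."""
    breaks: List[Tuple[str, str]] = []
    for name in sorted(needed):
        if name not in new:
            breaks.append((name, "missing"))
        elif name in old and old[name] != new[name]:
            breaks.append((name, "crc changed"))
    return breaks


if __name__ == "__main__":
    # Symbols a hypothetical third-party module imports.
    needed = {"example_alloc", "example_free", "example_ioctl"}
    old = parse_symvers(OLD_SYMVERS)
    new = parse_symvers(NEW_SYMVERS)
    for symbol, reason in kabi_breaks(old, new, needed):
        print(f"{symbol}: {reason}")
```

A CI job could fail the kernel update whenever this list is non-empty for any module package it tracks. A real gate would also need to build and load the modules, which this sketch does not attempt.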
Yes, please. :-)
The kernel isn't the only place that breaks us - but the kernel is definitely a common place. It's possible that if this were resolved, 50%+ of the compatibility issues between minor releases that affect us (and presumably others) would be resolved.
We still have a few systems on RHEL 6.3, because they depend on a particular NVIDIA driver for GPU acceleration. The newer driver, or the newer kernel, even within the later RHEL 6.x minor releases, exhibits random failures. We're working through it - but it's an example of where an improvement in this area would go a long way toward enabling integration between EL and 3rd party drivers. It is important that the GPU calculations be accurate, so the systems stay alive in their current state until we can find a way forward.
To your comment about Red Hat not using LTS kernels: LTS kernels do not maintain kABI upstream either, so it doesn't save any effort for Red Hat one way or another. If anything, being based on an LTS kernel would do Red Hat fewer favors, because they'd be under pressure to conform to something similar to the upstream LTS kernel. Since the upstream LTS kernel already doesn't match the RHEL kernel lifecycle, and Red Hat engineers would wind up doing a bunch of work anyway for the kABI stabilization and live kernel patching features, the non-LTS kernels are strategically better because there's less churn in them and more long-term flexibility.
This is an interesting point. Oracle UEK R6, as used in OL 7 and OL 8, is tied to Linux 5.4.17, and I think the above might be a good summary of why this is a good thing. One part of this is to set the expectation for what kABI is supported. The other part is to gain access to newer features, and to reduce the cost of maintaining a high-quality backport that still adheres to this kABI. By deploying Oracle UEK R6 simultaneously to both OL 7 and OL 8, they have effectively separated the kernel from the OS distro, allowing both goals to be achieved. This seems like something that could be useful for the broader EL community. However, from a RHEL/CentOS perspective it would be a new thing, which means vendors may support Oracle UEK R6 or the RHEL kernel today, but they wouldn't yet be aware of such a new kernel.
More than most, I get why you're upset about the kABI always breaking as kernel updates push out, but instead of just saying "it's not suitable", we should be building solutions to *make* it suitable for the Enterprise. It's *bad* that the RHEL kernel breaks its own promises so often (which is a relatively new thing, in my experience), and we should be implementing safeguards to stop it from happening going forward.
Yes. Although, in the meantime...
Well, since nothing can really change now, I'm looking forward a bit and trying to see how we can take advantage of the situation to better the wider community. After all, the biggest virtue of a true open source community is the ability to adapt.
I will do something similar. I have mostly just watched this list (and many others) for the last few years, but this change was something that I wanted to ensure was represented a little better. Not sure if I did a good job or not, but once all the things that can be said are said, and the dust settles, it's exactly as you say. We will adapt. We'll take advantage of new opportunities and new capabilities, and we'll fill any gaps left over according to our needs.