[CentOS-devel] Balancing the needs around the RHEL platform

Thu Dec 24 02:25:24 UTC 2020
Mark Mielke <mark.mielke at gmail.com>

On Wed, Dec 23, 2020 at 8:27 PM Neal Gompa <ngompa13 at gmail.com> wrote:
> On Wed, Dec 23, 2020 at 7:43 PM Mark Mielke <mark.mielke at gmail.com> wrote:
> > On Wed, Dec 23, 2020 at 7:15 PM Neal Gompa <ngompa13 at gmail.com> wrote:
> > > On Wed, Dec 23, 2020 at 4:38 PM Phil Perry <pperry at elrepo.org> wrote:
> > > > If Red Hat really wanted to fix this in (a) kernel, the solution would
> > > > have been to accept the repeated upstream requests to backport the
> > > > driver into the RHEL kernel, but that idea/request has been rejected.
> > > No. The correct fix here is to start blocking RHEL kernel updates
> > > against third-party Free Software kernel module packages to ensure
> > > compatibility isn't broken and the kernel ABI stops breaking on every
> > > kernel version series. The reason it keeps breaking is because there's
> > > no current mechanism in which these are tested together to validate
> > > them for release.
> > I think you are correct. I also think there is a long-ish road to get
> > here. :-) Overall, it would have the best long-term results. It
> > requires everyone who has requirements to document them as
> > automated tests.
> I'm more optimistic. The RHEL kernel for RHEL 9 is already being
> developed in Fedora ELN[0] through the Always Ready Kernel project[1].
> As for RHEL 8 and CentOS Stream 8, we can wire up validation testing
> using the Zuul instance that the project had stood up as part of
> Fedora CI work. That infrastructure integrates with the CentOS Pagure
> server and we can do all kinds of interesting things with it.
> [0]: https://docs.fedoraproject.org/en-US/eln/
> [1]: https://gitlab.com/cki-project/kernel-ark

All good information. Thank you. I don't know if we can all afford to
invest here, but we should all be thinking about it as long as we play
in the EL ecosystem.

> > But, it would put a damper on "new feature that needs large kernel
> > ABI changes to cost effectively backport", such as the OverlayFS
> > changes done in RHEL 7 as one of many such examples. The choice to use
> > Linux 4.18 is particularly problematic, since it wasn't an LTS kernel.
> > :-(
> > 5 years is a long time to wait for new breaking features in the kernel.
> I think we'd be in a better place aiming for it and merely reducing
> the number of times it breaks. Right now, kABI breaks pretty
> significantly on every single RHEL point release. And there have been
> *several* botched backports that have screwed up both the RHEL kernel
> API and kernel module builds. Even just cutting the number of times
> that these kinds of breakages happen in half would be a major win.

Yes, please. :-)

The kernel isn't the only place that breaks us - but the kernel is
definitely a common place. It's possible that if this were resolved,
50%+ of the compatibility issues between minor releases that
affect us (and presumably others) would go away.
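
To make the idea of "requirements as automated tests" concrete, here is
a minimal sketch of a kABI check, not anything Red Hat or CentOS
actually runs. It assumes two Module.symvers files (the exported-symbol
lists with CRCs that a kernel build produces) in the usual
whitespace-separated layout with the CRC first and the symbol name
second. The function names and the list of needed symbols are
hypothetical.

```python
def parse_symvers(text):
    """Map exported symbol name -> CRC string from Module.symvers-style text.

    Assumption: each useful line starts with a "0x..." CRC followed by
    the symbol name; any further fields are ignored.
    """
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = fields[0]
    return crcs


def kabi_breaks(old_text, new_text, needed):
    """Return {symbol: reason} for needed symbols that broke.

    A symbol is reported as "missing" if the new kernel no longer
    exports it, or "crc changed" if both kernels export it with
    different CRCs.
    """
    old = parse_symvers(old_text)
    new = parse_symvers(new_text)
    breaks = {}
    for sym in sorted(needed):
        if sym not in new:
            breaks[sym] = "missing"
        elif sym in old and old[sym] != new[sym]:
            breaks[sym] = "crc changed"
    return breaks
```

A third-party driver packager could feed in the symbols their module
imports and fail a CI job whenever kabi_breaks() returns a non-empty
result for a candidate kernel update.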

We still have a few systems on RHEL 6.3, because they depend on a
particular nVidia driver for GPU acceleration. A newer driver or a
newer kernel, even one from a later RHEL 6.x minor release, exhibits
random failures. We're working through it, but it's an example of
where an improvement in this area would go a long way toward enabling
integration between EL and 3rd party drivers. The GPU calculations
need to be accurate, so these systems stay in their current state
until we can find a way forward.

> To your comment about Red Hat not using LTS kernels, LTS kernels do
> not maintain kABI upstream either, so it doesn't save any effort for
> Red Hat one way or another. If anything, being based on an LTS kernel
> would do Red Hat less favors because they're under pressure to conform
> to something similar to the upstream LTS kernel. Since the upstream
> LTS kernel already doesn't match the RHEL kernel lifecycle and Red Hat
> engineers would wind up doing a bunch of work anyway for the kABI
> stabilization and live kernel patching features, the non-LTS kernels
> are strategically better because there's less churn in them and more
> long-term flexibility.

This is an interesting point. Oracle UEK R6, as used in OL 7 and OL 8,
is tied to Linux 5.4.17, and I think the above might be a good summary
of why this is a good thing. One part of this is to set the
expectation for what kABI is supported. The other part is to gain
access to newer features, and reduce the cost of maintaining a high
quality back port that still adheres to this kABI. By deploying Oracle
UEK R6 simultaneously to both OL 7 and OL 8, they have effectively
separated the kernel from the OS distro, allowing for both elements to
be achieved. This seems like something that could be useful to do for
the broader EL community. However, from a RHEL/CentOS perspective, it
seems like a new thing, which means vendors may now support Oracle UEK
R6, or the RHEL Kernel, but they wouldn't be aware of such a new thing.

> > > More than most, I get why you're upset about the kABI always breaking
> > > as kernel updates push out, but instead of just saying "it's not
> > > suitable", we should be building solutions to *make* it suitable for
> > > the Enterprise. It's *bad* that the RHEL kernel breaks its own
> > > promises so often (which is a relatively new thing, in my experience),
> > > and we should be implementing safeguards to stop it from happening
> > > going forward.
> > Yes. Although, in the meantime...
> Well, since nothing can really change now, I'm looking forward a bit
> and trying to see how we can take advantage of the situation to better
> the wider community. After all, the biggest virtue of a true open
> source community is the ability to adapt.

I will do something similar. I have mostly just watched this list (and
many others) for the last few years, but this change was something
that I wanted to ensure was represented a little better. I'm not sure
if I did a good job, but once all the things that can be said are
said and the dust settles, it's exactly as you say. We will adapt. We'll
take advantage of new opportunities and new capabilities, and we'll
fill any gaps left over according to our needs.

Mark Mielke <mark.mielke at gmail.com>