[CentOS-devel] Balancing the needs around the CentOS platform

Sat Dec 19 10:09:24 UTC 2020

On Sat, Dec 19, 2020 at 4:34 AM Mark Mielke <mark.mielke at gmail.com> wrote:
> 2. Minor release milestones to stabilize branches. We have breakage
> with most minor release upgrades, and the stabilization process is an
> important method of isolating users from being affected by this. This
> is why CentOS 8 Stream is being said "for developers", while RHEL 8
> would be "for production". It is being said, because it is a real
> thing. If you truly believed minor release milestones were unnecessary
> for CentOS 8 Stream, then you would also believe that minor release
> milestones were unnecessary for RHEL 8.

I should provide real-life examples for you to consider. I will just
pick a few that come to mind without thinking too hard.

The first example is EL 7.8 and the GlusterFS update. This broke our
loadbuild process across multiple product teams. In this particular
case, several teams build a custom version of Qemu as part of their
Yocto build process. The error showed up like this:

build   01-Dec-2020 08:05:20   | .../tmp/work/x86_64-linux/qemu-native/2.7.0-r1/qemu-2.7.0/block/gluster.c: In function ‘qemu_gluster_truncate’:
build   01-Dec-2020 08:05:20   | .../tmp/work/x86_64-linux/qemu-native/2.7.0-r1/qemu-2.7.0/block/gluster.c:1000:5: error: too few arguments to function ‘glfs_ftruncate’
build   01-Dec-2020 08:05:20   |      ret = glfs_ftruncate(s->fd, offset);
build   01-Dec-2020 08:05:20   |      ^
build   01-Dec-2020 08:05:20   | In file incl

GlusterFS changed the function glfs_ftruncate to require additional
arguments beyond the two that Qemu 2.7.0 passes. Red Hat chose to
upgrade GlusterFS as part of EL 7.8. Qemu 2.7.0 doesn't know about
this change. Builds failed for three of our product teams.

By design, we did early adopter testing of 7.8 before mass upgrading
everyone. We discovered this problem there, and were able to get code
changes into the various product releases. In this case, we chose to
disable GlusterFS support in the Qemu build, as GlusterFS was an
accidental dependency. Once it was disabled, the above code stopped
being compiled, and the emergency was averted.
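
As an illustration only, and not a description of the tooling actually
used here, part of an early adopter check could be a script that
compares the package list of the current release with that of the
candidate release and flags packages whose upstream major version
changed. The package names and versions below are hypothetical, and
the parsing assumes the usual name-version-release form of RPM package
names.

```python
import re


def parse_name_version(nvr):
    """Split 'name-version-release' into (name, version).

    Assumes the last two hyphen-separated fields are the version and
    the release, as in conventional RPM package names.
    """
    name, version, _release = nvr.rsplit("-", 2)
    return name, version


def major(version):
    """Return the leading numeric component of a version, or None."""
    match = re.match(r"\d+", version)
    return int(match.group()) if match else None


def major_version_changes(before, after):
    """List (name, old_version, new_version) where the major version differs."""
    old = dict(parse_name_version(pkg) for pkg in before)
    changes = []
    for pkg in after:
        name, new_version = parse_name_version(pkg)
        old_version = old.get(name)
        if old_version is not None and major(old_version) != major(new_version):
            changes.append((name, old_version, new_version))
    return changes


# Hypothetical package lists for the current and candidate releases.
before = ["examplefs-api-3.12.2-47.el7", "exampletool-1.4.0-2.el7"]
after = ["examplefs-api-6.0-29.el7", "exampletool-1.4.1-1.el7"]

for name, old_v, new_v in major_version_changes(before, after):
    print(f"{name}: {old_v} -> {new_v}")
```

A major version jump does not prove an API break, and breaks can arrive
without one, so a check like this only narrows down where to look. It
does not replace actually building and running the products against
the candidate release.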

Now, imagine the same scenario with "CentOS 8 Stream". Are you
expecting that users will each set up their own milestone release
processes? Are we going to test every single package that gets
released to "CentOS 8 Stream" incrementally to ensure that our systems
don't experience massive breakage?

Another example, which hit us with EL 7.7, is an update to elfutils
that was incompatible with binutils and gcc:

https://wiki.gentoo.org/wiki/Binutils_2.32_upgrade_notes/elfutils_0.175:_unable_to_initialize_decompress_status_for_section_.debug_info

In this case, we still don't have a solution - but are downgrading
elfutils for 7.7+ until we do decide what to do about it. Still, we
were able to detect it in our 7.7 early adopter testing, and decide on
a temporary solution *before* rolling it out to the users.

I mentioned this problem earlier this week:

https://bugzilla.redhat.com/show_bug.cgi?id=1489542

This was a change to autofs in EL 7.4 that caused it to drop automount
maps every 10 minutes in our environment, causing actual breakage when
paths were accessed during these intervals. Again, we were able to
detect this in our EL 7.4 early adopter testing, and defer roll out of
EL 7.4 until after we came up with a solution.

These are just a few of the regular things that we encounter all of
the time. I don't see how CentOS 8 Stream can address these, unless
you promise to treat CentOS 8 Stream as a minor release, and never
introduce changes in API, new features, or changes in behaviour.

For all of these, whether upstream or downstream, we contribute back
analysis, bug reports, and fixes. However, I cannot be deploying
CentOS 8 Stream in our environment. It wouldn't even be an option. We
would choose something else.

I think when you say "95% of users would be ok with CentOS 8 Stream",
you are talking about basic usage. You are not talking about
Enterprise use cases.

--
Mark Mielke <mark.mielke at gmail.com>