On 12/16/20 9:00 AM, Simon Matter wrote:
> That's an interesting point which is, from a technical POV, really not nice. Just imagine an enterprise Linux distribution which was a) self hosting, and b) provides reproducible builds...
I've used Red Hat long enough to remember the old heavily customized Beehive builder. The mach and then mock chroot build systems were vast improvements when they became available, but there are problems even there.
Much depends upon build order. I ran into one build order issue a few years back (2012 or 2013 or thereabouts) while rebuilding CentOS 5.6 for IA64 for an SGI Altix 350 system we have here at $dayjob. I remember all the issues on this list related to CentOS 5.6 and 6.0 both running late at the same time, and I found out a piece of what caused 5.6 to take more time. One particular library (I don't remember which one right off, although I could probably go back and look at some point, but not today) got an upgrade in 5.6; some upgraded packages in 5.6 needed the previous version of that library to build, and some other packages needed the upgraded library.
That library had to be rebuilt at a very specific place in the package build sequence for those other packages to even build correctly, and the buildrequires, if I remember correctly, were not versioned (that, in my opinion, is a packaging error, but at the same time I remember the day when versioned buildrequires didn't work; for that matter, do they work correctly now?). And there are certain core build dependencies that aren't called out in buildrequires anyway.
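As a rough sketch of the ordering constraint described above: the package names here are hypothetical ("libfoo" stands in for the library I can't recall), and this is not how the CentOS build system actually schedules anything. It only shows that once "needs the old library" and "needs the new library" are written down as ordering constraints, a topological sort places the library rebuild between the two groups.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical package names, for illustration only.
# Each key must be built AFTER every package listed in its set.
MUST_FOLLOW = {
    # pkg-a and pkg-b need the OLD libfoo in the buildroot, so the
    # upgraded libfoo may only be built once they are done.
    "libfoo-new": {"pkg-a", "pkg-b"},
    # pkg-c and pkg-d need the NEW libfoo, so they come after it.
    "pkg-c": {"libfoo-new"},
    "pkg-d": {"libfoo-new"},
}


def build_order(constraints):
    """Return one build sequence that satisfies every ordering constraint."""
    return list(TopologicalSorter(constraints).static_order())


if __name__ == "__main__":
    for step, name in enumerate(build_order(MUST_FOLLOW), start=1):
        print(step, name)
```

Without versioned buildrequires, none of these constraints are recorded anywhere in the packages themselves, which is why the sequence had to be found by trial and error.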
Once I found the package that was failing to rebuild, the other dependencies involved meant I actually had to start the rebuild of the whole set of source packages over from scratch. I thought I was almost done, but in reality I had another full week of rebuilding to go. I was rebuilding on the Altix 350 itself, which was designed for compute performance but not I/O performance; some packages took a really long time to build, and I think the full 5.6 build took about a month of machine time. This particular build order problem wasted over a week of machine time building packages.
There were other issues in rebuilding 5.6, but that's the one I ran up against and remember. Since I wasn't aiming for a redistributable, binary-compatible-to-RHEL IA64 rebuild, I didn't aim for 100% binary compatibility (which does NOT mean identical binaries; it means identical versions of all required libraries and ABIs). The actual CentOS 5.6 build was, I'm sure, much, much more difficult, since the CentOS team do far more testing than what I did; I just barely scratched the surface of how difficult that time in CentOS history really was (6.0, 5.6, and 4.10 all hit close together, if I remember correctly).
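To make "identical versions of all required libraries" a little more concrete, here is a minimal sketch that compares the dependency lists of a reference package and a rebuild. It assumes the two lists were captured beforehand, for example with "rpm -qp --requires"; the sonames in the sample data are made up, and comparing requires lists is only a rough proxy for ABI compatibility, not a full check.

```python
def requires_diff(reference_requires, rebuild_requires):
    """Compare two dependency lists, one entry per requirement string.

    Returns the requirements present only in the reference package
    ("missing") and those present only in the rebuild ("extra").
    """
    ref = set(reference_requires)
    new = set(rebuild_requires)
    return {"missing": sorted(ref - new), "extra": sorted(new - ref)}


def same_requirements(reference_requires, rebuild_requires):
    """True when both packages require exactly the same set of entries."""
    diff = requires_diff(reference_requires, rebuild_requires)
    return not diff["missing"] and not diff["extra"]


if __name__ == "__main__":
    # Hypothetical data: the rebuild picked up a bumped libfoo soname.
    reference = ["libc.so.6()(64bit)", "libfoo.so.1()(64bit)"]
    rebuild = ["libc.so.6()(64bit)", "libfoo.so.2()(64bit)"]
    print(requires_diff(reference, rebuild))
    print(same_requirements(reference, rebuild))
```

Two packages can pass this check and still differ byte for byte, which is the distinction between binary compatible and identical binaries.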
When the source distribution changed to git.centos.org, it became more difficult to ferret this build order out: source RPMs contain the build timestamp; git.centos.org sources do not. It can get more complicated still; there have been instances, I believe, where a released package was built in a buildroot that contained unreleased versions of libraries, and the released versions of those libraries could be different. From a strict binary compatibility standpoint, any version bump of any required library breaks 100% compatibility. That's the second edge of the two-edged sword called shared libraries.
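For what it's worth, a sketch of how the build timestamp in source RPMs can be used to approximate the original build order. It assumes you have already dumped one "buildtime name-version-release" line per package, for example with rpm -qp --qf '%{BUILDTIME} %{NAME}-%{VERSION}-%{RELEASE}\n' *.src.rpm; the package names and timestamps in the sample are invented.

```python
def order_by_buildtime(query_output):
    """Sort packages by the build timestamp recorded in their headers.

    query_output: text with one "<epoch-seconds> <name-version-release>"
    pair per line. Blank lines are ignored.
    """
    rows = []
    for line in query_output.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, nvr = line.split(None, 1)
        rows.append((int(stamp), nvr))
    # Earliest build first; ties fall back to the package name.
    return [nvr for _, nvr in sorted(rows)]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real CentOS release.
    sample = """
    1300000300 pkg-c-2.0-1
    1300000100 pkg-a-1.0-1
    1300000200 libfoo-3.1-2
    """
    for nvr in order_by_buildtime(sample):
        print(nvr)
```

This only recovers the order in which the packages were built, not the exact contents of the buildroot at each step, so it narrows the search rather than ending it.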
Yes, an enterprise Linux that is fully self-hosting and with reproducible builds is a laudable goal; is it a feasible goal from a business point of view? I would say no, myself, since it provides no real return on investment. As the package set gets larger, verifying self-hosting becomes that much more difficult, and costly.
BUT, in this one particular regard CentOS Stream will be far superior. You're not getting a big dump of packages or source commits to git.centos.org where the build order is unknown; with Stream you get the build order live, and that solves that particular problem (while creating a different problem, but I've already posted about kernel driver ABI fun). The tradeoff is better transparency at the cost of less binary compatibility. It's not at the level of Fedora, or Rawhide, or a beta; it's just not what most CentOS users want, which is free RHEL with all features turned on (but did we ever really have that?). As far as administration goes, though, CentOS Stream should be essentially identical to RHEL.