[CentOS-devel] Questions about the centos rebuild process / s390x related

Sat Jul 11 13:23:06 UTC 2009
R P Herrold <herrold at centos.org>

On Sat, 11 Jul 2009, Sascha Thomas Spreitzer wrote:

> i have been asked from some companies,
>
> 1. how long it would take to rebuild a centos from scratch if I would
> have a whole CP (common processor) for it.

On a rather slow subinstance without much RAM, it takes a bit 
over 30 hr to walk a single 'pass' through the build process 
and, for each package, either rebuild it, conclude it needs a 
dependency, or otherwise fail.  As I am using a rather naive 
solution algorithm, I have no doubt that this can be improved 
to reduce the number of passes needed before no further 
solution is attained.
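
As a rough illustration of the multi-pass idea described above 
(not the scripts actually in use), here is a minimal Python 
sketch.  The function name, the package names and the shape of 
the dependency map are all assumptions made for the example; a 
real pass runs rpmbuild in a chroot and can fail for many other 
reasons.

```python
def solve_in_passes(build_requires, already_available=()):
    """Naive multi-pass solver (illustrative only).

    build_requires maps a package name to the names it needs at
    build time.  Returns (passes, blocked): the packages built in
    each pass, and those that never became buildable.
    """
    built = set(already_available)
    pending = set(build_requires) - built
    passes = []
    while pending:
        # A package is attempted successfully in this pass only if
        # everything it needs was available when the pass started.
        ready = {p for p in pending if set(build_requires[p]) <= built}
        if not ready:
            break  # no further solution: the remainder is blocked
        passes.append(sorted(ready))
        built |= ready
        pending -= ready
    return passes, sorted(pending)


# Hypothetical package set, for illustration only.
reqs = {
    "glibc": [],
    "gcc": ["glibc"],
    "bash": ["glibc", "gcc"],
    "foo": ["missing-lib"],
}
passes, blocked = solve_in_passes(reqs)
print(len(passes), "passes; blocked:", blocked)
```

The loop stops either when everything is built or when a whole 
pass makes no progress, which is the 'no further solution' case.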

I do not have a complete convergence yet, so I do not know the 
number of passes I will need.  I have written versions of this 
email before -- one to the RPM mailing list in 2001 comes to 
mind.  I have published scripts doing variations of what I 
describe here on my ftp site at: 
ftp://ftp.owlriver.com/pub/mirror/ORC/buildfarm but they are 
neither current nor maintained.  Looking, the datestamps 
largely predate CentOS, and the scripts were for internal 
purposes or for cAos development work.

> 2. What the build process includes, meaning "steps to success"

Many have outlined the process, and there is more than one way 
to do it.  One 'bootstraps' from a running distribution, into 
the minimal subset needed to self host a build chroot.  Then 
from that subset, one builds the build chroot again.  Then one 
builds out toward desired leaf nodes, satisfying intermediate 
dependencies.

Consider trying an experiment and watching the process which 
the GNU folks use for GCC and GLIBC -- they bootstrap into a 
self building environment, then build again with the new tools, 
and diff the two results to make sure the build is 
deterministic and capable of self hosting.  _Then_ you build 
out toward the leaf node packages.  Lather, rinse, and repeat.  
It is similar here.
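
The 'build twice and diff' check can be sketched in Python as 
below.  This is only an illustration under assumed names 
(tree_digests, differing_paths, and the two directory paths are 
hypothetical); real builds often embed timestamps and similar 
data, so a byte-for-byte comparison like this one may report 
differences that are harmless.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            with open(full, "rb") as fh:
                rel = os.path.relpath(full, root)
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def differing_paths(first, second):
    """Paths that are missing on one side or whose digests differ."""
    return sorted(
        p for p in set(first) | set(second)
        if first.get(p) != second.get(p)
    )


# Usage, with hypothetical output directories of two builds:
#   diff = differing_paths(tree_digests("/build/stage1"),
#                          tree_digests("/build/stage2"))
#   an empty list means the two trees are byte-identical
```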

CentOS is fortunate that it rebuilds a known finite package 
set, rather than having to stabilize a packageset into a 
distribution.  RPM and YUM also make it possible to easily 
query and build a map of all Requires/Depends mappings, so one 
can know that one has a package collection ( a 'repository' ) 
which has closure in the sense that all such Requires and 
Depends are satisfied.
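
A closure check of that kind can be sketched in Python as 
follows.  The data layout and names are assumptions for the 
example: capabilities are treated as plain strings, the 
satisfying side is called 'provides' (the term RPM uses), and 
versioned or file dependencies are ignored.  In practice the 
map would be filled from rpm or yum queries rather than typed 
in by hand.

```python
def unsatisfied_requires(packages):
    """Return {package: [requirements nothing in the set provides]}.

    packages maps a package name to a dict with optional
    "provides" and "requires" lists of capability strings.
    An empty result means the collection has closure.
    """
    provided = set()
    for name, meta in packages.items():
        provided.add(name)  # a package provides its own name
        provided.update(meta.get("provides", []))

    missing = {}
    for name, meta in packages.items():
        lacking = sorted(set(meta.get("requires", [])) - provided)
        if lacking:
            missing[name] = lacking
    return missing


# Hypothetical repository contents, for illustration only.
repo = {
    "glibc": {"provides": ["libc.so.6"]},
    "bash": {"requires": ["libc.so.6"]},
    "sshd": {"requires": ["libc.so.6", "libwrap.so.0"]},
}
print(unsatisfied_requires(repo))  # {'sshd': ['libwrap.so.0']}
```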

How a package is built often determines what it will include 
-- the autotools and ./configure process are designed to 
'inventory' what library headers are present and conditionally 
add features to a package.  For example: was 'tcp_wrappers' 
present?  If so, we see /usr/include/tcpd.h, which means that 
'wrappers' support _can_ be added by a given package.  That 
package might also be willing to build without wrappers 
support.  [I chose this example because occasionally I have 
seen a distribution stabilizer omit this particular package 
from the build environment of a candidate and, one assumes 
inadvertently, cause such support to be omitted.]

This presence or omission can be spotted a couple of ways -- 
by reading and analyzing build logs (which is a mind numbing 
task, and requires a specific awareness of what is 'right'); 
-or- by using 'ldd' and other tools to examine a binary file 
to see what libraries it calls for.

Again, CentOS has an easier task, as we can compare the ldd 
results for each binary from a 'real' upstream product to those 
of our rebuild effort's candidate.  This mailing list has 
pointed to tools, released by the CentOS project, which do just 
that.
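
To make the ldd comparison concrete, here is a minimal Python 
sketch.  It is not one of the released CentOS tools; the 
function names are made up for the example, the sample ldd 
text is illustrative rather than captured from a real system, 
and only library names are compared (paths and load addresses 
are ignored).

```python
def linked_libraries(ldd_output):
    """Extract library names from the text printed by ldd."""
    libs = set()
    for line in ldd_output.splitlines():
        line = line.strip()
        if ".so" not in line:
            continue  # skip blanks and "statically linked" notes
        # Typical forms:
        #   libwrap.so.0 => /usr/lib/libwrap.so.0 (0x...)
        #   /lib/ld-linux.so.2 (0x...)
        name = line.split("=>")[0].split("(")[0].strip()
        if name:
            libs.add(name.rsplit("/", 1)[-1])
    return libs


def compare_linkage(upstream_ldd, rebuild_ldd):
    """Return (only_in_upstream, only_in_rebuild) library names."""
    up = linked_libraries(upstream_ldd)
    ours = linked_libraries(rebuild_ldd)
    return sorted(up - ours), sorted(ours - up)


# Illustrative ldd text for an upstream binary and a rebuild
# that was built without tcp_wrappers in the build root.
UPSTREAM = """\
    libwrap.so.0 => /usr/lib/libwrap.so.0 (0x00b2d000)
    libc.so.6 => /lib/libc.so.6 (0x00111000)
    /lib/ld-linux.so.2 (0x00a1b000)
"""
REBUILD = """\
    libc.so.6 => /lib/libc.so.6 (0x00111000)
    /lib/ld-linux.so.2 (0x00a1b000)
"""
print(compare_linkage(UPSTREAM, REBUILD))  # (['libwrap.so.0'], [])
```

A library that shows up only on the upstream side is the kind 
of omission the tcp_wrappers example describes.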

CentOS developers can take an easier route, because we have a 
well defined CentOS goal:

 	Reproduce the upstream binaries, warts and all
 	(without encumbered trademark; and adding our own
 	art trademarks and other copyrighted matter) and
 	attending to the changes needed for the updater

[Recall that CentOS 2.1, 3, and early 4 did not have the 
packages needed for the 'yum' approach the project uses; there 
has been a bit of a stink by an EPEL ignorant of the timing 
and package entry process which 'yum' and 'sqlite' followed 
into RHEL in recent days].  CentOS used yum in part because of 
the people involved at the time and the active development it 
was under in the RHL 8 and 9 days; also the sources for the 
server side required by 'up2date' were unavailable, etc.

> 3. whether the licenses force a maintainer to push the resulting
> distribution upstream.

So far as I know, CentOS has never been _asked_ to do so and 
is under no obligation to provide its binary product to 
anyone.  That said, clearly many upstream use CentOS' product

> 4. In which cases the maintainer is liable for the resulting
> distribution rpms, binaries, etc...

You will have to consult competent counsel for your 
jurisdiction, as matters of liability are outside the scope of 
what I will opine upon here.

> It would be VERY helpful if I can clarify those questions. It might
> spend us 1 CP.

I have requested access to such 'real hardware' builders here, 
and in the Marist s390 list, and by private communication to 
potential donors, and been greeted with a deafening silence as 
to offers.  As I noted in my earlier post, IBM came through 
for me, and I restarted another pass before composing this 
reply.

With my best regards,

-- Russ herrold