[CentOS-devel] Updates from today

Fri Mar 11 09:55:39 UTC 2011
Johnny Hughes <johnny at centos.org>

On 03/10/2011 07:55 PM, Nico Kadel-Garcia wrote:
> On Thu, Mar 10, 2011 at 7:18 PM, Johnny Hughes <johnny at centos.org> wrote:
> 
>> Why do you keep talking about an SCM system?  Everything you want to know
>> is in the SRPMS.  If you want to create a git repo of them, have at it.
>> If you like SVN better, use it.  If CVS is your thing, use that.  Look for
>> .centos files and pull them in (that and the kernel are all we change).
>>
>> I work with SRPMS, not with an SCM system.  I like SRPMS; they are an SCM
>> system of their own.
> 
> Because they're really not. Patches can be altered, and .spec files
> altered, without any logging or notification of the change. Release
> numbers and revision numbers are hard-coded, not trackable.
> 
>> We do not change what upstream has in their SRPMS (except when we have
>> to) ... we don't even unpack them unless we need to change them.  We
>> submit them to mock to build.  Every patch we create, every change we
>> make, it is in the SRPM.
> 
> That's..... a pretty odd approach. Not inconceivable, but *exactly*
> the sort of information not in the "do it yourself, it's easy"
> approach.
> 
>> Why is this so hard to understand?
> 
> Because it's amazingly poor software management. SRPMs are binaries
> and make change tracking quite awkward, and rely entirely on the
> developer to consistently report changes in the %changelog.
> That's..... really awkward.
> 
>> If we were maintaining changes in 2500 SRPMS per distribution (times 3
>> or 4 distributions), we would do it in an SCM program, but since we just
>> BUILD the vast majority of these packages without changes, maintaining
>> an SCM of 10,000 packages when we change less than 1% of them does not
>> make much sense.
> 
> No, no, you'd just SCM the ones you alter, and the build system
> (which needed design to provide a bootstrappable environment).
> 
>> You have the SRPMS, you have example config files, you have the mock
>> that we use, you have the script that we build the software tree with,
>> you have the file that we use to compare RPMs with upstream.  Those are
>> what we use.
>>
>>> I've really been hoping for public access to the build structure. "You
>>> can do it yourself" is not as helpful as the kind of public access to
>>> build structures that Dag publishes, and has been suggesting.
>>
>> The build structure is NOT necessarily a public machine.  The machines
>> that get built on do not necessarily belong to CentOS.  My company, for
example, provides some resources that I build on.  You cannot have
>> access to my company's internal network or their machines.
> 
> Excuse me, I didn't say it should be. But access to the /etc/mock
> files, *in the SCM I just described*, would be helpful.
> 
>> Dag changes SRPMS and source code ... we rebuild someone else's source
>> code.  That is why we don't maintain an SCM.
> 
> But you do change them! By your own admission above, you've altered
> 100 packages. That's plenty to justify an SCM.

Let me ask you another question about this magical SCM system that will
make the world a better place.

Let's say I import the first upstream package into the SCM as package 1.
 Let's use httpd, since it needs changes.  I install it onto our system and
it gets split into SPECS and SOURCES directories.

Now, what items would you want to pull into this magical SCM ... just
the text files, or also their tarballs?  Or would you untar/unzip the
tarballs too?  (Remember, we NEVER change their tarballs.)  Let's leave
the tarballs out for now.  Let's just bring in their spec file and all
the text.  Let's change the spec the way we want it and commit that to
the SCM.  Now let's add in our patches and commit those to the SCM.  So
far this SCM is looking great.  We can use it to see the 2 patches we
added, and it will do the diffs on the spec file for us.  So far so good.
(You can get exactly the same info by running diff -uNrp on the SPECS
directory and the SOURCES directory, but it still makes sense to do this
at this point.)

Now, upstream releases a new package ... what do you do?  Let's import it
into our SCM.  Hmmm ... it overwrites the spec file and removes my
changes ... that's OK, I can still see them from commit 1, so I will go
back, grab them from there and reapply them to the new spec file.
But now the versions get complicated, and bringing in the new spec
shows me what upstream changed, but it does not show me anything more
about my changes than the first commit did.  During the build process,
I need to add another patch and a spec file change and make a commit.
Now I have 4 changes to the spec file ... which ones are mine and which
ones are upstream's?  After about 3 or 4 cycles it is all muddled.
However, if I grab any version of the upstream package and the
corresponding CentOS version of the package and diff their SPECS and
SOURCES directories, it tells me in an instant exactly what is different
from upstream.  So,
for the SPECs it is MUCH easier to get the info you want to know (what
did CentOS roll in on package x-y-z.1) by using SRPMS than by using SCM.

OK, let's look at the SOURCES directory.  We brought it in and we
committed, then we made our changes and committed ... everything is
great.  On the first update, we pull in their changes and now we can see
everything they changed ... but our patches go in as expected, so no
changes from us.  The SCM works great to tell us what they changed, but
we did not change anything.  Let's say I do need to refactor one of our
patches.  I can see the difference it makes, but after several cycles it
is very difficult to see what I really want to see, and that is how my
package differs from the upstream one.  I couldn't care less that they
moved a trademark from where it was 3 versions ago to where it is now
... all I need to know is where it is now so I can get rid of it.
Remember, we are not making technical changes, JUST removing trademarks
and branding.  What we changed last time does not matter; all that
matters is what we need to change this time.

The bottom line is, I can take any SRPM from upstream, take the same
corresponding SRPM from CentOS, diff them and know what I need to change
to build the next one.  I can then figure out if I need to do anything
to it when I apply the patches, etc.  If I had an SCM, I would need to
take a bunch of time just to figure out which changes I want to look at.
 (I need to look at the diff between change 27 and change 38 to see what I
changed in the spec file, and between change 18 and change 35 for the
sources, etc.)  Even worse, I might have made a change to the spec,
committed it, then later on needed to make a second change to the spec.
 So, it becomes much harder to figure out which two changes to look at
to see the things I want to see ... using the SRPMS, it is easy peasy to
see exactly what I want to see.
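For readers who want to try the comparison described above, here is a minimal Python sketch. It assumes each SRPM has already been unpacked into its own directory tree (for example with rpm2cpio piped into cpio -id, or by installing it into a SPECS/SOURCES layout); the function name diff_trees and the directory names are hypothetical, and this is not part of the CentOS build tooling. It only approximates diff -uNr: recursive, unified format, with a file missing on one side compared against an empty file.

```python
import difflib
import sys
from pathlib import Path


def _read(path: Path) -> bytes:
    """Return a file's bytes, or b"" if it does not exist (like diff -N)."""
    return path.read_bytes() if path.is_file() else b""


def diff_trees(upstream: Path, rebuilt: Path) -> str:
    """Unified diff of every file under two unpacked SRPM trees.

    Roughly mirrors `diff -uNr`: recursive, unified format, and a file
    present on only one side is compared against an empty file.
    """
    # Union of relative paths of all regular files on both sides.
    rel_paths = sorted(
        {p.relative_to(upstream) for p in upstream.rglob("*") if p.is_file()}
        | {p.relative_to(rebuilt) for p in rebuilt.rglob("*") if p.is_file()}
    )
    chunks = []
    for rel in rel_paths:
        old_path, new_path = upstream / rel, rebuilt / rel
        old_bytes, new_bytes = _read(old_path), _read(new_path)
        if old_bytes == new_bytes:
            continue  # identical, e.g. untouched upstream tarballs
        if b"\0" in old_bytes or b"\0" in new_bytes:
            # Crude binary check: report the difference instead of diffing text.
            chunks.append(f"Binary files {old_path} and {new_path} differ\n")
            continue
        old_lines = old_bytes.decode("utf-8", "replace").splitlines(keepends=True)
        new_lines = new_bytes.decode("utf-8", "replace").splitlines(keepends=True)
        chunks.extend(
            difflib.unified_diff(
                old_lines, new_lines, fromfile=str(old_path), tofile=str(new_path)
            )
        )
    return "".join(chunks)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_trees.py upstream-httpd/ centos-httpd/
    print(diff_trees(Path(sys.argv[1]), Path(sys.argv[2])), end="")
```

Identical files, such as the unchanged upstream tarballs, produce no output, so what is left is only the delta between the upstream package and the rebuilt one, which is the information the paragraph above says you need for the next build.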

I have tried to use an SCM for these changes ... for me, it does not
work, it slows me down, and I do not think it adds value in the rebuild
situation.  SCMs are great if you are developing something from Point A
to Point B to Point C ... we are NOT doing that.  We make changes at
Point A ... they roll in perfectly at Point B and Point C ... then at
Point D, they need to be refactored.  The changes required to refactor
are irrelevant to what was in Point A through Point C and depend only on
Point D.  The SRPMS themselves are BETTER tracking devices than an SCM.
