On 03/10/2011 07:55 PM, Nico Kadel-Garcia wrote:
On Thu, Mar 10, 2011 at 7:18 PM, Johnny Hughes johnny@centos.org wrote:
Why do you keep talking about an SCM system? Everything you want to know is in the SRPMS. If you want to create a git repo of them, have at it. If you like SVN better, use it. If CVS is your thing, use that. Look for .centos files and pull them in (those and the kernel are all we change).
I work with SRPMS, not with an SCM system. I like SRPMS, they are an SCM system of their own.
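For anyone who wants to follow the "look for .centos files" suggestion, here is a minimal sketch of picking the modified packages out of a list of SRPM file names. It assumes the changed packages carry ".centos" in the file name, as described above; the function name and the sample file names are made up for illustration.

```python
def centos_modified(srpm_names):
    """Return the SRPM file names that carry a '.centos' tag, sorted."""
    return sorted(
        name for name in srpm_names
        if name.endswith(".src.rpm") and ".centos" in name
    )


# Illustrative file names only; not taken from a real tree.
names = [
    "httpd-2.2.3-45.el5.centos.src.rpm",
    "bash-3.2-24.el5.src.rpm",
    "kernel-2.6.18-238.el5.src.rpm",
]
print(centos_modified(names))
```

Note that a name filter like this would miss the kernel, which is changed as well, so it would have to be added to the list by hand.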
Because they're really not. Patches can be altered, and .spec files altered, without any logging or notification of the change. Release numbers and revision numbers are hard-coded, not trackable.
We do not change what upstream has in their SRPMS (except when we have to) ... we don't even unpack them unless we need to change them. We submit them to mock to build. Every patch we create, every change we make, it is in the SRPM.
That's... a pretty odd approach. Not inconceivable, but *exactly* the sort of information not in the "do it yourself, it's easy" approach.
Why is this so hard to understand?
Because it's amazingly poor software management. SRPMs are binaries, which makes change tracking quite awkward, and they rely entirely on the developer to consistently report changes in the %changelog. That's... really awkward.
If we were maintaining changes in 2500 SRPMS per distribution (times 3 or 4 distributions), we would do it in an SCM program, but since we just BUILD the vast majority of these packages without changes, maintaining an SCM of 10,000 packages when we change less than 1% of them does not make much sense.
No, no, you'd just SCM the ones you alter, plus the build system (which needed design to provide a bootstrappable environment).
You have the SRPMS, you have example config files, you have the mock that we use, you have the script that we build the software tree with, you have the file that we use to compare RPMs with upstream. Those are what we use.
I've really been hoping for public access to the build structure. "You can do it yourself" is not as helpful as the kind of public access to build structures that Dag provides and has been suggesting.
The build structure is NOT necessarily a public machine. The machines that get built on do not necessarily belong to CentOS. My company, for example, provides some resources that I build on. You cannot have access to my company's internal network or their machines.
Excuse me, I didn't say it should be. But access to the /etc/mock files, *in the SCM I just described*, would be helpful.
Dag changes SRPMS and source code ... we rebuild someone else's source code. That is why we don't maintain an SCM.
But you do change them! By your own admission above, you've altered 100 packages. That's plenty to justify an SCM.
Let me ask you another question about this magical SCM system that will make the world a better place.
Let's say I import the first upstream package into the SCM as package 1. Let's do httpd, as it needs changes. I install it onto our system and it gets split into a SPEC and SOURCES directory.
Now, what items would you want to pull into this magical SCM ... just the text files, or also their tarballs? Or would you untar/unzip the tarballs too? (Remember we NEVER change their tarballs.) Let's leave the tarballs out for now. Let's just bring in their spec file and all the text.
Let's change the spec the way we want it and commit that to the SCM. Now let's add in our patches and commit that to the SCM. So far this SCM is looking great. We can use it to see the 2 patches we added and it will do the diffs for us on the spec file. So far so good. (You can get exactly the same info if you run diff -uNrp on the SPEC directory and the SOURCES directory, but still it makes sense to do this at this point.)
Now, upstream releases a new package ... what do you do? Let's import it into our SCM. Hmmm ... it overwrites the spec file and removes my changes ... that's OK, I can still see them from commit 1, so I will go back and grab them from there and reapply them into the new spec file. But now, that makes the versions complicated, and bringing in the new spec shows me what upstream changed, but it does not show me anything more about my changes than the first one did. During the build process, I need to add another patch and specfile change and make a commit. Now, I have 4 changes to the spec file ... which ones are mine and which ones are upstream? After about 3 or 4 cycles it is all muddled.
However, if I grab any version of the upstream package and the corresponding CentOS version of the package and diff the SPEC and SOURCES directories of those, it tells me in an instant exactly what is different from upstream. So, for the SPECs it is MUCH easier to get the info you want to know (what did CentOS roll in on package x-y-z.1) by using SRPMS than by using SCM.
OK, let's look at the SOURCES directory. We brought it in and we committed, then we made our changes and committed ... everything is great. First update, we pull in their changes and now we can see everything they changed ... but our patches go in as expected, so no changes from us. The SCM works great to tell us what they changed, but we did not change anything. Let's say I do need to refactor one of our patches. I can see the difference it has, but after several cycles it is very difficult to see what I really want to see, and that is how my package is different from the upstream one. I couldn't care less that they changed where a trademark is now compared to where it was 3 versions ago ... all I need to know is where it is now so I can get rid of it. Remember, we are not making technical changes, JUST removing trademarks and branding. What we changed last time does not matter, all that matters is what we need to change this time.
The bottom line is, I can take any SRPM from upstream, take the corresponding SRPM from CentOS, diff them and know what I need to change to build the next one. I can then figure out if I need to do anything to it when I apply the patches, etc. If I had an SCM, I would need to take a bunch of time just to figure out which changes I want to look at. (I need to look at the diff between change 27 and change 38 to see what I changed in the spec file, and between change 18 and change 35 for the sources, etc.) Even worse, I might have made a change to the spec, committed it, then later on needed to make a second change to the spec. So, it becomes much harder to figure out which two changes to look at to see the things I want to see ... using the SRPMS, it is easy peasy to see exactly what I want to see.
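To make the "diff the two SRPMs" step concrete, here is a rough Python sketch of what diff -uNrp does over two unpacked source trees. It assumes both SRPMs have already been installed into separate directories; the function names and the "upstream/" and "centos/" labels are invented for illustration and are not part of any CentOS tooling. Files missing on one side are treated as empty, which is what the -N flag does. The -p flag (showing the enclosing C function) is not reproduced.

```python
import difflib
from pathlib import Path


def read_lines(path):
    """Return the file's lines, or an empty list if the file is absent."""
    if not path.is_file():
        return []
    return path.read_text(errors="replace").splitlines()


def tree_diff(upstream_dir, centos_dir):
    """Unified diff of every file found under either unpacked tree."""
    upstream_dir, centos_dir = Path(upstream_dir), Path(centos_dir)
    relative = set()
    for root in (upstream_dir, centos_dir):
        relative.update(
            p.relative_to(root) for p in root.rglob("*") if p.is_file()
        )
    out = []
    for rel in sorted(relative):
        out.extend(difflib.unified_diff(
            read_lines(upstream_dir / rel),
            read_lines(centos_dir / rel),
            fromfile=f"upstream/{rel.as_posix()}",
            tofile=f"centos/{rel.as_posix()}",
            lineterm="",
        ))
    return "\n".join(out)
```

Identical files, including unchanged tarballs, produce no output, so what remains should be just the spec changes and the added patches. Reading large tarballs as text is wasteful, though, so the real diff command is likely the better tool in practice.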
I have tried to use an SCM for these changes ... for me, it does not work, it slows me down, and I do not think it adds value in the rebuild situation. SCMs are great if you are developing something from Point A to Point B to Point C ... we are NOT doing that. We make changes at Point A ... they roll in perfectly at Point B and Point C ... then at Point D, they need to be refactored. The changes required to refactor are irrelevant to what was in Point A through Point C and only depend on Point D. The SRPMS themselves are BETTER tracking devices than an SCM.