Now that we have a working CentOS 7 tree and process for putting Red Hat Sources onto git.centos.org, we are going to start a process to figure out how to import CentOS-6 Sources onto git.centos.org as well.
We will be starting with sources from 6.0 ISOs all the way through current CentOS-6+updates.
We do not have a specific timeline for this to happen, but we will be generating the SRPM lists, in the order needed to import them, that we will likely use to populate the repos with the original RHEL SRPMs (for non-modified packages) and the CentOS SRPMs (for modified packages).
We will be posting those lists for review/correction to both the CentOS QA IRC channel and this mailing list.
We will also be upgrading git.centos.org to a new version of gitblit before we actually roll in the CentOS-6 sources, though I will begin generating the lists before we start the gitblit upgrade process.
Thanks, Johnny Hughes
On 08/19/2014 10:50 AM, Johnny Hughes wrote:
We will also be upgrading to a new version of gitblit for git.centos.org before we actually roll in the CentOS-6 sources, though I will begin the list generation before we actually start the gitblit upgrade process.
CentOS-devel mailing list CentOS-devel@centos.org http://lists.centos.org/mailman/listinfo/centos-devel
Will there be an rsync target available for the git repos and look-aside sources?
https://bugs.centos.org/view.php?id=7185
Pat
On 08/19/2014 04:54 PM, Pat Riehecky wrote:
We will also be upgrading to a new version of gitblit for git.centos.org before we actually roll in the CentOS-6 sources, though I will begin the list generation before we actually start the gitblit upgrade process.
Will there be an rsync target available for the git repos and look-aside sources?
I'd like to be able to offer rsync from there as well, but there are a few challenges that need to be resolved first. For the binary content cache, we can likely run the rsync instance from the backup machine so there is no network load on the production box. For the git repos it's a bit harder, since there are private or work-in-progress repos in there as well, and we need to find a way to mask those out.
Certainly worth trying to get to.
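A minimal sketch of the masking Karanbir describes, assuming (hypothetically) that all repositories live under one rsync module root as `<subdir>/<name>.git` and that the private and work-in-progress names are kept in a hand-maintained set. It emits rsync filter patterns that could be written to a file referenced by the module's `exclude from` setting in rsyncd.conf, so those directories are never served:

```python
from pathlib import PurePosixPath
from typing import AbstractSet, Iterable, List

def build_exclude_lines(repo_paths: Iterable[str], private_names: AbstractSet[str]) -> List[str]:
    """Return rsync exclude patterns that hide private/WIP repositories.

    repo_paths: paths relative to the module root, e.g. "rpms/kernel.git" (hypothetical layout)
    private_names: repository names, without ".git", that must not be served
    """
    lines = []
    for rel in sorted(repo_paths):
        path = PurePosixPath(rel)
        name = path.name[:-4] if path.name.endswith(".git") else path.name
        if name in private_names:
            # Leading "/" anchors the pattern at the module root;
            # trailing "/" makes it match directories only.
            lines.append(f"/{path}/")
    return lines
```

An unanchored pattern such as `wip-foo.git` would instead match at any depth in the tree, which is why each line carries the full relative path.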
On Wed, Aug 20, 2014 at 5:24 PM, Karanbir Singh mail-lists@karan.org wrote:
I'd like to be able to offer rsync from there as well, but there are a few challenges that need to be resolved first. For the binary content cache, we can likely run the rsync instance from the backup machine so there is no network load on the production box. For the git repos it's a bit harder, since there are private or work-in-progress repos in there as well, and we need to find a way to mask those out.
Certainly worth trying to get to.
Use GPG-signed git tags to assure provenance, and the repository can be safely cloned. Rsyncing a git repo is like rsyncing a CVS or Subversion repository: even small changes in the midst of the rsync operation can corrupt the underlying database.
I did *suggest* using GPG-signed git tags instead of "parse the git log to figure out the revision matching the built RPM", and one of the reasons was to assure provenance for cloned repositories. The same provenance requirements apply to rsynced mirrors as well. Unless *every single mirror* is as robustly deployed as git.centos.org, they are at risk of local manipulation that installs a trojan or otherwise violates their security.
On 08/21/2014 02:29 AM, Nico Kadel-Garcia wrote:
Use GPG-signed git tags to assure provenance, and the repository can be safely cloned. Rsyncing a git repo is like rsyncing a CVS or Subversion repository: even small changes in the midst of the rsync operation can corrupt the underlying database.
the objects are checksummed; the whole underlying fabric of git is based on hashes, so a corruption in content would be fairly easy to notice.
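To illustrate Karanbir's point: git names each object by the SHA-1 of a short header plus the content, so a truncated or altered object no longer matches its id. This Python sketch recomputes a blob id the way `git hash-object` does (loose objects are also zlib-compressed on disk, which it ignores). Note what it does not show: anyone can compute the matching id for substituted content, which is the gap the provenance argument in this thread is about.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def object_is_intact(content: bytes, expected_id: str) -> bool:
    """A truncated or altered object no longer hashes to its recorded id."""
    return git_blob_id(content) == expected_id
```

For example, `git_blob_id(b"hello world\n")` returns `3b18e512dba79e4c8300dd08aeb37f8e728b8dad`, the same id that `echo 'hello world' | git hash-object --stdin` prints.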
On Thu, Aug 21, 2014 at 3:15 AM, Karanbir Singh mail-lists@karan.org wrote:
the objects are checksummed; the whole underlying fabric of git is based on hashes, so a corruption in content would be fairly easy to notice.
So what? It's a broken mirror, and in the midst of writes to individual repos at git.centos.org, each of *those* will be a broken mirror. Were you planning on setting up some kind of staging from the main repository to local rsync targets? That moves the problem upstream, and you'd need to use something like git clones and git pull to keep those safely up to date. But then we're back to being sure of the provenance of *those* repositories, and others will still be at risk of corrupting *those* when that target gets updated. Really, rsync-based or filesystem-based snapshots of anything with an underlying database all present the same kind of risks.
I assume you were planning on running 6000 distinct rsync mirror targets, one for each git repository, so the damage would be isolated to only those repositories in the midst of an update. How often are they going to be broken? Relatively stable repositories are likely to be intact, but repositories that have a lot of churn are most at risk.
And checksums don't solve the provenance problem. Someone who maliciously p0wns a mirror site can trojan the site, and without something like GPG-signed git tags, the content becomes very difficult to verify. It's theoretically possible to set up local working git clones that talk to several distinct upstream remote repositories and verify contents against them, but there will be frequent discrepancies between git.centos.org and the rsync mirrors.
On 20 August 2014 19:29, Nico Kadel-Garcia nkadel@gmail.com wrote:
Use GPG-signed git tags to assure provenance, and the repository can be safely cloned. Rsyncing a git repo is like rsyncing a CVS or Subversion repository: even small changes in the midst of the rsync operation can corrupt the underlying database.
Could you please cut down on the broken record? There is no sign there will be GPG-signed git tags in the near future, and your constant harping on it is not going to make it happen any faster.
On Thu, Aug 21, 2014 at 1:54 PM, Stephen John Smoogen smooge@gmail.com wrote:
On 20 August 2014 19:29, Nico Kadel-Garcia nkadel@gmail.com wrote:
Use GPG-signed git tags to assure provenance, and the repository can be safely cloned. Rsyncing a git repo is like rsyncing a CVS or Subversion repository: even small changes in the midst of the rsync operation can corrupt the underlying database.
Could you please cut down on the broken record? There is no sign there will be GPG-signed git tags in the near future, and your constant harping on it is not going to make it happen any faster.
Sorry it bothers you. I didn't bring up using rsync to make mirrors. I'm trying to get across the provenance and propagation problem.
Besides the potential corrupt snapshot problem, there's the inevitable discrepancy between the mirrors and git.centos.org itself. Content is likely to differ in small ways among the mirrors, because each rsync-based snapshot reflects some point in the past. I assume that some of the individual repos are changing during the overall rsync update period, unless they're all done in parallel, which would be *really* nasty.
Unless.... Is there a top level directory to use for an rsync mirror? That's going to be a pretty bulky rsync operation, with over 6000 subdirectories and the amount of churn in any of the modified git repos.
Anyway, verification of the consistency of all the mirrored repositories becomes awkward. There's also the lack of site verification in the unencrypted and unsigned rsync protocol, which I'd not even thought about for git.centos.org. That puts it right into the "people cloning from each other's unsecured repos locally" world, in this case cloning from the rsync mirrors. And it directly brings up the "verify the provenance of local repos" problem that was discounted by some when I brought up the problem earlier.
Several folks did bring up the point of "git.centos.org has an SSL key, what's not secure about it?" If we're using rsync mirrors, we're relying on someone else's mirror site to be secure, as well. And we're probably relying on unencrypted rsync to git.centos.org, itself, to support those mirrors. And we're once again open to someone polluting the data stream with a fake repo.
No, *if* our friends at git.centos.org want to help protect that data stream for mirror clients, they can consider using something like rsync with ssh keys and the old "validate-rsync.sh" script as a ForceCommand. A site that wants to be a mirror would need a relevant private SSH key, and unlocking it for rsync use is their problem. But that would at least help assure provenance between the mirror sites and git.centos.org.
The repercussions of using rsync for this start adding up pretty fast.
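The `validate-rsync.sh` approach mentioned above restricts an SSH key to rsync by running a checking script as a `ForceCommand` (or an `authorized_keys` `command=` option), which sees the client's request in the `SSH_ORIGINAL_COMMAND` environment variable. The following is not that script, only a Python sketch of the same idea under one assumption: mirrors should only ever pull, which on the server side shows up as `rsync --server --sender ...`.

```python
import os
import shlex
import sys

# Characters that would let a client chain, substitute, or redirect shell commands.
SHELL_METACHARS = set("&;|<>`$()\n")

def is_allowed_rsync_command(original_command: str) -> bool:
    """Accept only read-only rsync server invocations (mirror pulls)."""
    if not original_command or any(c in SHELL_METACHARS for c in original_command):
        return False
    try:
        argv = shlex.split(original_command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    # A client pulling *from* this host makes sshd run "rsync --server --sender ...";
    # an upload lacks --sender, so requiring it keeps the key read-only.
    return argv[:3] == ["rsync", "--server", "--sender"]

def run_forced_command() -> None:
    """ForceCommand entry point: exec the request only if it passes the check."""
    cmd = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    if not is_allowed_rsync_command(cmd):
        sys.stderr.write("Rejected\n")
        sys.exit(1)
    argv = shlex.split(cmd)
    os.execvp(argv[0], argv)
```

This only controls which command runs, not which paths it may read; those would still need restricting separately.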
On 08/22/2014 03:13 AM, Nico Kadel-Garcia wrote:
Besides the potential corrupt snapshot problem, there's the inevitable discrepancy between the mirrors and git.centos.org itself. Content is likely to differ in small ways among the mirrors, because each rsync-based snapshot reflects some point in the past. I assume that some of the individual repos are changing during the overall rsync update period, unless they're all done in parallel, which would be *really* nasty.
it does not matter... the content in the binary cache is hashed. Have you looked at how things are set up?
Unless.... Is there a top level directory to use for an rsync mirror? That's going to be a pretty bulky rsync operation, with over 6000 subdirectories and the amount of churn in any of the modified git repos.
I don't understand that statement; are you questioning rsync's ability to handle 6k dirs?
Anyway, verification of the consistency of all the mirrored repositories becomes awkward. There's also the lack of site
to be clear, I don't think the aim here is to set up content mirrors for general consumption; the aim is to have an rsync target that lets people run their own mirrors. And we don't need any real sync between git and the binary sources, since they are tracked in git as hashed objects. Something missing (or corrupt) will get flagged up right away.
I realise we have an issue where some of the hashes are SHA-1 and others are SHA-256, and the checking code, client side, needs to check the length and use the right algorithm, but that's something which should get fixed as we all end up using the same tools and conventions.
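A minimal sketch of the client-side check described here: pick the algorithm from the length of the recorded hex digest (40 characters for SHA-1, 64 for SHA-256) and compare. It assumes the digest string has already been read from the package's metadata in git; that file's exact format is not shown in this thread.

```python
import hashlib

# Hex digest length -> hashlib algorithm name.
ALGORITHMS_BY_HEX_LENGTH = {40: "sha1", 64: "sha256"}

def verify_lookaside(data: bytes, expected_hex: str) -> bool:
    """Check downloaded content against a recorded digest, choosing the algorithm by length."""
    expected_hex = expected_hex.strip().lower()
    algo = ALGORITHMS_BY_HEX_LENGTH.get(len(expected_hex))
    if algo is None:
        # Refuse to guess rather than silently skipping verification.
        raise ValueError(f"unrecognised digest length {len(expected_hex)}")
    return hashlib.new(algo, data).hexdigest() == expected_hex
```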
verification in the unencrypted and unsigned rsync protocol, which I'd not even thought about for git.centos.org. That puts it right into the "people cloning from each other's unsecured repos locally" world, in this case cloning from the rsync mirrors. And it directly brings up the "verify the provenance of local repos" problem that was discounted by some when I brought up the problem earlier.
Several folks did bring up the point of "git.centos.org has an SSL key, what's not secure about it?" If we're using rsync mirrors, we're relying on someone else's mirror site to be secure, as well. And we're probably relying on unencrypted rsync to git.centos.org, itself, to support those mirrors. And we're once again open to someone polluting the data stream with a fake repo.
right, so the confusion comes from other mirrors; that's certainly not the aim here. It's all for local consumption. And I don't know what is involved in getting rsync around an SSL wrapper. But the fact that the metadata in the git repos has the corresponding hashes should be good enough for validating per file. Doing this for the entire tree, every possible piece, would be quite hard, admittedly.
On Fri, Aug 22, 2014 at 3:55 AM, Karanbir Singh mail-lists@karan.org wrote:
On 08/22/2014 03:13 AM, Nico Kadel-Garcia wrote:
Besides the potential corrupt snapshot problem, there's the inevitable discrepancy between the mirrors and git.centos.org itself. Content is likely to differ in small ways among the mirrors, because each rsync-based snapshot reflects some point in the past. I assume that some of the individual repos are changing during the overall rsync update period, unless they're all done in parallel, which would be *really* nasty.
it does not matter... the content in the binary cache is hashed. Have you looked at how things are set up?
Forgive me, please, if I wander among different expertise levels and seem to be teaching my granny to suck eggs. It's hard to aim analyses at people with different levels of expertise and experience, and it can be hard to balance completeness versus clarity.
In this case: yes, I've looked, and transient inconsistencies break the mirrored repository. "git clone" operations against the broken repository fail with errors, and most clients will be quite out of luck. It's like when you rsync an RPM repository before the repodata has been updated: it can get messy and broken.
And I assume you're not doing "git gc" on the upstream repositories, or not doing it often. Do git pushes ever trigger a repacking on the upstream repository? It's an interesting question, and is another factor that could trigger broken mirrors.
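One way a local mirror could catch the transient breakage described here is to run `git fsck` on every repository after each sync, then re-sync or alert on failures. A sketch, assuming (hypothetically) that every `*.git` directory under the mirror root is a bare repository; `run` is injectable so the logic can be exercised without git installed:

```python
import subprocess
from pathlib import Path
from typing import Callable, List

def find_bare_repos(root: Path) -> List[Path]:
    """Treat every '*.git' directory under root as a bare repository (layout assumption)."""
    return sorted(p for p in root.rglob("*.git") if p.is_dir())

def broken_repos(root: Path, run: Callable = subprocess.run) -> List[Path]:
    """Return the repositories for which 'git fsck --full' reports a problem."""
    bad = []
    for repo in find_bare_repos(root):
        result = run(
            ["git", f"--git-dir={repo}", "fsck", "--full", "--no-progress"],
            capture_output=True,
        )
        # fsck exits non-zero on missing or corrupt objects, e.g. refs that
        # were copied before the objects they point at.
        if result.returncode != 0:
            bad.append(repo)
    return bad
```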
Unless.... Is there a top level directory to use for an rsync mirror? That's going to be a pretty bulky rsync operation, with over 6000 subdirectories and the amount of churn in any of the modified git repos.
I don't understand that statement; are you questioning rsync's ability to handle 6k dirs?
Sorry if I was unclear. As git.centos.org is configured, each git repository is distinct and unique, and out here in userland we have no visibility into its back-end filesystem layout. So if it's set up as "/mountpoint/gitrepos/k/kernel", "/mountpoint/gitrepos/s/sendmail", cool: you can set up one rsync daemon sharing "/mountpoint/gitrepos". If they're scattered all over your filesystems and you're publishing each of them as a different rsync target, that makes configuring an rsync daemon and the relevant rsync targets quite awkward.
Yes, it may sound like I'm teaching my granny to suck eggs. Not everyone is as expert with setting up rsync daemons as some of us, so please forgive me for perhaps getting into too much detail.
Anyway, so far, so good. The intriguing problems happen when mirror sites have to traverse that in a single operation, and potentially commit the entire environment with '--delay-updates'. RPM-based mirrors have the advantage that the number of files being changed is usually quite small: maybe a few dozen RPMs on a busy day, plus the repodata transactions. Large operations are usually tied to specific directories, such as when CentOS 7 was first published.
Git repos.... are going to be more intriguing to merge and parse if and when CentOS 6 source material is also merged into the primary repos.
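For the single-operation pull described above, a mirror would typically pass `--delay-updates` (changed files are staged and renamed into place at the end) together with `--delete-delay` (deletions are applied after the transfer). A sketch that only builds the command line; `rsync://example.org/gitrepos` is a placeholder, not a real endpoint:

```python
from typing import List, Optional

def mirror_pull_argv(source: str, dest: str, exclude_file: Optional[str] = None) -> List[str]:
    """Build an rsync command that narrows the window in which a mirror is half-updated."""
    argv = [
        "rsync",
        "-aH",              # archive mode, plus hard links
        "--delay-updates",  # stage updated files, rename them into place at the end
        "--delete-delay",   # find deletions during the transfer, apply them afterwards
    ]
    if exclude_file:
        argv.append(f"--exclude-from={exclude_file}")
    # A trailing "/" on the source copies the module's contents, not the directory itself.
    argv += [source.rstrip("/") + "/", dest]
    return argv
```

The final rename pass is still not atomic across thousands of repositories, and it cannot help if a repository was mid-push when rsync read it, so this narrows the window rather than closing it.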
Anyway, verification of the consistency of all the mirrored repositories becomes awkward. There's also the lack of site
to be clear, I don't think the aim here is to set up content mirrors for general consumption; the aim is to have an rsync target that lets people run their own mirrors. And we don't need any real sync between git and the binary sources, since they are tracked in git as hashed objects. Something missing (or corrupt) will get flagged up right away.
I'm trying to suggest that the mirrors would be safer, and more robust, and have better provenance for their content, if you'd publish signed GPG tags in the repos and support git clones, rather than rsync mirrors, for offsite mirrors.
I realise we have an issue where some of the hashes are SHA-1 and others are SHA-256, and the checking code, client side, needs to check the length and use the right algorithm, but that's something which should get fixed as we all end up using the same tools and conventions.
Does this cause a problem? I thought the clients were quite robust about it when they make their local "git clone" of the upstream repository.
verification in the unencrypted and unsigned rsync protocol, which I'd not even thought about for git.centos.org. That puts it right into the "people cloning from each other's unsecured repos locally" world, in this case cloning from the rsync mirrors. And it directly brings up the "verify the provenance of local repos" problem that was discounted by some when I brought up the problem earlier.
Several folks did bring up the point of "git.centos.org has an SSL key, what's not secure about it?" If we're using rsync mirrors, we're relying on someone else's mirror site to be secure, as well. And we're probably relying on unencrypted rsync to git.centos.org, itself, to support those mirrors. And we're once again open to someone polluting the data stream with a fake repo.
right, so the confusion comes from other mirrors; that's certainly not the aim here. It's all for local consumption. And I don't know what is involved in getting rsync around an SSL wrapper. But the fact that the metadata in the git repos has the corresponding hashes should be good enough for validating per file. Doing this for the entire tree, every possible piece, would be quite hard, admittedly.
Not when the metadata is poisoned by a trojaned merge. Git logs can be edited. Without GPG signatures, it's like a web mirror that has a pack of RPMs with a pack of checksums alongside them. The owner of the mirror, or a cracker attacking the host, can corrupt *both*, and without the GPG tag, it's hard to establish provenance.
And *that* is one of the points where having a GPG-signed tag, especially one tied to the contents of the SRPM builds, becomes a useful tool for verifying provenance of the tree. You can't rely on a binary comparison; there's likely to be frequent skew between the rsync mirrors and the main repo as a matter of course.
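The checksum comparison above can be made concrete. In this sketch (file name and contents are made up), a mirror publishes files with a checksum manifest; whoever controls the mirror replaces a file and regenerates the manifest, and verification against the mirror's own manifest still passes. Only a digest obtained through a separately trusted channel, which is the role a GPG signature would play, catches the change:

```python
import hashlib
from typing import Dict

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def matches_manifest(files: Dict[str, bytes], manifest: Dict[str, str]) -> bool:
    """True if the file set matches the manifest exactly, digest for digest."""
    return files.keys() == manifest.keys() and all(
        sha256_hex(data) == manifest[name] for name, data in files.items()
    )

# Hypothetical mirror content and the manifest published next to it.
mirror_files = {"foo.spec": b"Version: 1.0\n"}
mirror_manifest = {name: sha256_hex(data) for name, data in mirror_files.items()}

# A copy of the manifest obtained earlier over a channel the attacker cannot touch.
trusted_manifest = dict(mirror_manifest)

# The attacker alters the file *and* regenerates the mirror's manifest.
mirror_files["foo.spec"] = b"Version: 1.0\n# injected change\n"
mirror_manifest["foo.spec"] = sha256_hex(mirror_files["foo.spec"])

print(matches_manifest(mirror_files, mirror_manifest))   # True: self-consistent, yet tampered
print(matches_manifest(mirror_files, trusted_manifest))  # False: the independent digest catches it
```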
On 08/25/2014 10:59 PM, Nico Kadel-Garcia wrote:
Not when the metadata is poisoned by a trojaned merge. Git logs can be edited. Without GPG signatures, it's like a web mirror that has a pack of RPMs with a pack of checksums alongside them. The owner of the mirror, or a cracker attacking the host, can corrupt *both*, and without the GPG tag, it's hard to establish provenance.
And *that* is one of the points where having a GPG-signed tag, especially one tied to the contents of the SRPM builds, becomes a useful tool for verifying provenance of the tree. You can't rely on a binary comparison; there's likely to be frequent skew between the rsync mirrors and the main repo as a matter of course.
Red Hat does not want to provide us a GPG-signed tag, so we will not be getting one. No reason to keep bringing it up. It's not happening any time soon.
We are not providing mirrors of this all over the place, we are quite happy with one location and backups/failover. What we are trying to provide is the ability for people who want a local mirror of this to be able to get it another way. This is a convenience only, not something that is required.
I am producing CentOS-7 directly from the git repo as it is RIGHT NOW, using absolutely nothing but the tools also provided in this repo and calls to mock. Fermi Scientific Linux is also producing their SL7 from this same git.centos.org repo, so being able to mirror this is not a blocker for getting the source code or producing binaries. All the tools are being provided or updated by the community and everything is open. It all works right now.
So, if we can create a mechanism to mirror the content as well (other than just a script to do it via the JSON API), then we will. This is not critical, obviously, as both CentOS and Scientific Linux are tracking EL7 and doing updates from git.centos.org just fine right now.
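A sketch of the "script via the JSON API" idea, using git itself for transport. Unlike an rsync copy, a fetch updates refs only after the objects they point at have arrived, so an interrupted run does not leave dangling refs. It assumes Gitblit's RPC `LIST_REPOSITORIES` request is enabled at the URL below and returns a JSON object keyed by clone URL; that endpoint, the response shape, and the flat `<dest>/<name>.git` layout are all assumptions, and only the command-building part is exercised here:

```python
import json
import urllib.request
from typing import AbstractSet, Dict, List

# Assumed endpoint: check the path and that Gitblit's RPC servlet is enabled.
LIST_URL = "https://git.centos.org/rpc/?req=LIST_REPOSITORIES"

def fetch_repo_index(url: str = LIST_URL) -> Dict[str, dict]:
    """Download the repository index (assumed: JSON object keyed by clone URL)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def mirror_commands(index: Dict[str, dict], dest_root: str,
                    existing: AbstractSet[str] = frozenset()) -> List[List[str]]:
    """Return git commands that create or refresh a bare mirror of each repository."""
    cmds = []
    for clone_url in sorted(index):
        name = clone_url.rstrip("/").rsplit("/", 1)[-1]
        target = f"{dest_root.rstrip('/')}/{name}"
        if name in existing:
            # Fetch every ref into the existing mirror and drop refs deleted upstream.
            cmds.append(["git", f"--git-dir={target}", "remote", "update", "--prune"])
        else:
            # --mirror makes a bare clone that tracks all upstream refs one-to-one.
            cmds.append(["git", "clone", "--mirror", clone_url, target])
    return cmds
```

Repositories with the same final path component in different subdirectories would collide in this flat layout; a real script would need to keep the subdirectory.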
On Wed, Aug 27, 2014 at 7:28 AM, Johnny Hughes johnny@centos.org wrote:
Red Hat does not want to provide us a GPG-signed tag, so we will not be getting one. No reason to keep bringing it up. It's not happening any time soon.
I'm confused by this. What does Red Hat, at least the core business, have to do with this? You have a GPG key you use for signing RPMs and SRPMs; why shouldn't or couldn't you use the same key to create git tags? This would be for tags for *your* work, and possibly for when you import Red Hat source.
If you have to get your GPG keys from Red Hat, well, that belies the claims I've seen here that git.centos.org has no special relationship with Red Hat, that your git repo is "not special" and you have no special source code access, doesn't it? I'm sorry, but you can't have it both ways. I'm not suggesting you'd need a special Red Hat GPG tag for your imports, but rather have a tag that *you* at CentOS can apply to your repos and relevant tags.
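For reference, the mechanism proposed here uses stock git: a tag created with `git tag -s` carries a GPG signature that `git verify-tag` checks against the local keyring. A minimal Python wrapper a mirror consumer might use, with `run` injectable for testing; the repository path and tag name are up to the caller:

```python
import subprocess
from typing import Callable

def tag_signature_ok(repo: str, tag: str, run: Callable = subprocess.run) -> bool:
    """Return True if `git verify-tag` accepts the tag's GPG signature."""
    result = run(
        ["git", "-C", repo, "verify-tag", tag],
        capture_output=True,
        text=True,
    )
    # verify-tag is expected to exit non-zero for unsigned tags, bad signatures,
    # or signatures it cannot check (e.g. the public key is not in the keyring).
    return result.returncode == 0
```

A passing check shows only that the tag was signed by a key the local keyring knows; which key ought to be trusted still has to be settled out of band.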
We are not providing mirrors of this all over the place, we are quite happy with one location and backups/failover. What we are trying to provide is the ability for people who want a local mirror of this to be able to get it another way. This is a convenience only, not something that is required.
Right. And they're the ones I'm worried about provenance for. My working assumption is that they will not be as secure as, say, git.centos.org. And it's not trivial to do a side-by-side comparison, because the sites will have a lot of churn.
I am producing CentOS-7 directly with the git repo as it is RIGHT NOW, using absolutely nothing but the tools also provided in this repo and calls to mock. Fermi Scientific Linux is also producing their SL7 from this same git.centos.org repo, so this it is not a blocker to be able to mirror this to get the source code or produce binaries. All the tools are being provided or updated by the community and everything is open. It all works right now.
That's nice, but mostly irrelevant. It's all that's *available*, so they have to use it. The "analyze git logs to determine relevant revisions" approach has already broken down at least once that I saw reported here. The awkwardness of the "git log" analysis I've already gone over. If anyone else is going to access the contents from an rsync mirror, provenance becomes even more important.
So, if we can create a mechanism to mirror the content as well .. other than just a script to do it via the json API, then we will. This is not critical, obviously, as both CentOS and Scientific Linux are tracking EL7 and doing updates from git.centos.org just fine right now.
For CentOS, it's by choice and because you've already built it into your workflow. For Scientific Linux, it's because that's all that's available. Do you really think it's by *choice*, or by technical preference? I've seen no evidence of that.
On 08/27/2014 09:32 PM, Nico Kadel-Garcia wrote:
On Wed, Aug 27, 2014 at 7:28 AM, Johnny Hughes johnny@centos.org wrote:
Red Hat does not want to provide us a GPG-signed tag, so we will not be getting one. No reason to keep bringing it up. It's not happening any time soon.
I'm confused by this. What does Red Hat, at least the core business, have to do with this? You have a GPG key you use for signing RPMs and SRPMs; why shouldn't or couldn't you use the same key to create git tags? This would be for tags for *your* work, and possibly for when you import Red Hat source.
We don't IMPORT the Red Hat source code ... Red Hat Engineering provides the Red Hat source code to the machine where git.centos.org lives. (they throw it over the wall that exists between the Red Hat Engineering team and the CentOS team).
Things that come in with "CentOS Sources" user (or earlier the "CentOS Buildsys" user) are not done by the CentOS team, they come from upstream. There is a specific user who is allowed to connect from a specific IP that has a specific key who can import code directly. I cannot do it.
When these things come in, I see them the same way that the Scientific Linux team or anyone else who uses this source sees them, by checking the site. I then use the same tools that anyone else who wants to build the source code would use, the tools here:
https://git.centos.org/summary/centos-git-common.git
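For context on the tooling linked above: as we understand the layout, each package repository holds the spec file and patches, while the large source tarballs are downloaded separately and checked against a per-package metadata file. The sketch below assumes that file (for example `.httpd.metadata`) lists one `<sha1> <relative/path>` pair per line; check that against the scripts in centos-git-common before relying on it. These checksums live in the same git repository as everything else, so a match shows the downloaded tarballs agree with the repository, not that the repository itself is authentic, which is the gap the signed-tag discussion in this thread is about.

```python
import hashlib
from pathlib import Path


def parse_metadata(text):
    """Parse lines assumed to look like '<sha1-hex> <relative/path>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        digest, rel_path = line.split(None, 1)
        entries.append((digest.lower(), rel_path))
    return entries


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sources(checkout_dir, metadata_text):
    """Return the listed paths that are missing or whose SHA-1 does not match."""
    root = Path(checkout_dir)
    mismatched = []
    for digest, rel_path in parse_metadata(metadata_text):
        target = root / rel_path
        if not target.is_file() or sha1_of(target) != digest:
            mismatched.append(rel_path)
    return mismatched
```

Usage would look like `verify_sources("httpd", Path("httpd/.httpd.metadata").read_text())`, where an empty list means every listed tarball matched.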
If they (upstream) gave us the SRPMs directly and we imported them, then we might have some say how they came in ... they do not and therefore we do not. Everyone who gets community source code from Red Hat gets it from git.centos.org .. INCLUDING the CentOS Team.
<snip>
On 08/28/2014 06:09 AM, Johnny Hughes wrote:
<snip>
To make sure this is understood, here is an example:
https://git.centos.org/log/rpms!cloud-init/refs!heads!c7-extras
That cloud-init import was done by user "Karanbir Singh" .. it has his name/user. If I did an import of an SRPM, it would be by my user.
If the user is CentOS Sources (or the earlier CentOS Buildsys) then it is coming from upstream.
=======================================
Look at this one:
https://git.centos.org/log/rpms!libvpx.git/refs!heads!c7
All upstream commits.
======================================
And this one:
https://git.centos.org/log/rpms!httpd/refs!heads!c7
There are upstream commits AND then we roll in changes.
The git log shows what is upstream and what is changed by the team.
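The author-based distinction described above can be scripted. This is a minimal sketch assuming the two upstream import user names given in this thread ("CentOS Sources" and the earlier "CentOS Buildsys"); `read_log` and `classify_log` are hypothetical helper names. Author names are ordinary commit metadata, so the split is only as trustworthy as the repository it was read from.

```python
import subprocess

# Author names that, per this thread, mark imports arriving from upstream.
UPSTREAM_AUTHORS = {"CentOS Sources", "CentOS Buildsys"}


def read_log(repo_dir, branch):
    """Return 'author<TAB>commit<TAB>subject' lines for every commit on branch."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%an%x09%H%x09%s", branch],
        check=True, capture_output=True, text=True,
    )
    return out.stdout


def classify_log(log_text):
    """Split log lines into (upstream, team) lists of (commit, subject) pairs."""
    upstream, team = [], []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        author, commit, subject = line.split("\t", 2)  # subject may contain tabs
        bucket = upstream if author in UPSTREAM_AUTHORS else team
        bucket.append((commit, subject))
    return upstream, team
```

For a checkout of the httpd repository linked above, `classify_log(read_log("httpd", "c7"))` would separate the upstream imports from the changes rolled in afterwards.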
On Thu, Aug 28, 2014 at 7:09 AM, Johnny Hughes johnny@centos.org wrote:
On 08/27/2014 09:32 PM, Nico Kadel-Garcia wrote:
On Wed, Aug 27, 2014 at 7:28 AM, Johnny Hughes johnny@centos.org wrote:
Not when the metadata is poisoned by a trojaned merge. Git logs can be edited. Without the GPG sums, it's like a web mirror that has a pack of RPMs with a pack of checksums alongside them. The owner of the mirror, or a cracker attacking the host, can corrupt *both*, and without the GPG tag, it's hard to establish provenance.
And *that* is one of the points where having a GPG signed tag, especially one tied to the contents of the SRPM builds, becomes a useful tool for verifying provenance of the tree. You can't rely on a binary comparison; there's likely to be frequent skew between the rsync mirrors and the main repo as a matter of course.
Red Hat does not want to provide us a GPG signed tag, so we will not be getting one. No reason to keep bringing it up. It's not happening any time soon.
I'm confused by this. What does Red Hat, at least the core business, have to do with this? You have a GPG key you use for making RPMs and SRPMs; why shouldn't or couldn't you use the same key to create git tags? This would be for tags for *your* work, and possibly for when you import Red Hat source.
We don't IMPORT the Red Hat source code ... Red Hat Engineering provides the Red Hat source code to the machine where git.centos.org lives. (they throw it over the wall that exists between the Red Hat Engineering team and the CentOS team).
I'm referring to the git process of "importing" code. A git "import" is precisely what you just described. If it's not an import, then why is the word you use in your own logs "import"? I'm staring at a typical sample at https://git.centos.org/commit/rpms!kernel.git/e7a209a421ed05cf6f96076363d7f0..., where the log message is "import kernel-3.10.0-123.1.2.el7".
Nothing prevents you from tagging, and signing, those imports and build versions to show that the relevant git content is indeed straight from the CentOS workflow. It is work.
Things that come in with "CentOS Sources" user (or earlier the "CentOS Buildsys" user) are not done by the CentOS team, they come from upstream. There is a specific user who is allowed to connect from a specific IP that has a specific key who can import code directly. I cannot do it.
So he "imports" it directly into git? (See, there's that word again!!) Well and good.
When these things come in, I see them the same way that the Scientific Linux team or anyone else who uses this source sees them, by checking the site. I then use the same tools that anyone else who wants to build the source code would use, the tools here:
And you, or someone with write privileges to the git.centos.org repositories, can make tags and sign those when such "imports" occur. That would help ensure the provenance of code that is cloned, or rsynced, to secondary repositories.
In theory, it could even be done automatically by the build process used for release RPMs. Use the same revision data and package name that is currently in the "import" log message to make a signed tag, and developers who clone from yours or from remote mirrors will be able to be confident that the code actually came from CentOS, not from some nefarious weasel's trojaned repository.
And unlike the current "import" log messages, the naming of tags can be made consistent. I've already noticed that the kernel "import" log messages sometimes list "kernel-number" and sometimes "kernel-number.src.rpm". That makes processing the builds a bit confusing. You can always make a new, correctly named tag from an existing tag; that's very hard to fix in an already published log.
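A sketch of the automatic tagging idea from the last two paragraphs. It accepts both import message variants mentioned above (with and without `.src.rpm`), derives one normalized name-version-release, and builds a `git tag -u <keyid>` command, which creates a GPG-signed annotated tag. The `imports/<branch>/<nvr>` tag layout, the key ID, and the helper names are all hypothetical; nothing here reflects an actual CentOS workflow.

```python
import re
import subprocess

# Accepts "import <nvr>" with or without a trailing ".src.rpm".
IMPORT_RE = re.compile(r"^import\s+(?P<nvr>\S+?)(?:\.src\.rpm)?$")


def nvr_from_message(message):
    """Return the normalized name-version-release, or None if not an import."""
    match = IMPORT_RE.match(message.strip())
    return match.group("nvr") if match else None


def tag_name(branch, nvr):
    """Hypothetical tag layout: imports/<branch>/<nvr>."""
    return f"imports/{branch}/{nvr}"


def signed_tag_argv(repo_dir, commit, branch, message, key_id):
    """Build a 'git tag -u' command; -u makes a GPG-signed annotated tag."""
    nvr = nvr_from_message(message)
    if nvr is None:
        raise ValueError(f"not an import message: {message!r}")
    return ["git", "-C", repo_dir, "tag", "-u", key_id,
            "-m", f"{nvr} ({branch})", tag_name(branch, nvr), commit]


def create_signed_tag(repo_dir, commit, branch, message, key_id):
    """Create the tag locally; publishing it is a separate push."""
    subprocess.run(signed_tag_argv(repo_dir, commit, branch, message, key_id),
                   check=True)
```

Both "import kernel-3.10.0-123.1.2.el7" and "import kernel-3.10.0-123.1.2.el7.src.rpm" map to the same tag, `imports/c7/kernel-3.10.0-123.1.2.el7`; publishing it would still take a `git push origin <tag>` by someone with write access.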
It's not free: it involves actual work by someone with access to CentOS GPG tags and with privileges to modify the workflow. But please don't reject the concept of tagging and ensuring provenance on the basis that what is in place is secure enough. The loss of the GPG tagged SRPM access is an underlying security concern for developers whose nearest mirrors, or cloned git repos, may wind up poisoned.
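On the consumer side, a signed tag would be checked with `git verify-tag`. The sketch assumes a reasonably recent git whose `--raw` option prints GnuPG status lines to standard error, and it also checks that the signature came from an expected key fingerprint, since a valid signature from an arbitrary key proves nothing about origin. The fingerprint is a placeholder; no such CentOS tag-signing key is implied to exist.

```python
import subprocess


def signer_fingerprints(status_text):
    """Collect key fingerprints from GnuPG '[GNUPG:] VALIDSIG' status lines."""
    fingerprints = set()
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[:2] == ["[GNUPG:]", "VALIDSIG"]:
            fingerprints.add(parts[2].upper())       # signing (sub)key
            if len(parts) >= 12:
                fingerprints.add(parts[11].upper())  # primary key, when reported
    return fingerprints


def tag_signed_by(repo_dir, tag, expected_fingerprint):
    """True only if git accepts the signature and it was made by the expected key."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "verify-tag", "--raw", tag],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return False  # unsigned tag, bad signature, or key not in the keyring
    wanted = expected_fingerprint.replace(" ", "").upper()
    return wanted in signer_fingerprints(result.stderr)
```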
On 08/27/2014 09:32 PM, Nico Kadel-Garcia wrote:
On Wed, Aug 27, 2014 at 7:28 AM, Johnny Hughes johnny@centos.org wrote:
Not when the metadata is poisoned by a trojaned merge. Git logs can be edited. Without the GPG sums, it's like a web mirror that has a pack of RPMs with a pack of checksums alongside them. The owner of the mirror, or a cracker attacking the host, can corrupt *both*, and without the GPG tag, it's hard to establish provenance.
And *that* is one of the points where having a GPG signed tag, especially one tied to the contents of the SRPM builds, becomes a useful tool for verifying provenance of the tree. You can't rely on a binary comparison; there's likely to be frequent skew between the rsync mirrors and the main repo as a matter of course.
Red Hat does not want to provide us a GPG signed tag, so we will not be getting one. No reason to keep bringing it up. It's not happening any time soon.
I'm confused by this. What does Red Hat, at least the core business, have to do with this? You have a GPG key you use for making RPMs and SRPMs; why shouldn't or couldn't you use the same key to create git tags? This would be for tags for *your* work, and possibly for when you import Red Hat source.
If you have to get your GPG keys from Red Hat, well, that belies the claims I've seen here that git.centos.org has no special relationship with Red Hat, that your git repo is "not special" and you have no special source code access, doesn't it? I'm sorry, but you can't have it both ways. I'm not suggesting you'd need a special Red Hat GPG tag for your imports, but rather have a tag that *you* at CentOS can apply to your repos and relevant tags.
I said the "CentOS Team" does not have any special access to Red Hat SRPMS ... We, as well as anyone else who wants to use git.centos.org, get code provided by Red Hat Engineering into git.centos.org. The original import is not done by the CentOS team, it is done by Red Hat. If they want to add a gpg signed tag, they can .. if they don't they won't. Once the code shows up, I see it just like you do.
We are not providing mirrors of this all over the place, we are quite happy with one location and backups/failover. What we are trying to provide is the ability for people who want a local mirror of this to be able to get it another way. This is a convenience only, not something that is required.
Right. And they're the ones I'm worried about provenance for. My working assumption is that they will not be as secure as, say, git.centos.org. And it's not trivial to do a side-by-side comparison, because the sites will have a lot of churn.
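One partial way to do that side-by-side comparison despite churn: because git commit IDs are hashes over content and history, a ref that points at the same commit ID on both sites has identical history behind it (barring hash collisions). The sketch compares `git ls-remote` output from the main repository and a mirror and buckets refs as identical, different, missing, or extra. A differing ref may simply mean the mirror is lagging; telling lag from rewriting would need a follow-up check such as `git merge-base --is-ancestor` in a clone holding both commits. It also assumes the main repository is reached over a trusted channel, so it complements signed tags rather than replacing them.

```python
import subprocess


def ls_remote(url):
    """Return raw 'git ls-remote' output: '<commit><TAB><ref>' per line."""
    out = subprocess.run(["git", "ls-remote", url], check=True,
                         capture_output=True, text=True)
    return out.stdout


def parse_ls_remote(text):
    """Map ref name -> commit id."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def compare_refs(origin_text, mirror_text):
    """Bucket refs by whether the mirror points at the same commit as origin."""
    origin = parse_ls_remote(origin_text)
    mirror = parse_ls_remote(mirror_text)
    return {
        "same": sorted(r for r in origin if mirror.get(r) == origin[r]),
        "differ": sorted(r for r in origin if r in mirror and mirror[r] != origin[r]),
        "missing": sorted(r for r in origin if r not in mirror),
        "extra": sorted(r for r in mirror if r not in origin),
    }
```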
These local mirrors would not be CentOS mirrors at all, they are mirrors that some local user would be using for their own purpose, the CentOS team does not provide any assurance that those local mirrors are in any way accurate.
I am producing CentOS-7 directly with the git repo as it is RIGHT NOW, using absolutely nothing but the tools also provided in this repo and calls to mock. Fermi Scientific Linux is also producing their SL7 from this same git.centos.org repo, so it is not a blocker to be able to mirror this to get the source code or produce binaries. All the tools are being provided or updated by the community and everything is open. It all works right now.
That's nice, but mostly irrelevant. It's all that's *available*, so they have to use it. The "analyze git logs to determine relevant revisions" approach has already broken down at least once that I saw reported here. The awkwardness of the "git log" analysis I've already gone over. If anyone else is going to access the contents from an rsync mirror, provenance becomes even more important.
We are not talking about providing official rsync mirrors, we are providing git.centos.org. If other people want to have a local copy, for their OWN USE (not in any way supported or recommended by CentOS) that is what we are talking about trying to provide.
So, if we can create a mechanism to mirror the content as well .. other than just a script to do it via the JSON API, then we will. This is not critical, obviously, as both CentOS and Scientific Linux are tracking EL7 and doing updates from git.centos.org just fine right now.
For CentOS, it's by choice and because you've already built it into your workflow. For Scientific Linux, it's because that's all that's available. Do you really think it's by *choice*, or by technical preference? I've seen no evidence of that.
It has nothing to do with my/our workflow .. These sources are provided by Red Hat as the official Red Hat community sources to git.centos.org, I have to use them just like everyone else, whether I like them or not.
What we are talking about also providing is a way for people to make local copies to help them if they want a local copy, nothing more.
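For the local-copy use case described above, plain git already covers most of it. This sketch creates bare mirrors with `git clone --mirror` and refreshes existing ones with `git remote update --prune`. The base URL layout is an assumption (check the clone URL shown on each repository's page), and the list of repository names is left as an input because obtaining it, for example via the JSON API mentioned earlier in the thread, is not shown here. As discussed above, such a copy carries no more provenance than the site it was cloned from.

```python
import subprocess
from pathlib import Path

# Assumed clone URL layout; confirm against the clone URL on each repo page.
DEFAULT_BASE = "https://git.centos.org/git/rpms"


def mirror_commands(dest_root, names, base=DEFAULT_BASE):
    """Return the git command to create or refresh a bare mirror for each name."""
    cmds = []
    for name in names:
        target = Path(dest_root) / f"{name}.git"
        if target.exists():
            # Existing mirror: fetch new refs and drop ones deleted upstream.
            cmds.append(["git", "-C", str(target), "remote", "update", "--prune"])
        else:
            cmds.append(["git", "clone", "--mirror", f"{base}/{name}.git", str(target)])
    return cmds


def sync(dest_root, names, base=DEFAULT_BASE):
    """Run the commands in order, stopping at the first failure."""
    for cmd in mirror_commands(dest_root, names, base):
        subprocess.run(cmd, check=True)
```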