On Thu, Aug 21, 2014 at 3:15 AM, Karanbir Singh mail-lists@karan.org wrote:
On 08/21/2014 02:29 AM, Nico Kadel-Garcia wrote:
Use GPG signed git tags to assure provenance, and the repository can be safely cloned. Rsyncing a git repo is like rsyncing a CVS or Subversion repository. Even small changes in the midst of the rsync operation can corrupt the underlying database.
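For reference, a minimal sketch of what checking a signed tag could look like, assuming git and gpg are installed and the signer's public key is already in the local keyring. `git verify-tag` is the real git command; the helper name `tag_is_signed_and_valid` and any paths passed to it are hypothetical.

```python
import subprocess


def tag_is_signed_and_valid(repo_path: str, tag: str) -> bool:
    """Return True only if `git verify-tag` accepts the tag's GPG signature.

    Hypothetical helper. It fails closed: a missing git binary, a missing
    repository, an unsigned tag, or a signing key that is not in the local
    keyring all count as "not verified".
    """
    try:
        result = subprocess.run(
            ["git", "-C", repo_path, "verify-tag", tag],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # git itself is not installed, so nothing can be verified.
        return False
    # git exits non-zero when the signature is absent or cannot be checked.
    return result.returncode == 0
```

A clone from any mirror could then be checked with something like `tag_is_signed_and_valid("/path/to/clone", "some-tag")` before its contents are trusted.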
the objects are checksummed - the whole underlying fabric of git is based on hashes, so a corruption in content would be fairly easy to notice.
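To illustrate the point about hashes, here is a small sketch of how git derives the ID of a blob object, assuming the default SHA-1 object format. The function names are made up for this example; in practice `git fsck` performs this kind of check across the whole object store.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID git assigns to a blob with this content.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    together with the raw content, not the content alone.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def blob_is_intact(content: bytes, expected_id: str) -> bool:
    """Return True if the content still hashes to the ID it is stored under."""
    return git_blob_id(content) == expected_id


# Changing any byte of the content changes the ID, which is how a damaged
# object gets noticed.
print(git_blob_id(b"hello world\n"))
print(blob_is_intact(b"hello w0rld\n", git_blob_id(b"hello world\n")))
```

Note that this only shows that content matches its ID. It says nothing about who produced the content.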
So what? It's a broken mirror, and during writes to individual repos at git.centos.org, each of *those* will be a broken mirror. Were you planning on setting up some kind of staging from the main repository to local rsync targets? That moves the problem upstream, and you'd need to use something like git clones and git pull to keep those safely up to date. But then we're back to being sure of the provenance of *those* repositories, and others will still be at risk of corrupting *those* when that target gets updated. Really, rsync based or filesystem based snapshots for anything with an underlying database all present the same kind of risks.
I assume you were planning on running 6000 distinct rsync mirror targets, one for each git repository, so the damage would be isolated to only those repositories in the midst of an update. How often are they going to be broken? Relatively stable repositories are likely to be intact, but repositories that have a lot of churn are most at risk.
And checksums don't solve the provenance problem. Someone who maliciously p0wns a mirror site can trojan the site, and without something like GPG signed git tags, the content becomes very difficult to verify. It's theoretically possible to set local working git clones to talk to several distinct upstream remote repositories and verify contents against them, but there will be frequent differences between git.centos.org and the rsync mirrors.
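A rough sketch of that cross-checking idea, assuming each upstream can be queried with `git ls-remote` (a real git command that prints one tab-separated commit ID and ref name per line). The function names and the remote labels are hypothetical. A mismatch here does not prove tampering: a mirror that simply lags behind will show up the same way, which is exactly the problem described above.

```python
import subprocess
from typing import Dict, Optional


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Parse `git ls-remote` output into a {ref name: commit ID} mapping."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def list_remote_refs(url: str) -> Dict[str, str]:
    """Ask one remote for its refs. Needs git and network access."""
    result = subprocess.run(
        ["git", "ls-remote", url],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_remote(result.stdout)


def disagreeing_refs(
    listings: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return every ref whose commit ID is not identical on all remotes.

    `listings` maps a label for each remote to its parsed ref listing.
    A ref missing from a remote is reported with None for that remote.
    """
    all_refs = set()
    for refs in listings.values():
        all_refs.update(refs)

    diffs = {}
    for ref in sorted(all_refs):
        seen = {name: refs.get(ref) for name, refs in listings.items()}
        if len(set(seen.values())) > 1:
            diffs[ref] = seen
    return diffs
```

With real remotes, the listings would come from `list_remote_refs(url)` for each upstream, and an empty result from `disagreeing_refs` would mean all of them currently advertise the same commits.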