[CentOS] Checksums for git repo content?

Thu Feb 9 21:04:18 UTC 2017

On Feb 9, 2017, at 1:26 PM, Leonard den Ottolander <leonard at den.ottolander.nl> wrote:
> 
> On Thu, 2017-02-09 at 14:12 -0600, Johnny Hughes wrote:
>> The patch files are in git as text files, right?  Why would you need
>> checksums of those? That is the purpose of git, right?
> 
> Checksums are there to make sure that you get what you are supposed to
> get.

What failure model are you trying to solve for, specifically?

If you’re worried about malicious tampering of the files on the server, how would your request solve anything?  If you don’t trust the Git repo you’re cloning from, why would you trust a checksum file stored in that same repo?

If you’re worried about a MITM attack, any MITM that can modify Git data in-flight can produce bogus checksum files in-flight, too.

If you’re worried about corrupted data at rest on the remote server or corruption introduced during the transfer, Git already solves this:

   https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
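
As a rough illustration of what that chapter describes, here is a minimal Python sketch of how Git derives a blob’s object ID from its content.  It assumes a repository using the default SHA-1 object format; the function name git_blob_id is made up for this example and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    together with the raw file content, so the ID depends on every byte.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same input as the "echo 'test content' | git hash-object --stdin"
    # example in the Pro Git chapter linked above.
    print(git_blob_id(b"test content\n"))
    # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the object’s name is the hash of its content, a blob whose bytes have changed no longer matches the name it is stored under, which is the kind of mismatch an integrity check can detect.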

If you want to verify that a given Git clone is consistent:

   $ git fsck --full --strict

Git can do this because its contents are a type of Merkle tree:

   https://en.wikipedia.org/wiki/Merkle_tree

Merkle trees are highly resistant to attacks, particularly in the case of source code, where an attack must not only change the attacked resource; the change must also a) create some effect desired by the attacker and b) still be legal code in the programming language being used.  Getting both effects while still maintaining the same SHA-1 hash is Difficult.™
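
To make the Merkle-tree property concrete, here is a deliberately simplified Python sketch.  It is not Git’s actual tree or commit format; the tree_id function and the "name hash" line layout are assumptions invented for this illustration.  It only shows that each directory’s ID is a hash over its children’s IDs, so a change to any file changes every ID up to the root.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def tree_id(entries: dict) -> str:
    """Hash a directory given as {name: bytes (file) or dict (subdirectory)}.

    Simplified stand-in for a Merkle tree node: the ID is a hash over the
    sorted "name child_id" lines of the directory's children.
    """
    lines = []
    for name in sorted(entries):
        child = entries[name]
        child_id = tree_id(child) if isinstance(child, dict) else sha1_hex(child)
        lines.append(f"{name} {child_id}")
    return sha1_hex("\n".join(lines).encode("utf-8"))


if __name__ == "__main__":
    original = {
        "README": b"hello\n",
        "src": {"main.c": b"int main(void) { return 0; }\n"},
    }
    tampered = {
        "README": b"hello\n",
        "src": {"main.c": b"int main(void) { return 1; }\n"},
    }
    # One changed byte deep in the tree yields a different root ID.
    print(tree_id(original) != tree_id(tampered))  # True
```

Anyone who knows the expected root ID can therefore detect a change anywhere below it without being given a separate checksum per file.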

I don’t know Git internals, but I would expect the above git-fsck command to be pointless immediately after a clone, because Git should be doing something like that check during the clone process itself.  (I’ve been disappointed by Git’s behavior before, though, so…)

That command should only have a useful effect later on, for example after a subsequent git pull, to detect whether the local copy has bitrotted in the meantime.

> Having checksums for all files (like in a SRPM) is a guarantee

A checksum guarantees nothing by itself.  A file’s checksum is only as trustworthy as the source of that checksum.  If you don’t trust the source to give you a correct file, you can’t trust that same source to give you a valid checksum.  Any bad actor that can compromise one can compromise the other.
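
The point can be sketched in a few lines of Python.  Everything here is hypothetical: the dictionaries stand in for a server that publishes a file alongside its checksum, and the names publish and verify are invented for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def publish(payload: bytes) -> dict:
    """Hypothetical server state: a file plus the checksum stored next to it."""
    return {"file": payload, "checksum": sha256_hex(payload)}


def verify(payload: bytes, checksum: str) -> bool:
    """The client-side check: does the payload match the given checksum?"""
    return sha256_hex(payload) == checksum


if __name__ == "__main__":
    honest = publish(b"patch contents\n")
    # An attacker who can replace the file can regenerate the checksum too.
    compromised = publish(b"malicious patch contents\n")

    # Checksum fetched from the same compromised source: the check passes.
    print(verify(compromised["file"], compromised["checksum"]))  # True
    # Checksum obtained earlier from a source the attacker did not control:
    print(verify(compromised["file"], honest["checksum"]))  # False
```

The check only tells you something when the checksum arrives by a path the attacker could not also modify, which is a question of trust rather than arithmetic.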

*Distributed* checksums can sometimes be helpful, if they’re maintained by disparate parties on distributed servers.  Here, you’re asking some third party to assert that they got a copy of the same RPM (or whatever) and that they got checksum XXXXXXX for it.  That devolves into a trust relationship, rather than the math problem it naively looks like: do you trust that party not to be compromised by the same party that produced the RPM in question?

Another trust problem — which is again a people problem rather than a math problem — is cryptographic signatures.  A signed SRPM is only as trustworthy as the provider of the signing certificate.  Certificate authorities are getting caught doing untrustworthy things *all the time*.  Have you vetted your trusted CAs, or are you relying on a third party to do that?  Why do you trust that third party to do that job thoroughly?