H. Peter Anvin wrote:
Brandon Davidson wrote:
AFAICT, hardlink.py checks a number of things to determine whether or not files are eligible for hardlinking:
- size is the same
- size is not zero
- file mode is the same
- owner user id is the same
- owner group id is the same
- modified time is the same (unless date hashing is disabled)
Does it actually compare the contents?
Yes, it looks like it does :) If a pair of files passes the above eligibility check, it then goes on to a raw block comparison:
    buffer_size = 1024*1024
    while 1:
        buffer1 = file1.read(buffer_size)
        buffer2 = file2.read(buffer_size)
        if buffer1 != buffer2:
            result = False
            break
        if not buffer1:
            result = True
            break
I don't claim to be a Python expert, but I don't see any obvious problems with this. There is, of course, a window between the initial stat and the subsequent block comparison in which the files could be modified, but that seems unlikely to be the cause of our problems in this case.
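To make that window concrete, here is a rough sketch of the same sequence (metadata check, then block comparison) with a re-stat at the end. This is not hardlink.py's code: the function names eligible, same_contents, unchanged and safe_to_link are made up for illustration, and the re-stat only narrows the window rather than closing it.

```python
import os

BUFFER_SIZE = 1024 * 1024  # same 1 MiB block size as the excerpt above


def eligible(st1, st2, check_mtime=True):
    """Metadata checks from the list above (hypothetical helper)."""
    return (
        st1.st_size == st2.st_size
        and st1.st_size != 0
        and st1.st_mode == st2.st_mode
        and st1.st_uid == st2.st_uid
        and st1.st_gid == st2.st_gid
        and (not check_mtime or st1.st_mtime == st2.st_mtime)
    )


def same_contents(path1, path2):
    """Raw block comparison, equivalent to the loop quoted above."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            buffer1 = f1.read(BUFFER_SIZE)
            buffer2 = f2.read(BUFFER_SIZE)
            if buffer1 != buffer2:
                return False
            if not buffer1:  # both files hit EOF at the same point
                return True


def unchanged(before, after):
    """True if inode, size and mtime still match the earlier stat."""
    return (before.st_ino, before.st_size, before.st_mtime_ns) == (
        after.st_ino, after.st_size, after.st_mtime_ns)


def safe_to_link(path1, path2):
    st1, st2 = os.stat(path1), os.stat(path2)
    if not eligible(st1, st2):
        return False
    if not same_contents(path1, path2):
        return False
    # Assumption: a second stat is a cheap way to notice a modification
    # that happened while we were reading. A write between this check
    # and the actual link() call would still go unnoticed.
    return unchanged(st1, os.stat(path1)) and unchanged(st2, os.stat(path2))
```

Calling safe_to_link(a, b) on two paths returns True only if all three stages pass. Whether an extra stat is worth the cost depends on how busy the tree is while the tool runs.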
FYI, I'm looking here: http://code.google.com/p/hardlinkpy/source/browse/trunk/hardlink.py
-Brandon