[CentOS] Jigdo, etc...

Wed Jan 4 12:59:21 UTC 2006
Johnny Hughes <mailing-lists at hughesjr.com>

On Tue, 2006-01-03 at 18:49 -0600, Johnny Hughes wrote:
> On Fri, 2005-12-30 at 00:00 +0100, Maciej Żenczykowski wrote:

> > 
> > e) why aren't identical files between the two trees hardlinked?
> > 
> > $ ls -ali os/*/CentOS/RPMS/yum*noarch*
> >   278532 -rw-rw-r--  1 maze maze 395922 Sep  4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 1165388 -rw-rw-r--  1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 
> > $ md5sum os/*/CentOS/RPMS/yum*noarch*
> > 371d55a19f8e4ca13d22974128ab4671  i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 371d55a19f8e4ca13d22974128ab4671  x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 
> > Just an example of two identical files from my mirror, one of which is 
> > wasting space even though contents are identical.  I expect we have this 
> > situation for almost _all_ i386 packages from the x86_64 distribution...
> > 
> 
> We run a program called hardlink++ on the master mirror that should hard
> link files that are identical.  If it is not hardlinking those, it
> should be.
> 
> Are you using the -H option when rsyncing down?
> 
> > $ pwd
> > /opt/mirrors/centos/4.2/os/x86_64
> > $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> > 440745010
> > $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> > 426816227
> > 
> > $ pwd
> > /opt/mirrors/centos/4.2/updates/x86_64
> > $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> > 12819616
> > $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
> > $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> > 2164495
> > 
> > $ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
> > -rw-rw-r--  2 maze maze 18284 Sep  5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm
> > 
> > That looks to me like an 880 MB saving in mirror space to be made there...
> > Considering the i386/x86_64 mirror takes up 7.7GB (without ISOs) that's 
> > quite a bit...
> > 
> > I also imagine the noarch files are shared with most of the other 
> > architectures... so I'd assume another 400MB can be saved for each
> > additional arch...
> 
> One thing to please remember is that we develop these files from
> separate locations on separate machines, so they have to be stand alone
> on those machines initially ... we then combine them together on the
> mirror and run hardlink++.  That SHOULD hardlink all the files that are
> the same.

OK ... I have done some specific testing and found out this about
hardlink++:

It only links files that have the same date/time stamp ... which means
if a file has the same size and MD5 sum but a different date, it will
not get linked.  This is not what I thought it did.
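
To see which files fall into that gap, something like the following could
be run against two arch trees.  This is only a rough sketch and not part
of hardlink++: report_unlinked is a made-up helper name, and it assumes
both trees use the same relative paths (as i386 and x86_64 do in the
listings quoted above) and that no file name contains a newline.

```shell
# report_unlinked A B  (hypothetical helper, not part of hardlink++)
# Lists files under B that are byte-identical to the same relative path
# under A but are NOT hard links to it, notes whether their mtimes
# differ, and prints the total bytes that linking them would reclaim.
report_unlinked() (
    a_root=$1
    b_root=$2
    (cd "$b_root" && find . -type f) | {
        total=0
        while IFS= read -r rel; do
            a=$a_root/$rel
            b=$b_root/$rel
            [ -f "$a" ] || continue          # no counterpart in the other tree
            [ "$a" -ef "$b" ] && continue    # already the same inode
            cmp -s "$a" "$b" || continue     # contents differ
            if [ "$a" -nt "$b" ] || [ "$b" -nt "$a" ]; then
                note="mtime differs"
            else
                note="mtime matches"
            fi
            size=$(wc -c < "$b")
            total=$((total + size))
            printf '%s\t%s\n' "$rel" "$note"
        done
        printf 'reclaimable bytes: %s\n' "$total"
    }
)
```

For example, "report_unlinked os/i386 os/x86_64" from the top of a release
directory.  Files reported with "mtime differs" are the ones a
timestamp-sensitive linker would skip.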

I will try to get the arches I control (i386 / x86_64) better hardlinked
in the future and try to maintain them that way, since hardlink++ is not
doing what I thought it was.  However, there are only so many hours in
the day.
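
For anyone who wants to do this on their own mirror in the meantime, here
is a minimal sketch of linking by content only, ignoring timestamps.  It
is not what runs on the master mirror: link_identical is a made-up helper
name, and it assumes both trees are on the same filesystem, share the same
relative paths, and contain no file names with newlines.

```shell
# link_identical SRC DST  (hypothetical helper, not part of hardlink++)
# For every regular file under DST that has a byte-identical file at the
# same relative path under SRC, replace the DST copy with a hard link to
# the SRC copy.  Timestamps are not compared; after linking, both names
# share one inode and therefore show the SRC file's mtime.
link_identical() (
    src_root=$1
    dst_root=$2
    (cd "$dst_root" && find . -type f) |
    while IFS= read -r rel; do
        src=$src_root/$rel
        dst=$dst_root/$rel
        [ -f "$src" ] || continue            # no counterpart to link to
        [ "$src" -ef "$dst" ] && continue    # already hard linked
        cmp -s "$src" "$dst" || continue     # contents differ, leave alone
        ln -f "$src" "$dst" && printf 'linked %s\n' "$rel"
    done
)
```

For example, "link_identical os/i386 os/x86_64".  Mirrors pulling from a
tree linked this way would still need rsync -H to keep the links on their
side.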