[CentOS] Jigdo, etc...
Johnny Hughes
mailing-lists at hughesjr.com
Wed Jan 4 12:59:21 UTC 2006
On Tue, 2006-01-03 at 18:49 -0600, Johnny Hughes wrote:
> On Fri, 2005-12-30 at 00:00 +0100, Maciej Żenczykowski wrote:
> >
> > e) why aren't identical files between the two trees hardlinked?
> >
> > $ ls -ali os/*/CentOS/RPMS/yum*noarch*
> > 278532 -rw-rw-r-- 1 maze maze 395922 Sep 4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 1165388 -rw-rw-r-- 1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> >
> > $ md5sum os/*/CentOS/RPMS/yum*noarch*
> > 371d55a19f8e4ca13d22974128ab4671 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 371d55a19f8e4ca13d22974128ab4671 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> >
> > Just an example of two identical files from my mirror, one of which is
> > wasting space even though contents are identical. I expect we have this
> > situation for almost _all_ i386 packages from the x86_64 distribution...
> >
>
> We run a program called hardlink++ on the master mirror that should hard
> link files that are identical. If it is not hardlinking those it
> should.
>
> Are you using the -H option when rsyncing down?
>
> > $ pwd
> > /opt/mirrors/centos/4.2/os/x86_64
> > $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> > 440745010
> > $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> > 426816227
> >
> > $ pwd
> > /opt/mirrors/centos/4.2/updates/x86_64
> > $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> > 12819616
> > $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
> > $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> > 2164495
> >
> > $ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
> > -rw-rw-r-- 2 maze maze 18284 Sep 5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm
> >
> > That seems to me to be a 880 MB mirror space savings to be made there...
> > Considering the i386/x86_64 mirror takes up 7.7GB (without iso's) that's
> > quite a bit...
> >
> > I also imagine the noarch files are shared with most of the other
> > architectures... so I'd assume another 400MB per every next arch can be
> > saved...
>
> One thing to please remember is that we develop these files from
> separate locations on separate machines, so they have to be stand alone
> on those machines initially ... we then combine them together on the
> mirror and run hardlink++. That SHOULD hardlink all the files that are
> the same.

OK ... I have done some specific testing, and here is what I found out
about hardlink++:

It only links files that have the same date/time stamp ... which means
that two files with the same size and MD5 sum but different dates will
not get linked. This is not what I thought it did.

I will try to get the arches I control (i386 / x86_64) better hardlinked
in the future and try to maintain them that way, since hardlink++ is not
doing what I thought it was. However, there are only so many hours in
the day.
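
For illustration only, here is a minimal sketch of linking by content
rather than by timestamp. It is not how hardlink++ works and not what is
run on the master mirror; link_identical is a hypothetical helper name.
It assumes both trees live on the same filesystem, that file names
contain no newlines, and that the same relative path exists in both
trees (as in the i386 / x86_64 example above).

```shell
#!/bin/sh
# Hypothetical helper (not part of hardlink++): hard-link files that sit at
# the same relative path in two trees and have byte-identical contents,
# ignoring their date/time stamps.
#
# Usage: link_identical SRC_TREE DST_TREE   (pass paths without a trailing /)
link_identical() (
    src=$1
    dst=$2
    # Walk every regular file in the first tree.
    find "$src" -type f | while IFS= read -r f; do
        rel=${f#"$src"/}          # path relative to the tree root
        other=$dst/$rel
        # Only consider a regular (non-symlink) counterpart in the other tree.
        [ -f "$other" ] && [ ! -L "$other" ] || continue
        # Nothing to do if both names already point at the same inode.
        [ "$f" -ef "$other" ] && continue
        # cmp -s compares contents only; timestamps play no part.
        if cmp -s "$f" "$other"; then
            # Replace the second copy with a hard link to the first.
            ln -f "$f" "$other"
        fi
    done
)
```

It could be called as, for example,
link_identical /opt/mirrors/centos/4.2/os/i386 /opt/mirrors/centos/4.2/os/x86_64
Because the two names then share one inode, the linked file ends up with
the timestamp of the copy in the first tree.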