[CentOS] Jigdo, etc...

Wed Jan 4 00:49:59 UTC 2006
Johnny Hughes <mailing-lists at hughesjr.com>

On Fri, 2005-12-30 at 00:00 +0100, Maciej Żenczykowski wrote:
> Hi folks,
> 
> I've just finished rsyncing/downloading/jigdoizing the entire i386/x86_64 
> CentOS 4.2 distribution.
> 
> If anyone is interested go to
> 
> http://mirror.tcs.ii.uj.edu.pl/jigdo/
> 
> You'll need to edit the .jigdo file by hand to change the server section
> 
> [Servers]
> CentOS42=file:/opt/mirrors/centos/4.2/
> 
> to point to a local mirror (file, http or ftp), i.e. to use kernel.org:
> 
> [Servers]
> CentOS42=http://mirrors.kernel.org/centos/4.2/
> 
> While doing this I have come upon a few questions:
> 
> a) it seems the server CDs have a lot of stuff not present in the normal 
> directory mirror; I guess this is an artifact of the build process?
> [the template files for the server CDs are ~120MB]
> 
> b) what are the .newheaders and .repodata directories on i386 CD1?
> 
> c) why do the mirror repodata/*.xml.gz files match neither the CD nor the
> DVD versions for i386?

There was an issue after tree dissemination that required yum-arch and
createrepo to be run again on the main tree.  This may happen from time
to time due to mirror rsync issues.
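To see exactly which metadata files differ between a CD (or DVD) and the mirror, a byte-for-byte comparison of the two repodata/ directories is enough. A minimal sketch; compare_repodata is a hypothetical helper name, and the paths you pass it are up to you:

```shell
# Compare every file in TREE_A/repodata against the same name in
# TREE_B/repodata and report what is missing or different.
# compare_repodata is a hypothetical helper, not part of yum or createrepo.
compare_repodata() {
    a=$1/repodata b=$2/repodata
    for f in "$a"/*; do
        [ -f "$f" ] || continue                 # skip an unmatched glob
        name=${f##*/}                           # basename of the file
        if [ ! -f "$b/$name" ]; then
            echo "missing: $name"
        elif ! cmp -s "$f" "$b/$name"; then     # byte-for-byte comparison
            echo "differs: $name"
        fi
    done
}
```

For example, `compare_repodata /media/cdrom /opt/mirrors/centos/4.2/os/i386` (mount point assumed) would list the repodata files that were regenerated after the ISOs were cut.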

> 
> d) why does the i386 DVD not match perfectly, while the x86_64 DVD matches 
> for _all_ files?  The x86_64 CD1 also matches _much_ better than the i386 
> CD1...

We had to rerun yum-arch and createrepo on the tree after the ISOs were
released ... that may or may not be the cause of the differences.
However, from a yum and up2date perspective, the i386 tree, DVD, and CD
set are the same.

Did I mention that we don't have 5 million dollars or 500 programmers to
produce CentOS?  All the trees and mirrors are donated ... and all the
developers donate their time and machines to make this happen.

I do the best job I can to make this a good and FREE distro, as do all
the other devels.

> 
> e) why aren't identical files between the two trees hardlinked?
> 
> $ ls -ali os/*/CentOS/RPMS/yum*noarch*
>   278532 -rw-rw-r--  1 maze maze 395922 Sep  4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> 1165388 -rw-rw-r--  1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> 
> $ md5sum os/*/CentOS/RPMS/yum*noarch*
> 371d55a19f8e4ca13d22974128ab4671  i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> 371d55a19f8e4ca13d22974128ab4671  x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> 
> Just an example of two identical files from my mirror, one of which is 
> wasting space even though contents are identical.  I expect we have this 
> situation for almost _all_ i386 packages from the x86_64 distribution...
> 

We run a program called hardlink++ on the master mirror that should
hardlink files that are identical.  If it is not hardlinking those, it
should be.

Are you using the -H option when you rsync down?
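For context: rsync only recreates hard links on the receiving side when -H (--hard-links) is given, and only for links whose names are all inside the transferred set; without it, each name arrives as a separate full copy. A minimal sketch, where MIRROR is a placeholder for a real rsync host and count_hardlinked is a hypothetical helper for checking afterwards whether any links survived:

```shell
# Pull the tree with hard links preserved (-H); MIRROR is a placeholder host.
#   rsync -aH --delete rsync://MIRROR/centos/4.2/ /opt/mirrors/centos/4.2/

# Count regular files under a directory whose hard-link count is greater
# than 1, i.e. files that share their inode with at least one other name.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

A result of 0 on a freshly synced tree suggests the links were lost in transit (for example, -H was not used).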

> $ pwd
> /opt/mirrors/centos/4.2/os/x86_64
> $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> 440745010
> $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> 426816227
> 
> $ pwd
> /opt/mirrors/centos/4.2/updates/x86_64
> $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> 12819616
> $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
> $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> 2164495
> 
> $ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
> -rw-rw-r--  2 maze maze 18284 Sep  5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm
> 
> That seems to me to be an 880 MB mirror space saving to be made there...
> Considering the i386/x86_64 mirror takes up 7.7GB (without ISOs), that's 
> quite a bit...
> 
> I also imagine the noarch files are shared with most of the other 
> architectures... so I'd assume another 400MB could be saved for each 
> additional arch...
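The roughly 880 MB figure follows from the four `wc -c` totals in the transcript. The one noarch update without an i386 copy (createrepo-0.4.3-1.noarch.rpm, 18284 bytes) is not a duplicate, so it is subtracted here. As shell arithmetic:

```shell
# Byte counts copied from the wc -c output in the transcript above.
os_i386=440745010      # os/x86_64: *.i386.rpm, all identical to os/i386
os_noarch=426816227    # os/x86_64: *.noarch.rpm, all identical to os/i386
upd_i386=12819616      # updates/x86_64: *.i386.rpm
upd_noarch=2164495     # updates/x86_64: *.noarch.rpm
createrepo=18284       # noarch update with no i386 copy, so not a duplicate

dup_bytes=$((os_i386 + os_noarch + upd_i386 + upd_noarch - createrepo))
echo "$dup_bytes bytes"                       # 882527064 bytes
echo "$((dup_bytes / 1000000)) MB (decimal)"  # 882 MB (decimal)
```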

One thing to please remember is that we develop these files in separate
locations on separate machines, so they have to be standalone on those
machines initially ... we then combine them on the mirror and run
hardlink++.  That SHOULD hardlink all the files that are identical.
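hardlink++ itself is not shown in this thread, and it presumably matches files by content wherever they live. As an illustration of the idea only, here is a minimal sketch (not hardlink++; link_identical is a hypothetical name) that handles the simpler case from the transcript, where an identical package sits at the same relative path in both trees. Both trees must be on the same filesystem for hard links to work:

```shell
# For each file in tree SRC, if tree DST has a byte-identical file at the
# same relative path, replace DST's copy with a hard link to SRC's copy.
# Sketch only: matches by relative path, NOT by content across all paths.
link_identical() {
    src=$1 dst=$2
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        a="$src/$rel" b="$dst/$rel"
        # Skip if there is no counterpart or it is already the same inode.
        if [ ! -f "$b" ] || [ "$a" -ef "$b" ]; then continue; fi
        if cmp -s "$a" "$b"; then    # contents identical?
            ln -f "$a" "$b"          # replace DST's copy with a hard link
        fi
    done
}
```

For example, run from /opt/mirrors/centos/4.2, `link_identical os/i386 os/x86_64` would cover the i386 and noarch packages shown in the transcript.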

> 
> f) Why aren't jigdo files available on the site?  They'd really come in 
> useful, especially in the situation I had where I already had a complete 
> mirror of all the files, but I still had to bittorrent the CD/DVD's even 
> though I had 99% of the required data on disk!

I don't know how to make jigdo files ... however, I am willing to learn.

Fedora and Red Hat don't, to my knowledge, create or distribute jigdo
files ... so this is not something that we would normally do.

There are lots of things that we don't do ... maybe we need 48 hour
days :)

I am willing to learn what jigdo is all about ... but for now I am
totally ignorant.
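The .jigdo edit Maciej describes at the top of the thread amounts to rewriting one line of the [Servers] section. A minimal shell sketch of that edit, assuming the .jigdo file is uncompressed plain text (gunzip it first if it is not); set_jigdo_server is a hypothetical helper name:

```shell
# Rewrite LABEL=... inside the [Servers] section of a plain-text .jigdo file.
# Assumption: the file is uncompressed; other sections are left untouched.
# Usage: set_jigdo_server FILE LABEL URL
set_jigdo_server() {
    file=$1 label=$2 url=$3
    tmp=$(mktemp) || return 1
    awk -v label="$label" -v url="$url" '
        /^\[/ { in_servers = ($0 == "[Servers]") }   # track current section
        in_servers && index($0, label "=") == 1 {    # line starts with LABEL=
            print label "=" url; next
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_jigdo_server CentOS-4.2-i386-bin1of4.jigdo CentOS42 http://mirrors.kernel.org/centos/4.2/` (file name hypothetical) performs the kernel.org change from the quoted instructions.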