[CentOS] Jigdo, etc...

Thu Dec 29 23:00:40 UTC 2005
Maciej Żenczykowski <maze at cela.pl>

Hi folks,

I've just finished rsyncing/downloading/jigdoizing the entire i386/x86_64 
CentOS 4.2 distribution.

If anyone is interested, go to

http://mirror.tcs.ii.uj.edu.pl/jigdo/

You'll need to edit the .jigdo file by hand to change the server section

[Servers]
CentOS42=file:/opt/mirrors/centos/4.2/

to point to a local mirror (file, http or ftp), e.g. to use kernel.org:

[Servers]
CentOS42=http://mirrors.kernel.org/centos/4.2/
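
For anyone who would rather script the edit than do it by hand, here is a
minimal sketch. It assumes the .jigdo file is plain (uncompressed) text and
that the label only appears at the start of a line inside [Servers]. The
function name set_jigdo_server and the file name in the example call are made
up for illustration; they are not part of jigdo.

```shell
# Hypothetical helper (not part of jigdo): point one [Servers] label at a mirror.
# Usage: set_jigdo_server FILE LABEL URL
set_jigdo_server() {
    file=$1
    label=$2
    url=$3
    tmp=$(mktemp) || return 1
    # '|' is the sed delimiter because the URL contains slashes.
    # Assumption: the URL contains no '|' or '&' characters.
    sed "s|^${label}=.*|${label}=${url}|" "$file" > "$tmp" || return 1
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example call; some-image.jigdo is a placeholder name.
# set_jigdo_server some-image.jigdo CentOS42 http://mirrors.kernel.org/centos/4.2/
```

Afterwards the edited file can be handed to jigdo-lite as usual.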

While doing this I came across a few questions:

a) it seems the server CDs have a lot of stuff not present in the normal 
directory mirror; I guess this is an artifact of the build process?
[the template files for the server CDs are ~120MB]

b) what are the .newheaders and .repodata directories on i386 CD1?

c) why do the mirror repodata/*.xml.gz files match neither the CD nor 
the DVD versions for i386?

d) why does the i386 DVD not match perfectly, while the x86_64 DVD matches 
for _all_ files?  The x86_64 CD1 also matches _much_ better than the i386 
CD1...

e) why aren't identical files between the two trees hardlinked?

$ ls -ali os/*/CentOS/RPMS/yum*noarch*
  278532 -rw-rw-r--  1 maze maze 395922 Sep  4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
1165388 -rw-rw-r--  1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm

$ md5sum os/*/CentOS/RPMS/yum*noarch*
371d55a19f8e4ca13d22974128ab4671  i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
371d55a19f8e4ca13d22974128ab4671  x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm

Just an example of two files from my mirror with identical contents, one 
of which is wasting space.  I expect we have this situation for almost 
_all_ i386 packages from the x86_64 distribution...

$ pwd
/opt/mirrors/centos/4.2/os/x86_64
$ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
$ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
440745010
$ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
$ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
426816227

$ pwd
/opt/mirrors/centos/4.2/updates/x86_64
$ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
$ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
12819616
$ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
$ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
2164495

$ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
-rw-rw-r--  2 maze maze 18284 Sep  5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm

That seems to me to be an 880 MB saving in mirror space to be made there...
Considering the i386/x86_64 mirror takes up 7.7GB (without ISOs), that's 
quite a bit...
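
The figure comes from adding up the four wc -c results above. A quick check
of the arithmetic, with the one createrepo package subtracted because it has
no counterpart under updates/i386 (the variable names are just for
illustration):

```shell
# Byte counts copied from the wc -c output above.
os_i386=440745010
os_noarch=426816227
updates_i386=12819616
updates_noarch=2164495
createrepo=18284   # only under updates/x86_64, so nothing to link it to

total=$((os_i386 + os_noarch + updates_i386 + updates_noarch - createrepo))
echo "$total bytes, about $((total / 1000000)) MB"
# prints "882527064 bytes, about 882 MB"
```

Against the 7.7GB mirror that works out to roughly 11%.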

I also imagine the noarch files are shared with most of the other 
architectures... so I'd assume another 400MB can be saved for each 
additional arch...
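
A minimal sketch of how the duplicates could be hardlinked on a mirror,
assuming both trees live on the same filesystem and that only *.i386.rpm and
*.noarch.rpm files are candidates (the same two patterns checked above). The
function name link_identical is made up for illustration, and the -ef test is
a common shell extension (bash and dash have it) rather than strict POSIX.

```shell
# Hypothetical helper: replace files under DUP_DIR with hardlinks to the
# byte-identical file at the same relative path under REF_DIR.
# Usage: link_identical REF_DIR DUP_DIR   (no trailing slashes)
link_identical() {
    ref=$1
    dup=$2
    # Only packages that can legitimately be shared between the two arches.
    find "$dup" -type f \( -name '*.i386.rpm' -o -name '*.noarch.rpm' \) |
    while IFS= read -r f; do
        rel=${f#"$dup"/}                   # path relative to DUP_DIR
        other=$ref/$rel
        [ -f "$other" ] || continue        # no counterpart in the other tree
        [ "$f" -ef "$other" ] && continue  # already the same inode
        cmp -s "$f" "$other" || continue   # contents differ, leave it alone
        ln -f "$other" "$f"                # swap the copy for a hardlink
    done
}

# Example call, using the layout from the listings above:
# link_identical /opt/mirrors/centos/4.2/os/i386 /opt/mirrors/centos/4.2/os/x86_64
```

Mirrors pulling with rsync would need -H (--hard-links) for the links to
survive the transfer; without it each copy is sent and stored separately.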

f) Why aren't jigdo files available on the site?  They'd really come in 
useful, especially in the situation I had where I already had a complete 
mirror of all the files, but I still had to bittorrent the CDs/DVDs even 
though I had 99% of the required data on disk!

Cheers,
MaZe.
-------------- next part --------------
-----=====----- CentOS 4.2 i386 CD1 -----=====-----
/CentOS/base/hdlist
/CentOS/base/hdlist2
/CentOS/base/hdstg2.img
/.discinfo
/headers/header.info
/isolinux/boot.cat
Only in 1: .newheaders
/repodata/filelists.xml.gz
/repodata/other.xml.gz
/repodata/primary.xml.gz
/repodata/repomd.xml
Only in 1: .repodata
-----=====----- CentOS 4.2 i386 CD2 -----=====-----
/.discinfo
-----=====----- CentOS 4.2 i386 CD3 -----=====-----
/.discinfo
-----=====----- CentOS 4.2 i386 CD4 -----=====-----
/.discinfo
-----=====----- CentOS 4.2 i386 DVD -----=====-----
/headers/header.info
/isolinux/boot.cat
/isolinux/isolinux.bin
/repodata/filelists.xml.gz
/repodata/other.xml.gz
/repodata/primary.xml.gz
/repodata/repomd.xml
-----=====----- CentOS 4.2 i386 ServerCD -----=====-----
/CentOS/base/comps.rpm
/CentOS/base/comps.xml
/CentOS/base/hdlist
/CentOS/base/hdlist2
/CentOS/base/hdstg2.img
/CentOS/base/netstg2.img
Only in Server/CentOS/base: product.img
/CentOS/base/stage2.img
Only in Server/CentOS/RPMS: anaconda-product-4.0-2.centos4.1.noarch.rpm
Only in Server/CentOS/RPMS: comps-4.2CENTOS-1.20051106.i386.rpm
Only in Server/CentOS/RPMS: rpmdb-CentOS-4.2-0.20051106.i386.rpm
/.discinfo
/images/boot.iso
/images/diskboot.img
/images/pxeboot/initrd.img
/images/pxeboot/README
/images/pxeboot/vmlinuz
/images/README
/isolinux/boot.cat
/isolinux/initrd.img
/isolinux/isolinux.bin
/isolinux/vmlinuz
/RELEASE-NOTES-en.html
Only in Server: RPM-GPG-KEY-CentOS-4
Only in Server: SRPMS
-------------- next part --------------
-----=====----- CentOS 4.2 x86_64 CD1 -----=====-----
/CentOS/base/hdlist
/CentOS/base/hdlist2
/.discinfo
/isolinux/boot.cat
/isolinux/isolinux.bin
-----=====----- CentOS 4.2 x86_64 CD2 -----=====-----
/.discinfo
-----=====----- CentOS 4.2 x86_64 CD3 -----=====-----
/.discinfo
-----=====----- CentOS 4.2 x86_64 CD4 -----=====-----
/.discinfo
-----=====----- CentOS 4.2 x86_64 DVD -----=====-----
-----=====----- CentOS 4.2 x86_64 ServerCD -----=====-----
/CentOS/base/comps.rpm
/CentOS/base/comps.xml
/CentOS/base/hdlist
/CentOS/base/hdlist2
/CentOS/base/hdstg2.img
/CentOS/base/netstg2.img
Only in Server/CentOS/base: product.img
/CentOS/base/stage2.img
Only in Server/CentOS/RPMS: comps-4.2CENTOS-0.20051123.x86_64.rpm
Only in Server/CentOS/RPMS: rpmdb-CentOS-4.2-0.20051123.x86_64.rpm
/.discinfo
/images/boot.iso
/images/diskboot.img
/images/pxeboot/initrd.img
/images/pxeboot/README
/isolinux/boot.cat
/isolinux/initrd.img
/isolinux/isolinux.bin
Only in Server: SRPMS