I've gone through the trouble (after some 10 rounds I finally have something I'm happy with) and generated jigdo/template files for the CentOS 5.0 ISOs (CD/DVD, i386/x86_64), and for CentOS 4.4 while I was at it (CD/DVD/ServerCD, SRPMS/i386/x86_64).
(and I started on this before the thread about creating the DVDs from the CDs started ;-) )
They are available at: http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/
(I skipped the LiveCD since jigdo offers no size improvement here -- already for the ServerCD the template file is 125 MB because of the different .img files inside the ISO)
These jigdo/template files should allow you to generate any of the listed CD/DVD ISOs from the publicly available http://mirror.centos.org mirror site (which is DNS-aliased to something close to you). Jigdo will also allow using a local source of files (e.g. an existing CD).
You will need to install jigdo-file and jigdo-lite.
They are available:
- in package jigdo from dag's repository (rpmforge) for CentOS 4 (and others)
- in package jigdo-file for debian/ubuntu (just apt-get it)
- from the source website (sources, windows binaries, and statically linked linux binaries) http://atterer.net/jigdo/
Once installed usage is (...should be...) trivial:
jigdo-lite --noask http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/%.jigdo
This generates the requested ISO file in the directory in which you run it. Replace the % with the image you want; see the listing at http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/ for the available names.
[If you have a local source of files you can skip the --noask and provide it when asked]
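For anyone who wants to script the download, here is a minimal Python sketch that assembles the command shown above. The function names and the image name EXAMPLE-IMAGE are hypothetical placeholders (real image names come from the directory listing), and the only jigdo-lite option used is the --noask one already mentioned.

```python
import subprocess

# Directory holding the .jigdo files, as given in this post.
BASE_URL = "http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/"


def build_jigdo_command(image_name, noask=True):
    """Return the jigdo-lite argument list for one image.

    image_name is whatever replaces the % in the URL (without ".jigdo").
    """
    cmd = ["jigdo-lite"]
    if noask:
        # Drop --noask if you want to be prompted for a local file source.
        cmd.append("--noask")
    cmd.append(BASE_URL + image_name + ".jigdo")
    return cmd


def fetch_image(image_name):
    """Run jigdo-lite in the current directory; raises if it fails."""
    subprocess.run(build_jigdo_command(image_name), check=True)


if __name__ == "__main__":
    # EXAMPLE-IMAGE is a placeholder, not a real file on the server.
    print(" ".join(build_jigdo_command("EXAMPLE-IMAGE")))
```

Calling fetch_image() requires jigdo-lite to be installed and on your PATH; the sketch only prints the command unless you call it yourself.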
Next step: get this included on the centos mirrors ;-) [hint hint]
Afterwards: it would be nice to drop the iso files from the mirrors in their entirety and generate them on-the-fly using the JTE extensions package. Should be significant space-savings. [requires perl cgi in the hosting webserver, but should be very safe since it's all read-only access]
Even later: it would be awesome if we could generate the jigdo/template during mkisofs (requires using a patched mkisofs). This would skip the slightly painful jigdo generation step (painful - because I have to download a local mirror of both the files and the CD/DVD images, much less painful to do it directly on the centos source/build system/mirrors)
Both later steps are 95% there (the JTE jigit website/extensions/mkisofs patches/etc) and mostly require putting together rpms for EL4/5. (I'm going to be trying to put together a mkisofs-jte rpm which doesn't override the normal mkisofs rpm, but instead provides two files - /usr/bin/mkisofs-jte and its manpage)
This should also allow more customized images (like the rolled up updates) without increasing mirror space usage.
Jigdo (combined with hardlinking identical files, which I believe is already done) should enable drastic reductions in centos HDD footprint.
Comments?
Cheers, Maciej Zenczykowski
Maciej Zenczykowski wrote:
I've gone through the trouble (after some 10 rounds I finally have something I'm happy with) and generated jigdo/template files for CentOS 5.0 iso's (CD/DVD i386/x86_64) (and CentOS 4.4 while I was at it: CD/DVD/ServerCD SRPMS/i386/x86_64).
Thanks for taking the time to get this done. However, one thing I'd like to point out is that there is now a -qa testing team in place, and issues such as this should really be run past them (posting in centos-devel usually gets their attention) before posting on the centos list or the public forums.
If it's OK with you, and if you are happy to work with the CentOS developers and QA people to make this jigdo process more 'official', could we move this to the centos-devel list and let the -qa people have a go at it for a bit?
- KB
Karanbir Singh wrote:
Maciej Zenczykowski wrote:
I've gone through the trouble (after some 10 rounds I finally have something I'm happy with) and generated jigdo/template files for CentOS 5.0 iso's (CD/DVD i386/x86_64) (and CentOS 4.4 while I was at it: CD/DVD/ServerCD SRPMS/i386/x86_64).
thanks for taking the time to get this done. however, one thing I'd like to point out is that there is now a -qa testing team in place, and issues such as this should really be run past them ( posting in centos-devel usually get their attention ), before posting it on the centos list or the public forums.
One doesn't need to go to CentOS QA at all to do something private, any more than CentOS needs to run things past Red Hat's QA.
If its ok with you, and if you are happy to work with the CentOS developers and qa people to make this jidgo process more 'official' - could we move this to the centos-devel list and let the -qa people have a go at it for a bit ?
OTOH now it's done, I'd love it to be an official part of CentOS.
John Summerfield wrote:
One doesn't need at all to go to CentOS QA to do something private any more than CentOS needs to run it past Red Hat's qa.
Perhaps you don't, but I like to know that the systems and processes I am using (and that CentOS is going to provide officially) have gone through some testing process.
OTOH now it's done, I'd love it to be an official part of CentOS.
... and that is not going to happen unless it goes through the -qa and testing people first.
If the users are going to trust us to provide them with a service, we need to make sure that the trust is based on some facts and at least some form of testing.
- KB
On Tue, 17 Apr 2007, Karanbir Singh wrote:
John Summerfield wrote:
One doesn't need at all to go to CentOS QA to do something private any more than CentOS needs to run it past Red Hat's qa.
perhaps you dont, but I like to know that the systems and process I am using ( and centos is going to provide officially ) have gone through some testing process.
As would I :-) I did test these, but I am only one man and can't test in every conceivable situation ;-)
OTOH now it's done, I'd love it to be an official part of CentOS.
... and that is not going to happen unless it goes through the -qa and testing people first.
Cool.
If the users are going to trust us to provide them with a service, we need to make sure that the trust is based on some facts and at-least some form of testing.
Agreed,
Maciej
I've gone through the trouble (after some 10 rounds I finally have something I'm happy with) and generated jigdo/template files for CentOS 5.0 iso's (CD/DVD i386/x86_64) (and CentOS 4.4 while I was at it: CD/DVD/ServerCD SRPMS/i386/x86_64).
thanks for taking the time to get this done. however, one thing I'd like to point out is that there is now a -qa testing team in place, and issues such as this should really be run past them ( posting in centos-devel usually get their attention ), before posting it on the centos list or the public forums.
Sure, I'll repost the original topic in centos-devel in a moment (just have to sign up...)
If its ok with you, and if you are happy to work with the CentOS developers and qa people to make this jidgo process more 'official' - could we move this to the centos-devel list and let the -qa people have a go at it for a bit ?
Cool. See you there.
Maciej.
Maciej Zenczykowski spake the following on 4/15/2007 9:46 PM:
I've gone through the trouble (after some 10 rounds I finally have something I'm happy with) and generated jigdo/template files for CentOS 5.0 iso's (CD/DVD i386/x86_64) (and CentOS 4.4 while I was at it: CD/DVD/ServerCD SRPMS/i386/x86_64).
(and I started on this before the thread about creating the DVD's from the CD's started ;-) )
They are available at: http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/
(I skipped the LiveCD since jigdo offers no size improvement here -- already for the ServerCD the template file is 125 MBs because of the different .img files inside the iso)
These jigdo/template files should allow you to generate any of the listed cd/dvd isos from the publically available http://mirror.centos.org mirror site (which is DNS-aliased to something close to you). Jigdo will also allow using a local source of files (ie. an existing CD, etc...)
You will need to install jigdo-file and jigdo-lite.
It is available:
- in package jigdo from dag's repository (rpmforge) for CentOS 4 (and others)
- in package jigdo-file for debian/ubuntu (just apt-get it)
- from the source website (both sources, and windows binaries, and statically linked linux binaries) http://atterer.net/jigdo/
Once installed usage is (...should be...) trivial:
jigdo-lite --noask http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/%.jigdo
which will generate the requested (replace the % with what you want, see the http://tcs.uj.edu.pl/~buildcentos/centos-jigdo/ listing) iso file in the directory in which you run it.
[If you have a local source of files you can skip the --noask and provide it when asked]
Next step: get this included on the centos mirrors ;-) [hint hint]
Afterwards: it would be nice to drop the iso files from the mirrors in their entirety and generate them on-the-fly using the JTE extensions package. Should be significant space-savings. [requires perl cgi in the hosting webserver, but should be very safe since it's all read-only access]
Even later: it would be awesome if we could generate the jigdo/template during mkisofs (requires using a patched mkisofs). This would skip the slightly painful jigdo generation step (painful - because I have to download a local mirror of both the files and the CD/DVD images, much less painful to do it directly on the centos source/build system/mirrors)
Both later steps are 95% there (the JTE jigit website/extensions/mkisofs patches/etc) and mostly require putting together rpms for EL4/5. (I'm going to be trying to put together a mkisofs-jte rpm which doesn't override the normal mkisofs rpm, but instead provides two files - /usr/bin/mkisofs-jte and it's manpage)
This should also allow more customized images (like the rolled up updates) without increasing mirror space usage.
Jigdo (combined with hardlinking identical files, which I believe is already done) should enable drastic reductions in centos HDD footprint.
Comments?
BitTorrent has one advantage over jigdo AFAIR -- jigdo uses the bandwidth of the mirrors for all downloads, but BitTorrent uses the bandwidth of the downloaders. This adds up to significant savings for whoever hosts the files, because storage is an occasional purchase, whereas bandwidth is a monthly expense.
Scott Silva wrote:
Bitorrent has one advantage over jigdo AFAIR -- Jigdo uses the bandwidth of the mirrors to get all downloads, but bittorrent uses the bandwidth of the downloaders. This adds up to significant savings to the one hosting the files, because storage is an occasional purchase, where bandwidth is a monthly expense.
Ask internet access providers what they prefer; here, they commonly set up local mirrors for their users. Jigdo will use these, and the user's choice.
OTOH BitTorrent will get stuff from uncontrolled locations, and IAPs pay time and time again.
I can't imagine bittorrent provides _any_ advantage over jigdo plus one's IAP's mirror. I am in Perth, Western Australia, where I can choose between (at least) two local mirrors.
And Jigdo would have been enormously helpful in updating beta to final (provided packages were not needlessly rebuilt), all the unchanged packages could be used from the beta images.
I can also use it to construct a DVD image from my CDs.
John Summerfield spake the following on 4/16/2007 4:11 PM:
Scott Silva wrote:
Bitorrent has one advantage over jigdo AFAIR -- Jigdo uses the bandwidth of the mirrors to get all downloads, but bittorrent uses the bandwidth of the downloaders. This adds up to significant savings to the one hosting the files, because storage is an occasional purchase, where bandwidth is a monthly expense.
Ask internet access providers what they prefer; here, they commonly set up local mirrors for their users. Jigdo will use these, and the user's choice.
OTOH bitttorrent will get stuff from uncontrolled locations and IAPs pay time and again again.
I can't imagine bittorrent provides _any_ advantage over jigdo plus one's IAP's mirror. I am in Perth, Western Australia, where I can choose between (at least) two local mirrors.
And Jigdo would have been enormously helpful in updating beta to final (provided packages were not needlessly rebuilt), all the unchanged packages could be used from the beta images.
I can also use it to construct a DVD image from my CDs.
I was just thinking of the bandwidth that the CentOS team doesn't have to pay for with bittorrent. But I do agree that jigdo is a better system for the end-user and for those with pricey last-mile connections, or you in Perth, where you probably have to pay for anything that crosses the ocean.
Scott Silva wrote:
I was just thinking of the bandwidth that the CentOS team doesn't have to pay for with bittorrent. But I do agree that jigdo is a better system for the end-user and for those with pricey last-mile connections, or you in Perth, where you probably have to pay for anything that crosses the ocean.
We also need to keep in mind that if it's going to hit .centos.org machines, it's going to come at least a few days later than the .torrent, since we tend to get hit real hard by people using yum and doing net installs as well.
btw, I've not seen a post by Maciej indicating he would be interested in moving this to centos-devel and some people testing it out...
- KB
also need to keep in mind that if its going to hit .centos.org machines, its going to come atleast a few days later than the .torrent - since we tend to get hit real hard by people using yum and doing net installs as well.
So ideally the best solution here is to have some sort of DNS resolving in place so that mirror.centos.org always resolves to your closest in-sync mirror -- I was under the impression that we already have this for yum???
btw, I've not seen a post by Maciej indicating he would be interested in moving this to centos-devel and some people testing it out...
Upcoming :)
On Tue, 17 Apr 2007, Maciej Zenczykowski wrote:
also need to keep in mind that if its going to hit .centos.org machines, its going to come atleast a few days later than the .torrent - since we tend to get hit real hard by people using yum and doing net installs as well.
So ideally the best solution here is to have some sort of DNS resolving in place so that mirror.centos.org always resolves to your closest in-sync mirror -- I was under the impression that we already have this for yum???
But does jigdo have any kind of indirection in place, e.g. to get a list of mirrors and use them?
Or could yum be used as a wrapper around jigdo, maybe to pull stuff down using fastestmirror or maybe even dag's stuff?
Otherwise it gets messy because mirrors have different file structures, so just using CNAMEs won't work.
The trouble I see is that if a user doesn't have a local mirror or a local copy of a set of ISOs, then downloading the packages individually from a mirror may be less efficient than downloading an ISO or set of ISOs, and certainly without mirror redirect and fastestmirror it is a non-starter ...
The fedora guys have been working on stuff that we also ought to look at, and we should look at metalinks while we are at it.
Regards Lance
-- uklinux.net - The ISP of choice for the discerning Linux user.
Scott Silva wrote:
I was just thinking of the bandwidth that the CentOS team doesn't have to pay for with bittorrent.
indeed. As of 7:35pm PDT Monday, the CentOS5 torrents have downloaded...
5.0 i386 DVD - 10679 times, 36.1 terabytes
5.0 x86_64 DVD - 3422 times, 13.34 terabytes
5.0 i386 6 CD set - 1958 times, 6.62 terabytes
5.0 x86_64 7 CD set - 379 times, 1.47 terabytes
grand total: 16438 complete downloads, and 57.5 terabytes.
That's a LOTTA transit bandwidth taken off of the mirrors. This is entirely since Thursday AM, i.e. in the last ~4 days.
Note that these figures use the decimal version of tera, i.e. 1,000,000,000,000 bytes.
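As a quick cross-check of the totals, the per-torrent figures can be summed in a few lines of Python. This is only a sketch over the numbers quoted above; the per-download size it prints is derived arithmetic (decimal units, 1 TB = 1000 GB), not something reported by the tracker.

```python
# (torrent, completed downloads, terabytes transferred) as quoted above.
torrents = [
    ("5.0 i386 DVD", 10679, 36.1),
    ("5.0 x86_64 DVD", 3422, 13.34),
    ("5.0 i386 6 CD set", 1958, 6.62),
    ("5.0 x86_64 7 CD set", 379, 1.47),
]

total_downloads = sum(count for _, count, _ in torrents)
total_tb = sum(tb for _, _, tb in torrents)

print(total_downloads)     # 16438
print(round(total_tb, 2))  # 57.53, i.e. the ~57.5 TB grand total

# Average transfer per completed download, in decimal gigabytes.
for name, count, tb in torrents:
    gb_per_download = tb * 1000 / count
    print(f"{name}: {gb_per_download:.2f} GB per download")
```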
John R Pierce wrote:
Scott Silva wrote:
I was just thinking of the bandwidth that the CentOS team doesn't have to pay for with bittorrent.
indeed. As of 7:35pm PDT Monday, the CentOS5 torrents have downloaded...
5.0 i386 DVD - 10679 times, 36.1 terabytes 5.0 x86_64 DVD - 3422 times, 13.34 terabytes 5.0 i386 6 CD set - 1958 times, 6.62 terabytes 5.0 x86_64 7 CD set - 379 times, 1.47 terabytes
grand total: 16438 complete downloads, and 57.5 terabytes.
thats a LOTTA transit bandwidth taken off of the mirrors. This is entirely since Thursday AM, i.e. in the last ~ 4 days.
note they use the decimal version of tera here, or 1,000,000,000,000
Some of that is available to people here from 3fl.net and probably ftp.iinet.net.au, but because they won't serve the world (they pay for that too), you won't help us find them.
Do you seriously want to reduce the load? It's in your hands.
And before you try it, I don't think the Ubuntu scheme (testing for the fastest link) works, nor, as far as I can see, does geo IP. I'm on WAIX, but WAIX isn't good for everyone in Perth, Western Australia.
Scott Silva wrote:
I was just thinking of the bandwidth that the CentOS team doesn't have to pay for with bittorrent. But I do agree that jigdo is a better system for the end-user and for those with pricey last-mile connections, or you in Perth, where you probably have to pay for anything that crosses the ocean.
Our local mirrors are even better for CentOS servers. Even though they won't serve the world, CentOS would do well to work with them to make them easy to find.
John Summerfield wrote:
Scott Silva wrote:
I was just thinking of the bandwidth that the CentOS team doesn't have to pay for with bittorrent. But I do agree that jigdo is a better system for the end-user and for those with pricey last-mile connections, or you in Perth, where you probably have to pay for anything that crosses the ocean.
Our local mirrors are even better for CentOS servers. Even though they won't server the world, CentOS would do well to work with them to make them easy to find.
Will the yum in CentOS 5 work with caching proxies for updates like it did in CentOS 3, or will it come up with a different URL from each machine and defeat the cache like the one in CentOS 4 does?
Our local mirrors are even better for CentOS servers. Even though they won't server the world, CentOS would do well to work with them to make them easy to find.
What we really need is something IP based. Look up your IP on http://www.centos.org/whatismyip.php (content: <? echo $_SERVER['REMOTE_ADDR']; ?>) [Example at: http://tcs.uj.edu.pl/~maze/whatismyip.php] Use this IP (or your public IP if you already have one) and look it up in a netmask'ed list of mirrors, something along the lines of:
149.156.81.192/29 999 http://mirror.tcs.uj.edu.pl/centos/
Here the first field specifies the network IP range, the second is a priority (this should be something like the bandwidth from the mirror to the destination network), and the third is the root directory of the mirror's CentOS tree.
Anyway, a client fetches http://www.centos.org/auto-mirrors.php and gets a list of all the above lines which matched its IP (i.e. the REMOTE_ADDR). We can return only the actual mirror path, sorted by decreasing priority (i.e. bandwidth).
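To make the lookup concrete, here is a rough Python sketch of the matching step, using the standard ipaddress module. Only the first line of MIRROR_LINES comes from this post; the other two entries, the mirror.example.* hosts and the function names are made up for illustration, and a real auto-mirrors.php would of course be written server-side.

```python
import ipaddress

# "network priority mirror-root" lines. Only the first entry is from the
# post; the other two are hypothetical examples.
MIRROR_LINES = """\
149.156.81.192/29 999 http://mirror.tcs.uj.edu.pl/centos/
149.156.0.0/16 500 http://mirror.example.org/centos/
0.0.0.0/0 1 http://mirror.example.net/centos/
"""


def parse_mirror_lines(text):
    """Parse the text into (network, priority, url) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        network, priority, url = line.split(None, 2)
        entries.append((ipaddress.ip_network(network), int(priority), url))
    return entries


def mirrors_for(client_ip, entries):
    """Return mirror URLs whose network contains client_ip, best first."""
    address = ipaddress.ip_address(client_ip)
    matches = [(prio, url) for net, prio, url in entries if address in net]
    # Higher priority (i.e. more bandwidth to that network) comes first.
    matches.sort(key=lambda match: match[0], reverse=True)
    return [url for _, url in matches]


entries = parse_mirror_lines(MIRROR_LINES)
for url in mirrors_for("149.156.81.195", entries):
    print(url)
```

A catch-all 0.0.0.0/0 line with a low priority gives every client at least one answer; whether that is desirable is a policy question for the mirror admins.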
Then we'd have to ask people to submit lines of the above form for any 'close by' networks.
This might be a bit of an administrative headache though...
(And there's still the issue of how to deal with partial mirrors... my suggestion would be to allow mirroring on the version & architecture level [as in: I have 4.4 i386, 4.4 SRPMS, 5.0 x86_64, 5.0 SRPMS].)
Could use a little more polish... but wondering about any first comments?
Maciej
Maciej Zenczykowski wrote:
Our local mirrors are even better for CentOS servers. Even though they won't server the world, CentOS would do well to work with them to make them easy to find.
What we really need is something IP based. Look up your IP on http://www.centos.org/whatismyip.php (content: <? echo $_SERVER['REMOTE_ADDR']; ?>) [Example at: http://tcs.uj.edu.pl/~maze/whatismyip.php] Use this IP (or your public IP if you already have one) and look it up in a netmask'ed list of mirrors, something along the lines of:
149.156.81.192/29 999 http://mirror.tcs.uj.edu.pl/centos/
Where the first specifies network ip range, the second is a priority (this should be something like bandwidth from mirror to destination network) and the third is the mirror location centos root directory.
Anyway a client fetches: http://www.centos.org/auto-mirrors.php and gets a list of all the above lines which matched for it's given IP (ie. the REMOTE_ADDR). We can return only the actual mirror path - sorted by decreasing priority (ie. bandwidth).
Then we'd have to ask people to submit lines of the above form for any 'close by' networks.
This might be a bit of an administrative headache though...
(and there'es still the issue of how to deal with partial mirrors... my suggestion would be to allow mirroring on the version & architecture level [as in I have 4.4 i386, 4.4 SRPMS, 5.0 x86_64, 5.0 SRPMS].
Could use a little more polish... but wondering about any first comments?
From my perspective, until CentOS recognises all the local mirrors, it's not worth the paper it's written on. So far, the schemes I see give me places on the other side of the continent, and those are little if any better than downloading direct from www.centos.org (except they might be faster).
John Summerfield wrote:
I can also use it to construct a DVD image from my CDs.
You have repeatedly made this statement, which I don't believe for a second, mainly because I don't think jigdo has any smartness w.r.t. repo metadata or package ordering.
So what am I missing here? Does jigdo really do a new buildinstall and rerun hd-metadata in the distro?
Karanbir Singh wrote:
John Summerfield wrote:
I can also use it to construct a DVD image from my CDs.
So what am I missing here ? does jidgo really do a new buildinstall and rerun hd-metadata in the distro ?
Lance just pointed out to me that jigdo works using rdiff-like logic, which makes sense I suppose.
- KB
I can also use it to construct a DVD image from my CDs.
So what am I missing here ? does jidgo really do a new buildinstall and rerun hd-metadata in the distro ?
Lance just pointed out to me tht jidgo works using a rdiff like logic, which makes sense I suppose.
To be precise, it only actually cares about the files on a CD (i.e. it doesn't use the ISO image as such; it uses the loopback-mounted ISO image just as if it were another directory on your hard disk).
Jigdo basically splits any (big) file into two kinds of pieces: pieces it can't find anywhere else (these it sticks in the template file), and pieces which are whole files taken from somewhere else (these just get embedded as pointers in the jigdo/template combination, with Base64 MD5 checksums to ensure correctness). The entire image is MD5 checksummed as well. So all the stuff that is unique to the ISO image (CD or DVD) ends up in the template (i.e. the ISO directory structure, etc.).
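A toy Python sketch of that idea follows. It is NOT how jigdo is actually implemented (jigdo has its own file formats and a much smarter search); it just uses a plain bytes.find() to show an image being split into literal pieces plus checksummed references to whole files, and then put back together. All names here are made up for the illustration.

```python
import base64
import hashlib


def md5_b64(data):
    """Base64-encoded MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def make_template(image, files):
    """Split image into literal pieces and references to known files.

    files maps a name to the file's bytes. Returns (parts, image_md5).
    """
    matches = []
    for name, content in files.items():
        pos = image.find(content) if content else -1
        if pos != -1:
            matches.append((pos, name, content))
    matches.sort()

    parts, cursor = [], 0
    for pos, name, content in matches:
        if pos < cursor:
            continue  # overlaps an earlier match; leave it as literal data
        if pos > cursor:
            parts.append(("literal", image[cursor:pos]))
        parts.append(("file", md5_b64(content), len(content)))
        cursor = pos + len(content)
    if cursor < len(image):
        parts.append(("literal", image[cursor:]))
    return parts, md5_b64(image)


def rebuild(parts, image_md5, sources):
    """Reassemble the image from the parts plus a collection of files."""
    by_md5 = {md5_b64(content): content for content in sources}
    out = bytearray()
    for part in parts:
        out += part[1] if part[0] == "literal" else by_md5[part[1]]
    result = bytes(out)
    if md5_b64(result) != image_md5:
        raise ValueError("rebuilt image does not match its checksum")
    return result


# Two stand-in "packages" embedded in a stand-in "ISO image".
rpm_a = b"contents of package A" * 10
rpm_b = b"contents of package B" * 10
image = b"ISO-HEADER" + rpm_a + b"padding" + rpm_b + b"ISO-TRAILER"

parts, image_md5 = make_template(image, {"a.rpm": rpm_a, "b.rpm": rpm_b})
rebuilt = rebuild(parts, image_md5, [rpm_a, rpm_b])
literal_bytes = sum(len(p[1]) for p in parts if p[0] == "literal")
print(rebuilt == image, literal_bytes, "literal bytes out of", len(image))
```

The literal pieces play the role of the template: the more of the image that is made up of files available elsewhere, the smaller they get.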
For example, based on the size of the jigdo template files you can guess that while most of the centos/x.x/os/arch directories were copies of the DVDs, the 4.4 i386 directory is a copy of the CDs. (You can tell this by comparing the size of the template files for the CD and DVD editions; smaller means a closer match.) The CD and DVD editions are slightly different because they differ in the locations of files on the media (i.e. all on one disc vs. spread across many), so the files listing the contents are different.
Anyway - enough of this - I've been writing too much...
Maciej
Karanbir Singh wrote:
John Summerfield wrote:
I can also use it to construct a DVD image from my CDs.
you have repeatedly made this statement, which I dont believe for a second - Mainly due to the fact that I dont think jidgo has any smartness w.r.t repometadata or pkgordering.
So what am I missing here ? does jidgo really do a new buildinstall and rerun hd-metadata in the distro ?
Then you really should educate yourself by, at the very least, reading its documentation.