I figured I'd try to mirror the base and updates repos locally. There's no tutorial for that, only one about creating your own repo of packages, which is not the same thing. So I just mirrored everything with wget, changed the baseurl in the repo files, and hoped that was enough. It works. So easy you don't need a tutorial. *But* I then realized that the updates directory contains *all* updates, not just the latest. Which means that if I don't check regularly, I may get old versions mirrored that I don't want. It also means I get a lot of unwanted files when I first start mirroring. And I cannot delete old files, as these would just be mirrored in again. An obvious solution would be to check each day and tell wget (or whatever software I use) to ignore files older than 24 hours. Still, this means the initial download has to fetch them all, and I have to delete all the unwanted old files manually. Is there a better solution?
Kai
Kai Schaetzl wrote:
I figured I'd try to mirror the base and updates repos locally. There's no tutorial for that, only one about creating your own repo of packages, which is not the same thing. So I just mirrored everything with wget, changed the baseurl in the repo files, and hoped that was enough. It works. So easy you don't need a tutorial. *But* I then realized that the updates directory contains *all* updates, not just the latest. Which means that if I don't check regularly, I may get old versions mirrored that I don't want. It also means I get a lot of unwanted files when I first start mirroring. And I cannot delete old files, as these would just be mirrored in again. An obvious solution would be to check each day and tell wget (or whatever software I use) to ignore files older than 24 hours. Still, this means the initial download has to fetch them all, and I have to delete all the unwanted old files manually. Is there a better solution?
Yes, a lot of past versions are kept in the repo, but if you filter those out, it wouldn't really be a "mirror" any more, would it?
Be careful with the "repomd.xml" file: delete it before starting to mirror, to make sure it doesn't get out of sync. At least on my end, I find that wget via HTTP doesn't compare timestamps, just sizes and names, and the SHA1 checksums are always the same length.
Here's a little script:
#!/bin/sh
VERSION=5.0
mkdir -p /Software/CentOS/$VERSION/updates/i386 >/dev/null 2>&1
mkdir -p /Software/CentOS/$VERSION/updates/x86_64 >/dev/null 2>&1
rm -f /Software/CentOS/$VERSION/updates/i386/repodata/repomd.xml >/dev/null 2>&1
rm -f /Software/CentOS/$VERSION/updates/x86_64/repodata/repomd.xml >/dev/null 2>&1
wget -nH --cache=off --cut-dirs=4 -m -c -R gif,png,^index.html* -I ^/centos/$VERSION/updates/i386/ -P /Software/CentOS/$VERSION/updates/i386 http://mirror.centos.org/centos/$VERSION/updates/i386/
wget -nH --cache=off --cut-dirs=4 -m -c -R gif,png,^index.html* -I ^/centos/$VERSION/updates/x86_64/ -P /Software/CentOS/$VERSION/updates/x86_64 http://mirror.centos.org/centos/$VERSION/updates/x86_64/
-Ross
Ross S. W. Walker wrote on Wed, 10 Oct 2007 14:47:07 -0400:
Yes, a lot of past versions are kept in the repo, but if you filter those out, it wouldn't really be a "mirror" any more, would it?
Well, I'm just interested in the latest bits ;-)
Be careful with the "repomd.xml" file: delete it before starting to mirror, to make sure it doesn't get out of sync. At least on my end, I find that wget via HTTP doesn't compare timestamps, just sizes and names, and the SHA1 checksums are always the same length.
Thanks for alerting me to this, I will take a close look, although I'm using FTP and it doesn't seem to happen there.
wget -nH --cache=off --cut-dirs=4 -m -c -R gif,png,^index.html* -I ^/centos/$VERSION/updates/i386/ -P /Software/CentOS/$VERSION/updates/i386 http://mirror.centos.org/centos/$VERSION/updates/i386/
I'm using FTP. When I started with HTTP, wget immediately stopped because of the robots.txt, as it hadn't changed between retrievals. I'm using a very simple command line: wget --mirror ftp://....
Kai
Kai Schaetzl maillists@conactive.com wrote:
I figured I'd try to mirror the base and updates repos locally. There's no tutorial for that, only one about creating your own repo of packages, which is not the same thing. So I just mirrored everything with wget, changed the baseurl in the repo files, and hoped that was enough. It works. So easy you don't need a tutorial.
Kai,
I only have a comment about the base mirror. Instead of using the internet to make a base mirror (not sure if you did it that way), you can use the CentOS-Media.repo.
This works best if you have the DVD ISO:

#mkdir /mnt/C564
#nano /etc/fstab
----------- add at end ----------------
/path-to/CentOS-5.0-x86_64-bin-DVD.iso /mnt/C564 iso9660 ro,loop,async 0 0
--------------- unsnip -------------
Now edit /etc/yum.repos.d/CentOS-Media.repo and add file:///mnt/C564 to the [c5-media] section.
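In case it helps, the edited [c5-media] section would then look something like this (the baseurl points at the mount point assumed above; the gpgcheck/gpgkey lines are the stock ones shipped by centos-release, quoted here from memory):

```ini
[c5-media]
name=CentOS-$releasever - Media
baseurl=file:///mnt/C564/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
```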
Then you want yum to ignore the [base] repo and use [c5-media]:

#yum search some-rpm --disablerepo=base --enablerepo=c5-media
now you have speed and still have all the default abilities of YUM
To find the RPM that owns [c5-media]:

[tlviewer@hercules ~]$ rpm -qf /etc/yum.repos.d/CentOS-Media.repo
centos-release-5-0.0.el5.centos.2
My repo for C5 (mpryor-c5.repo) at http://www.tlviewer.org/centos -- Mark
Mark pryor wrote on Wed, 10 Oct 2007 13:30:46 -0700 (PDT):
I only have a comment about the base mirror. Instead of using the internet to
make a base mirror (not sure you did it that way), you can use the CentOS-Media.repo
Ah, well, I remember having read about this, but I admit I didn't think of it. In my case it wouldn't be usable, though. I want to set up several VMs and several physical machines, and thought it would be stupid to update them all from a remote mirror. I set up the base repo by just copying the relevant stuff from the DVD to the hard disk of my Win2k3 server, and then retrieved the updates via wget from a mirror in Germany. Now I can install the VMs from my local mirror via FTP (installing via HTTP won't work, I don't know why) and update them via yum (using HTTP; it seems FTP doesn't work with yum) from the local mirror. The physical machines get a DVD drive attached, get a minimal install, the DVD drive removed, and then I can also update and add from the local mirror.
Kai
Kai Schaetzl wrote:
*But* I then realized that the updates directory contains *all* updates, not just the latest. Which means that if I don't check regularly, I may get old versions mirrored that I don't want. It also means I get a lot of unwanted files when I first start mirroring. And I cannot delete old files, as these would just be mirrored in again. An obvious solution would be to check each day and tell wget (or whatever software I use) to ignore files older than 24 hours. Still, this means the initial download has to fetch them all, and I have to delete all the unwanted old files manually. Is there a better solution?
Use rsync. I keep a local copy of the updates, specific to my platform. I'm also very specific about WHAT I want locally, as you'll see in the following script. I use Stanford University's mirror.
:~> cat rsync.sh
echo "Getting Centos 5.0 Updates...";
echo -e "*****************************\n";
rsync --progress --archive \
  --partial --delete --delete-excluded \
  --exclude centosplus/ \
  --exclude fasttrack/ \
  --exclude isos/ \
  --exclude isos-dvd/ \
  --exclude os/ \
  --exclude updates/SRPMS/ \
  --exclude updates/x86_64/ \
  --exclude addons/SRPMS/ \
  --exclude addons/x86_64/ \
  --exclude extras/SRPMS/ \
  --exclude extras/x86_64/ \
  mirror.stanford.edu::mirrors/centos/5.0 /home/CentOS/;
Ashley M. Kirchner wrote on Wed, 10 Oct 2007 14:45:42 -0600:
Use rsync. I keep a local copy of the updates, specific to my platform. I'm also very specific about WHAT I want locally, as you'll see in the following script.
Thanks, but this will still download and keep *all* updates for a platform. Besides, I'm mirroring on a Win2k server, and I have wget for it but not rsync. rsync might be a better solution than wget anyway, though, as I can probably tell it to only check a specific time span, while wget won't allow this.
Kai
Kai Schaetzl wrote:
Thanks, but this will still download and keep *all* updates for a platform.
If they exist on the mirror, then yes. However, anything that gets removed from the mirror you're using will also get deleted from your local copy. That's the whole idea behind rsync's --delete flag. In my case, I don't really care that the CentOS mirrors don't delete old stuff; I just pull everything down. On the Fedora side, however, they do remove old packages, and rsync will automatically remove them from the local copy as well. I back up some 15 servers that way. They all go to a single backup server where I keep up to 6 weeks' worth of data (2 weeks for some) using rsync as a backup utility. Using hardlinks between backups allows me to keep that much information. And if a file gets removed from the source, it will get removed from the backup as well, but only for that run, not the previous ones.
As for not having rsync on Win2K, you can install Cygwin on it and use rsync that way. I have a Server 2003 box pulling 2 TiB of data every night from an old Win2K server using rsync.
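A minimal sketch of that hardlink rotation as a POSIX shell function (the function and snapshot naming here are illustrative, not an actual script from this thread; rsync's --link-dest does the hardlinking of files that are unchanged since the previous snapshot):

```shell
# Illustrative sketch of rotating backups with rsync hardlinks.
# snapshot SRC DEST NAME copies SRC into DEST/NAME; files unchanged since
# the newest previous snapshot are hardlinked via --link-dest instead of
# being stored a second time.
snapshot() {
    src=$1 dest=$2 name=$3
    mkdir -p "$dest"
    # newest existing snapshot directory under DEST, if any
    prev=$(ls -1dt "$dest"/*/ 2>/dev/null | head -n 1)
    if [ -n "$prev" ]; then
        rsync -a --delete --link-dest="$prev" "$src/" "$dest/$name/"
    else
        rsync -a --delete "$src/" "$dest/$name/"
    fi
}
```

Each run then only costs the space of the changed files, and deleting an old snapshot directory never touches the others, since a hardlinked file persists until its last link is removed.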
on 10/11/2007 7:16 AM Ashley M. Kirchner spake the following:
Kai Schaetzl wrote:
Thanks, but this will still download and keep *all* updates for a platform.
If they exist on the mirror, then yes. However, anything that gets removed from the mirror you're using will also get deleted from your local copy. That's the whole idea behind rsync's --delete flag. In my case, I don't really care that the CentOS mirrors don't delete old stuff; I just pull everything down. On the Fedora side, however, they do remove old packages, and rsync will automatically remove them from the local copy as well. I back up some 15 servers that way. They all go to a single backup server where I keep up to 6 weeks' worth of data (2 weeks for some) using rsync as a backup utility. Using hardlinks between backups allows me to keep that much information. And if a file gets removed from the source, it will get removed from the backup as well, but only for that run, not the previous ones.
As for not having rsync on Win2K, you can install Cygwin on it and use rsync that way. I have a Server 2003 box pulling 2 TiB of data every night from an old Win2K server using rsync.
With the space crunch on the CentOS mirrors, I don't know why they don't just keep the latest files in the updates mirrors and move all the older stuff to the vault. If someone wants an older release of a file, they need to get it directly anyway. It wouldn't save a ton of space, but it would save some.
Scott Silva wrote:
With the space crunch on the CentOS mirrors, I don't know why they don't just keep the latest files in the updates mirrors and move all the older stuff to the vault. If someone wants an older release of a file, they need to get it directly anyway. It wouldn't save a ton of space, but it would save some.
I couldn't agree more!
Scott Silva wrote:
With the space crunch on the CentOS mirrors, I don't know why they don't just keep the latest files in the updates mirrors and move all the older stuff to the vault. If someone wants an older release of a file, they need to get it directly anyway. It wouldn't save a ton of space, but it would save some.
um, centos/(vers)/updates/(arch)/RPMS generally has just the latest RPM for each updated package, except the kernels. I see very little redundancy there.
so, I'm not quite sure what you're referring to.
John R Pierce wrote:
um, centos/(vers)/updates/(arch)/RPMS generally has just the latest RPM for each updated package, except the kernels. I see very little redundancy there.
That's not completely true:
tomcat5-5.5.23-0jpp.1.0.3.el5.i386.rpm
tomcat5-5.5.23-0jpp.1.0.4.el5.i386.rpm
tomcat5-5.5.23-0jpp.3.0.2.el5.i386.rpm
Just as one example.
Cheers,
Ralph
Scott Silva wrote:
With the space crunch on the CentOS mirrors, I don't know why they don't just keep the latest files in the updates mirrors and move all the older stuff to the vault. If someone wants an older release of a file, they need to get it directly anyway. It wouldn't save a ton of space, but it would save some.
Uhhh, I don't experience that when I run my rsync script. It actually does delete stuff that's no longer on the mirror. So, somewhere, someone is cleaning old stuff out of the mirrors. Now, how often this happens might be another argument, but it does happen.
Ashley M. Kirchner wrote on Fri, 12 Oct 2007 08:59:53 -0600:
Uhhh, I don't experience that when I run my rsync script. It actually does delete stuff that's no longer on the mirror.
Yeah, but the repo seems to keep a lot ... I downloaded 5 or more versions of the same rpm for some software (for instance, all the kernel packages, and the openoffice stuff in three versions) because there were two months or so between my first retrieval and the second (as I didn't have time to continue earlier).
So,
somewhere, someone is cleaning old stuff out of the mirrors. Now, how often this happens might be another argument, but it does happen.
Ah, well, so the CentOS 5 updates repo hasn't been "cleaned" yet? In that case they haven't cleaned it since it started. From looking at the CentOS 4 repos, though, it looks like they don't clean at all. Repos get "automatically" cleaned when moving over from x.1 to x.2, etc. So when 5.1 arrives, the repo will have been cleared by then and will start anew.
Don't get me wrong, I'm not complaining that they keep all those files. There may be good reasons to do so, I don't know. I merely wanted to see if I can do something on my side to avoid getting it all. Apparently, I can avoid it during regular downloads by specifying a time limit, but I can't avoid getting everything when I first start the mirror (unless I use a filelist).
Kai
Kai Schaetzl wrote:
Apparently, I can avoid it during regular downloads by specifying a time limit, but I can't avoid getting everything when I first start the mirror (unless I use a filelist).
That wouldn't be a mirror then, would it? :) I suppose, if you really want to, one thing you could try is what you suggested: maintain a filelist of what you want (or don't want). Another possibility might be to roll your own script that runs rsync in dry-run mode (-n) and filters the output, then goes back and grabs what you want based on that. Dry-run mode simply tells you what rsync would transfer without actually doing it. You can then take the result, parse it, figure out by version numbers what you want, and fetch just that, again through rsync.
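That dry-run idea can be roughed out like this (a sketch under naming of my own, not an actual script from this thread; the filtering step in the middle is the interesting part and is left open):

```shell
# Sketch of the dry-run approach: ask rsync what it *would* transfer,
# filter that list, then fetch only the selected files.
select_and_fetch() {
    src=$1 dest=$2
    mkdir -p "$dest"
    list=$(mktemp)
    # -n (dry run) lists candidate files without copying anything;
    # keep only the .rpm names from the verbose output
    rsync -n -av "$src/" "$dest/" | grep '\.rpm$' > "$list"
    # ... filter "$list" here: by version number, date cutoff, whitelist ...
    # fetch only what survived the filter
    rsync -a --files-from="$list" "$src/" "$dest/"
    rm -f "$list"
}
```

The same function works against a remote module (e.g. mirror.example.org::centos/5.0/updates/i386) as the source, since --files-from just takes the relative paths produced by the dry run.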
But honestly, at that point, I just transfer the whole thing and not worry about it. It's not like it takes an extremely large amount of space, at least not to me:
:~> du -h --max-depth=2 CentOS/
110M    CentOS/5.0/extras
3.4G    CentOS/5.0/updates
32K     CentOS/5.0/addons
3.5G    CentOS/5.0
3.5G    CentOS/
(I exclude everything else from my rsync transfer, 64bit, isos, etc., etc.)
Kai Schaetzl wrote:
Ashley M. Kirchner wrote on Fri, 12 Oct 2007 08:59:53 -0600:
Uhhh, I don't experience that when I run my rsync script. It actually does delete stuff that's no longer on the mirror.
Yeah, but the repo seems to keep a lot ... I downloaded 5 or more versions of the same rpm for some software (for instance, all the kernel packages, and the openoffice stuff in three versions) because there were two months or so between my first retrieval and the second (as I didn't have time to continue earlier).
So,
somewhere, someone is cleaning old stuff out of the mirrors. Now, how often this happens might be another argument, but it does happen.
Ah, well, so the CentOS 5 updates repo hasn't been "cleaned" yet? In that case they haven't cleaned it since it started. From looking at the CentOS 4 repos, though, it looks like they don't clean at all. Repos get "automatically" cleaned when moving over from x.1 to x.2, etc. So when 5.1 arrives, the repo will have been cleared by then and will start anew.
Don't get me wrong, I'm not complaining that they keep all those files. There may be good reasons to do so, I don't know. I merely wanted to see if I can do something on my side to avoid getting it all. Apparently, I can avoid it during regular downloads by specifying a time limit, but I can't avoid getting everything when I first start the mirror (unless I use a filelist).
Kai
Why don't you want to keep old updates? What if you need to roll back because of some bad bug introduced in the latest version? Then you have to go back and get either the original or the update that came before the version you have. Also, don't say "mirror" if you don't want to have the same packages; just say you want to create a new repository containing some of the packages from the CentOS mirror.
Kai Schaetzl wrote:
I figured I'd try to mirror the base and updates repos locally. There's no tutorial for that, only one about creating your own repo of packages, which is not the same thing. So I just mirrored everything with wget, changed the baseurl in the repo files, and hoped that was enough. It works. So easy you don't need a tutorial. *But* I then realized that the updates directory contains *all* updates, not just the latest. Which means that if I don't check regularly, I may get old versions mirrored that I don't want. It also means I get a lot of unwanted files when I first start mirroring. And I cannot delete old files, as these would just be mirrored in again. An obvious solution would be to check each day and tell wget (or whatever software I use) to ignore files older than 24 hours. Still, this means the initial download has to fetch them all, and I have to delete all the unwanted old files manually. Is there a better solution?
Kai
Have you tried mrepo?
Regards
Lorenzo Quatrini
Lorenzo wrote on Thu, 11 Oct 2007 09:38:21 +0200:
Have you tried mrepo?
How would this help? The main problem is to get rid of the "old" updates.
Kai
Kai Schaetzl wrote:
Lorenzo wrote on Thu, 11 Oct 2007 09:38:21 +0200:
Have you tried mrepo?
How would this help? The main problem is to get rid of the "old" updates.
Kai
You're right, I thought that mrepo would get rid of old updates by itself, but it doesn't. I am looking right now at different ways to reach the same goal (save bandwidth, time and disk space); if I find something I'll post it to the mailing list.
Regards
Lorenzo
On Thu, 11 Oct 2007, Lorenzo Quatrini wrote:
Kai Schaetzl wrote:
Lorenzo wrote on Thu, 11 Oct 2007 09:38:21 +0200:
Have you tried mrepo?
How would this help? The main problem is to get rid of the "old" updates.
Kai
Back in the old days, I used to use autoupdate, and I believe it would update the rpms you had and not keep the old ones. Autoupdate is available at http://www.mat.univie.ac.at/~gerald/ftp/autoupdate/. Check it out and let us know if it works for you.
Barry
On Thu, 11 Oct 2007, Lorenzo Quatrini wrote:
Kai Schaetzl wrote:
Lorenzo wrote on Thu, 11 Oct 2007 09:38:21 +0200:
Have you tried mrepo?
How would this help? The main problem is to get rid of the "old" updates.
Kai
You're right, I thought that mrepo would get rid of old updates by itself, but it doesn't. I am looking right now at different ways to reach the same goal (save bandwidth, time and disk space); if I find something I'll post it to the mailing list.
The problem is not getting rid of the old updates; I can write the functionality to do that. The problem is the fact that when you mirror something, you get a complete copy of the mirror. The copy tool (whether it is lftp, rsync or something else) has no notion of 'only the most recent package'. It has no notion of versions or packages.
So while we could remove the older updates, you would pull them in again every time and remove them again. I am certain this is what nobody wants :)
However, if you use RHN with the rhnget tool, it does have the notion of downloading only the most recent updates. So the synchronisation with RHN allows you to specify cleaning up the old updates, because the tool works on RPMs, not just on files.
If someone can come up with a smart way of handling this, mrepo can do it.
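For what it's worth, the "keep only the newest" logic itself is not hard to sketch in shell, assuming standard name-version-release.arch.rpm filenames. This is my own naive illustration, not something mrepo does: sort -V only approximates rpm's real version comparison, and in practice you would want to exempt kernel packages.

```shell
# Naive sketch of pruning old package versions from a directory of RPMs.
# The package name is taken as everything before the last two dash-separated
# fields (version, then release.arch.rpm); versions are ordered with sort -V,
# which only approximates rpm's comparison rules.
prune_old_rpms() {
    dir=$1
    for f in "$dir"/*.rpm; do
        [ -e "$f" ] || continue
        base=$(echo "$f" | sed 's/-[^-]*-[^-]*$//')
        # newest file whose package name matches this base exactly
        # (so tomcat5-webapps-* is not treated as a version of tomcat5)
        newest=$(for g in "$base"-*.rpm; do
            [ "$(echo "$g" | sed 's/-[^-]*-[^-]*$//')" = "$base" ] && echo "$g"
        done | sort -V | tail -n 1)
        [ "$f" = "$newest" ] || rm -f "$f"
    done
}
```

Run after each sync, this keeps one version per package name; the hard part, as noted above, is that the next mirror run pulls the deleted files right back in.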