On Tuesday, 25.08.2009, at 23:41 +0200, Tim Verhoeven wrote:
Hi again,
After being busy with some other stuff I had some time to look into this some more.
First there is the issue that wget does not follow URLs mentioned in CSS. Well, some good news here. The current development version of wget (1.12) has received support for it. I've tested it and it indeed works. I can probably create an RPM for it if people are interested in this.
I'd rather not go down that road - the import statements can probably be followed more or less easily with find + awk + wget using just the distro's wget. I guess maintaining such a script is simpler than maintaining an additional wget package.
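To make the find + awk idea a bit more concrete, here is a rough sketch of the extraction step only. The function name `list_css_imports` is made up for illustration, and it assumes the import statements use one of the common forms (`@import "x.css";`, `@import 'x.css';` or `@import url(x.css);`). It prints the raw targets; resolving relative targets against the location of the CSS file and feeding them to wget is left out.

```shell
#!/bin/sh
# Sketch only: print the target of every @import statement found in the
# *.css files below directory $1. The function name is hypothetical.
list_css_imports() {
    find "$1" -type f -name '*.css' -exec cat {} + |
    tr "'" '"' |    # treat single-quoted targets like double-quoted ones
    awk '
        {
            line = $0
            # A line may hold several @import statements, so keep matching.
            while (match(line, /@import[ \t]+(url\()?[ \t]*"?[^" \t;)]+/)) {
                hit  = substr(line, RSTART, RLENGTH)
                line = substr(line, RSTART + RLENGTH)
                # Strip everything in front of the actual target.
                sub(/^@import[ \t]+(url\()?[ \t]*"?/, "", hit)
                print hit
            }
        }'
}
```

The output could then be passed through `sort -u` and handed to the distro's wget, one URL per call, once the relative paths have been turned into full URLs.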
So, once wget has run, I see two more things that need to be done.
First is to add the disclaimer to each page. Andreas, I've tried using your XSLT (using xsltproc) but it does not seem to work here. It's probably my total ignorance about XML and XSL and how to use it. So could you show how exactly to apply it?
Hmm... it has been quite a while, but after all it was just doing:

    mv $f $f.bak
    xsltproc -o $f redhat2centos.xsl $f.bak
    rm $f.bak

for every html-file.
The hard part was to make it fast - xsltproc downloads every DTD from the web, so you usually want to set XML_CATALOG_FILES accordingly, have the DTDs ready and use --nonet for xsltproc. I'll need to write a few lines of script-code to automate the setup, because in XML it seems every path needs to be absolute...
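Roughly, the per-file steps plus the offline setup could be wrapped like this. This is only a sketch: the function name `apply_stylesheet` and the optional catalog argument are made up here, and it assumes a local XML catalog that maps the DTDs already exists (creating that catalog is not shown).

```shell
#!/bin/sh
# Sketch only: apply a stylesheet in place to every *.html file below a
# directory, without letting xsltproc fetch DTDs over the network.
#   $1  directory holding the mirrored pages
#   $2  stylesheet, e.g. redhat2centos.xsl
#   $3  optional: absolute path to a local XML catalog (assumed to exist)
apply_stylesheet() {
    dir=$1
    xsl=$2
    catalog=${3:-}
    if [ -n "$catalog" ]; then
        # libxml2 consults this variable when resolving DTD identifiers.
        XML_CATALOG_FILES=$catalog
        export XML_CATALOG_FILES
    fi
    find "$dir" -type f -name '*.html' | while IFS= read -r f; do
        mv "$f" "$f.bak" || continue
        if xsltproc --nonet -o "$f" "$xsl" "$f.bak"; then
            rm "$f.bak"
        else
            # Put the untouched original back if the transformation failed.
            mv "$f.bak" "$f"
            echo "xsltproc failed for $f" >&2
        fi
    done
}
```

Called as `apply_stylesheet ./mirror redhat2centos.xsl /absolute/path/to/catalog.xml`, where both paths are placeholders. Restoring the backup on failure is an addition to the three commands above, so a broken page does not end up empty.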
Secondly, all the Red Hat logos need to be removed/replaced. Looking at the deployment guide this looks to be a relatively short list. So it should not be that hard for the script to have a list of files to replace after the mirroring that takes care of this.
Yes. I guess it would be rather easy to update the image sources to point at CentOS content using XSL and just have a list of files that must not exist in our mirrored content.
Do not hesitate to bug me. If in doubt, just CC me - I tend to read the lists rather infrequently...
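The "list of files that must not exist" part could be as small as the following sketch. The function name `remove_listed_files` and the list format (one path per line, relative to the mirror root, with `#` comments) are assumptions for illustration; the actual list of logo files still has to be collected.

```shell
#!/bin/sh
# Sketch only: delete every file named in a list from the mirrored tree.
#   $1  root directory of the mirror
#   $2  list file: one path per line, relative to $1; blank lines and
#       lines starting with '#' are skipped (format is an assumption)
remove_listed_files() {
    root=$1
    list=$2
    # The extra test keeps a last line without a trailing newline.
    while IFS= read -r rel || [ -n "$rel" ]; do
        case $rel in
            ''|'#'*) continue ;;
        esac
        rm -f -- "$root/$rel"
    done < "$list"
}
```

This would run after the mirroring and after the stylesheet has rewritten the image sources, so that no page still points at a removed file.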
Regards, Andreas