Hello,
Karanbir Singh told me that I might be able to help CentOS with the 5.3 documentation. He tells me that Tim Verhoeven has taken on this effort. Can somebody tell me how I might be able to contribute?
Some background: I'm a programmer for a living (C#, Java) but am also fairly knowledgeable about awk, sed, perl, a little C, etc.
Hope this helps...
-- Brad Potts "Comfort is the antidote to success" — Unknown
On Fri, Aug 7, 2009 at 6:30 PM, Brad Potts <brad.potts@gmail.com> wrote:
Karanbir Singh told me that I might be able to help CentOS with the 5.3 documentation. He tells me that Tim Verhoeven has taken on this effort. Can somebody tell me how I might be able to contribute?
Some background: I'm a programmer for a living (C#, Java) but am also fairly knowledgeable about awk, sed, perl, a little C, etc.
Hi Brad,
Thank you for the offer to help. What we need is a script that mirrors the documentation from upstream (for example: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_...) to our site (http://www.centos.org/docs/5/html/5.2/Installation_Guide/) and, in the process, replaces the header and footer with the disclaimer we are already using. Images and other assets should be copied along as well.
Our guess is that with wget and some bash and sed magic this should not be that hard. Let us know if you think you can write such a thing.
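As a rough illustration of the "sed magic" part, the following sketch inserts a disclaimer fragment right after the opening body tag of a mirrored page. It does not remove the upstream header or footer, since their markup is not described in this thread. The function name, the file names, and the assumption that the `<body ...>` tag sits on a line of its own are all hypothetical.

```shell
#!/bin/sh
# Sketch only: insert a disclaimer fragment after the opening <body> tag.
# Assumption: the <body ...> tag is on its own line in the mirrored HTML.
# add_disclaimer and both file arguments are hypothetical names.
add_disclaimer() {
    page=$1
    disclaimer=$2
    # The sed "r" command appends the contents of a file after each
    # line that matches the address (here: the opening body tag).
    sed "/<body[^>]*>/r $disclaimer" "$page" > "$page.tmp" &&
        mv "$page.tmp" "$page"
}
```

It could be called once per page, for example `add_disclaimer index.html disclaimer.html`.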
Regards, Tim
On Monday, 10 Aug 2009, at 10:51 +0200, Tim Verhoeven wrote:
Hi Brad,
ok - I'll admit it... I'm not Brad.
Thank you for the offer to help. What we need is a script that mirrors the documentation from upstream (for example: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_...) to our site (http://www.centos.org/docs/5/html/5.2/Installation_Guide/) and, in the process, replaces the header and footer with the disclaimer we are already using. Images and other assets should be copied along as well.
Our guess is that with wget and some bash and sed magic this should not be that hard. Let us know if you think you can write such a thing.
I just hacked it together using wget and XSLT. It is still quite rough, but it works. Downloading can be done with
    wget -r -nH -np http://whatever/base/url/
However, some CSS files are then missing (wget doesn't follow @import statements). After that you can just apply the attached XSLT to remove the header and add the disclaimer.
So far I have only tested with the Deployment Guide. I wanted to know whether that kind of solution is OK for you. If it is, I can probably create a rather small script for mirroring and maintain it.
Regards, Andreas
PS: Brad - sorry for hacking that together, but it looked so simple and tempting that I couldn't resist.
On Wed, Aug 12, 2009 at 7:29 PM, Andreas Rogge <a.rogge@solvention.de> wrote:
I just hacked it together using wget and XSLT. It is still quite rough, but it works. Downloading can be done with
    wget -r -nH -np http://whatever/base/url/
However, some CSS files are then missing (wget doesn't follow @import statements). After that you can just apply the attached XSLT to remove the header and add the disclaimer.
So far I have only tested with the Deployment Guide. I wanted to know whether that kind of solution is OK for you. If it is, I can probably create a rather small script for mirroring and maintain it.
Hi again,
After being busy with some other stuff I had some time to look into this some more.
First there is the issue that wget does not follow URLs mentioned in CSS. Well, some good news here. The current development version of wget (1.12) has received support for it. I've tested it and it indeed works. I can probably create an RPM for it if people are interested in this.
So, once wget has run, I see two more things that need to be done.
First is to add the disclaimer to each page. Andreas, I've tried using your XSLT (using xsltproc) but it does not seem to work here. It's probably my total ignorance about XML and XSL and how to use it. Could you show exactly how to apply it?
Secondly, all the Red Hat logos need to be removed or replaced. Looking at the Deployment Guide, this looks to be a relatively short list. So it should not be that hard for the script to have a list of files to replace after the mirroring that takes care of this.
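The list-driven replacement described above might be sketched as follows. The list format (one path per line, relative to the mirror root, with `#` comments allowed), the function name, and every file name are assumptions made for illustration; nothing here reflects the actual Red Hat file layout.

```shell
#!/bin/sh
# Sketch only: overwrite every file named in a list with a replacement image.
# Assumptions: the list holds one path per line, relative to the mirror root;
# replace_listed_files and all file names are hypothetical.
replace_listed_files() {
    mirror_root=$1
    list_file=$2
    replacement=$3
    while IFS= read -r rel; do
        # Skip blank lines and comment lines in the list.
        case $rel in
            ''|'#'*) continue ;;
        esac
        # Only touch files that the mirror run actually produced.
        if [ -f "$mirror_root/$rel" ]; then
            cp -- "$replacement" "$mirror_root/$rel"
            echo "replaced: $rel"
        fi
    done < "$list_file"
}
```

A single replacement image is used for every entry here to keep the sketch short; a two-column list (upstream file, CentOS file) would be the obvious extension.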
So, if people have some time in the coming days to have a go at this, let us know. If not, I will probably have a go at it soonish.
Thanks again for the help, Tim
On 08/26/2009 12:41 AM, Tim Verhoeven wrote:
Hi again,
After being busy with some other stuff I had some time to look into this some more.
First there is the issue that wget does not follow URLs mentioned in CSS. Well, some good news here. The current development version of wget (1.12) has received support for it. I've tested it and it indeed works. I can probably create an RPM for it if people are interested in this.
I've tried to quickly hack an RPM, but it seems that the build requires a newer autoconf (2.61) than the one from C5 (2.59). I've also tried to rebuild the autoconf-2.63 from F11 for C5, but that one in turn requires m4 >= 1.4.7. I'll try to continue one of these days, if no one does it before me. In case anyone wishes to have a head start, my src.rpm for wget is at http://wolfy.fedorapeople.org/wget-1.12.0-0.1.20090826hg.fc12.src.rpm
Tim Verhoeven wrote:
Hi again,
After being busy with some other stuff I had some time to look into this some more.
First there is the issue that wget does not follow URLs mentioned in CSS. Well, some good news here. The current development version of wget (1.12) has received support for it. I've tested it and it indeed works. I can probably create an RPM for it if people are interested in this.
http://wolfy.fedorapeople.org/wget <---- SRPM and i386.el5.rpm
An x86_64 rpm is available on request (+ 30 min of work).
On Tuesday, 25 Aug 2009, at 23:41 +0200, Tim Verhoeven wrote:
Hi again,
After being busy with some other stuff I had some time to look into this some more.
First there is the issue that wget does not follow URLs mentioned in CSS. Well, some good news here. The current development version of wget (1.12) has received support for it. I've tested it and it indeed works. I can probably create an RPM for it if people are interested in this.
I'd rather not go down that road - the import statements can probably be followed more or less easily with find + awk + wget, using just the distro's wget. I guess maintaining such a script is simpler than maintaining an additional wget package.
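A minimal sketch of the find + awk half of that idea is shown below: it lists every @import target found in the mirrored CSS files, together with the file that references it, so that a follow-up step could resolve each target against that file's URL and fetch it with the stock wget. The function name is hypothetical, and the parsing assumes one @import per line and no whitespace inside the referenced URL.

```shell
#!/bin/sh
# Sketch only: list @import targets in mirrored CSS files.
# Output: one "css-file<TAB>import-target" pair per line.
# Assumptions: one @import per line, no spaces inside the target URL;
# list_css_imports is a hypothetical name.
list_css_imports() {
    mirror_root=$1
    find "$mirror_root" -type f -name '*.css' -exec awk '
        /@import/ {
            line = $0
            sub(/.*@import[ \t]*/, "", line)   # drop everything up to @import
            sub(/^url\([ \t]*/, "", line)      # drop an optional url( wrapper
            gsub(/["\047]/, "", line)          # drop double and single quotes
            sub(/[ \t);].*/, "", line)         # cut at first space, paren or semicolon
            if (line != "") print FILENAME "\t" line
        }' {} +
}
```

Fetching is left out on purpose: a relative target has to be resolved against the URL of the CSS file that imports it, which depends on how the mirror directory maps back to the upstream site.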
So, once wget has run, I see two more things that need to be done.
First is to add the disclaimer to each page. Andreas, I've tried using your XSLT (using xsltproc) but it does not seem to work here. It's probably my total ignorance about XML and XSL and how to use it. Could you show exactly how to apply it?
Hmm... it has been quite a while, but in the end it was just doing:
    mv $f $f.bak
    xsltproc -o $f redhat2centos.xsl $f.bak
    rm $f.bak
for every HTML file.
The hard part was making it fast: xsltproc downloads every DTD from the web, so you usually want to set XML_CATALOG_FILES accordingly, have the DTDs ready and use --nonet for xsltproc. I'll need to write a few lines of script code to automate the setup, because in XML it seems every path needs to be absolute...
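The setup described above could be automated roughly like this. The sketch writes an XML catalog that redirects DTD lookups to a local directory (resolved to an absolute path first), then applies the stylesheet to every HTML file with network access disabled. That the pages reference the W3C XHTML 1.0 DTDs under http://www.w3.org/TR/xhtml1/DTD/ is an assumption, as are the function names; redhat2centos.xsl is the stylesheet name used earlier in this thread.

```shell
#!/bin/sh
# Sketch only: local DTD catalog plus an in-place xsltproc run.
# Assumptions: the pages use the W3C XHTML 1.0 DTDs, already downloaded into
# a local directory; write_catalog and transform_tree are hypothetical names.

# write_catalog DTD_DIR CATALOG_FILE
write_catalog() {
    catalog=$2
    # Catalog entries need absolute locations, so resolve the directory first.
    dtd_dir=$(cd "$1" && pwd) || return 1
    cat > "$catalog" <<EOF
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD/"
                 rewritePrefix="file://$dtd_dir/"/>
</catalog>
EOF
}

# transform_tree MIRROR_ROOT STYLESHEET
transform_tree() {
    mirror_root=$1
    stylesheet=$2
    # Collect the file list up front so files rewritten during the run
    # cannot be picked up a second time.
    list=$(mktemp) || return 1
    find "$mirror_root" -type f -name '*.html' > "$list"
    while IFS= read -r f; do
        mv -- "$f" "$f.bak"
        if xsltproc --nonet -o "$f" "$stylesheet" "$f.bak"; then
            rm -- "$f.bak"
        else
            # Restore the original page if the transformation failed.
            mv -- "$f.bak" "$f"
            echo "failed: $f" >&2
        fi
    done < "$list"
    rm -f "$list"
}
```

A run might then look like `write_catalog ./dtd "$PWD/catalog.xml"`, followed by `XML_CATALOG_FILES="$PWD/catalog.xml"` exported into the environment and `transform_tree ./mirror redhat2centos.xsl`.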
Secondly, all the Red Hat logos need to be removed or replaced. Looking at the Deployment Guide, this looks to be a relatively short list. So it should not be that hard for the script to have a list of files to replace after the mirroring that takes care of this.
Yes. I guess it would be rather easy to update the image sources to point at CentOS content using XSL and just have a list of files that must not exist in our mirrored content.
Do not hesitate to bug me. If in doubt, just CC me - I tend to read the lists rather infrequently...
Regards, Andreas
Andreas Rogge wrote:
On Tuesday, 25 Aug 2009, at 23:41 +0200, Tim Verhoeven wrote:
Hi again,
After being busy with some other stuff I had some time to look into this some more.
First there is the issue that wget does not follow URLs mentioned in CSS. Well, some good news here. The current development version of wget (1.12) has received support for it. I've tested it and it indeed works. I can probably create an RPM for it if people are interested in this.
I'd rather not go down that road - the import statements can probably be followed more or less easily with find + awk + wget, using just the distro's wget. I guess maintaining such a script is simpler than maintaining an additional wget package.
As I've already said, the required wget is at http://wolfy.fedorapeople.org/wget/ (for almost a month).
Hi Andreas,
On 09/22/2009 08:51 AM, Andreas Rogge wrote:
After being busy with some other stuff I had some time to look into this some more.
Can this process be used for non-English content as well? I've not looked myself, since there are clearly other people who have the bandwidth and are more able.
Yes. I guess it would be rather easy to update the image sources to point at CentOS content using XSL and just have a list of files that must not exist in our mirrored content.
We should do that as well.
Do not hesitate to bug me. If in doubt, just CC me - I tend to read the lists rather infrequently...
Keep in mind that private emails are a good way - maybe the best way - to kill community efforts :)
- KB
On Tuesday, 22 Sep 2009, at 13:23 +0100, Karanbir Singh wrote:
Hi Andreas,
On 09/22/2009 08:51 AM, Andreas Rogge wrote:
After being busy with some other stuff I had some time to look into this some more.
Can this process be used for non-English content as well? I've not looked myself, since there are clearly other people who have the bandwidth and are more able.
I didn't test, but the XML-structure should be the same for every language, so the XSL should work for every language.
Keep in mind that private emails are a good way - maybe the best way - to kill community efforts :)
Indeed. However, I didn't ask for private mail; I asked to be CC'd on mail to the list if an immediate response is desired.
On 09/23/2009 10:44 PM, Andreas Rogge wrote:
Can this process be used for non-English content as well? I've not looked myself, since there are clearly other people who have the bandwidth and are more able.
I didn't test, but the XML-structure should be the same for every language, so the XSL should work for every language.
Would it be possible for you to look at this sometime? Whenever you get the time; it does not need to be right away.
If we can get an automatable setup like this in place, it would be good to get it plumbed into the
Indeed. However, I didn't ask for private mail; I asked to be CC'd on mail to the list if an immediate response is desired.
ah! ok, CC'd - however, immediate response not desired, a response when you have the time, would be good.