On Sat, 2009-01-03 at 21:17 +0100, Ralph Angenendt wrote:
> JohnS wrote:
> >
> > On Sat, 2009-01-03 at 19:41 +0100, Ralph Angenendt wrote:
> > > JohnS wrote:
> > > > Hey Dag, does the Wiki even have a Site Map? If not that can greatly
> > > > help with the google searches (has to be submitted to Google).
> > >
> > > http://wiki.centos.org/TitleIndex is there. No idea if that is what Google
> > > accepts as a sitemap.
> >
> > No it has to be Submitted to Google.
>
> Yeah, I know that :)
>
> > http://wiki.centos.org/TitleIndex?action=titleindex This can used for
> > the site index What needs to be done is append www.wiki.centos.org to
> > each line.
>
> That's bad. Append or Prepend? In what form? You seem to have a bit of
> knowledge, so let me bother you instead of searching myself >:)

Prepend. In the plain-text form of the index, "/TitleIndex" would need
www.wiki.centos.org in front of it, like so:

"www.wiki.centos.org/TitleIndex"

But that's the hard, old way. Keep reading; there's a better solution
further down.

> Regarding the rest of your mail - I have to chew and digest that first.

OK, maybe this will clear a lot up for you: Google has a Python script that
will generate the site index for the Wiki. What is very important in doing
so is including the last-modified date for the pages in question. Indexing
the Wiki is very simple to do. The forums, on the other hand, are not so
simple, because that will mostly have to be done via XML, using the Python
xml.sax library to escape the URLs. Although it could probably be done in
text form also. By just using indexing you should not have to change the
MoinMoin code base for anything whatsoever.

Also, the internal MoinMoin search could really be improved upon. There's no
sense in having separate Title and Text searches (poor design). It has a
little too much "LIKE%" in the search string and not enough "LIKE", if you
know what I mean. Both of those can be used in MSSQL and MySQL; for Python
I do not know the syntax.
Someone can translate those, or you should be able to yourself. It pulls in
everything /under_the_sun and it needs specifics. Example: what Dag pointed
to and is chewing on...

<title>AdditionalResources/HardwareList - CentOS Wiki</title>    Wrong way

Needs to be:

<title>HardwareList - CentOS Wiki</title>    Correct way

To do that, the code base still has to be modified, no matter how you look
at it. The way around it? Create a site map and submit it to Google. Then
create a "robots.txt" containing "nofollow" inside it, or a "follow" on
specified dates.

If there is a major update to the Wiki, as in switching to a different one,
keep in mind how it handles the meta name and title content on page
creation. Very important in web design. Furthermore, you may inspect larger
corporate sites and see Meta Name id=121212. That number in turn gets
translated to an index that gets auto-generated and usually gets put into an
SQL server of some type, then submitted to a search engine. All of which can
be done automatically. I suspect this can be done by a cron job running the
Google Site Map Generator code once daily.

In all honesty, it should not matter how a project like CentOS caters to its
users, as long as they're satisfied and the content is delivered. What I
mean is the project needs to take a step back and reconsider how content is
distributed. Things like the stability of the platform that is delivering
it. I could give a lot of options on what to use. I suspect something more
along the lines of "The CentOS Content Delivery System" on the Java
platform. I suspect the sources would not be that hard to build from
upstream. ???

References:
http://www.google.com/support/webmasters/bin/answer.py?answer=34654&topic=13452
http://www.google.com/support/webmasters/bin/answer.py?answer=34575
http://code.google.com/p/sitemap-generators/
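
P.S. For anyone who wants to see the "prepend the host to each TitleIndex line" idea spelled out, here is a rough sketch in Python. It is not the Google generator script, just an illustration under some assumptions: the page names are the plain-text lines from ?action=titleindex, the base URL is http://wiki.centos.org, and the last-modified dates come from a lookup table you would have to fill in yourself (the function name build_sitemap and the lastmod_by_page dict are made up for this example). It uses xml.sax.saxutils.escape for the XML escaping mentioned above.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_sitemap(page_names, base_url, lastmod_by_page=None):
    """Return sitemap XML (as a string) for a list of wiki page names.

    page_names      -- lines from the plain-text title index (assumed input)
    base_url        -- host to prepend, e.g. "http://wiki.centos.org"
    lastmod_by_page -- hypothetical dict: page name -> "YYYY-MM-DD"
    """
    lastmod_by_page = lastmod_by_page or {}
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for raw in page_names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the index
        # Percent-encode the page name (keeping "/" for subpages),
        # prepend the host, then escape anything XML-sensitive.
        loc = escape(base_url.rstrip("/") + "/" + quote(name, safe="/"))
        out.append("  <url>")
        out.append("    <loc>%s</loc>" % loc)
        if name in lastmod_by_page:
            out.append("    <lastmod>%s</lastmod>" % escape(lastmod_by_page[name]))
        out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    pages = ["FrontPage", "AdditionalResources/HardwareList"]
    dates = {"FrontPage": "2009-01-03"}  # illustrative date only
    print(build_sitemap(pages, "http://wiki.centos.org", dates))
```

Something like this could be run from cron and the output saved as sitemap.xml, which is then what gets submitted to Google. The real last-modified dates would have to come from the wiki itself; the sketch does not try to fetch them.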