[CentOS-docs] robots.txt

Thu Aug 6 12:31:32 UTC 2009
Michael A. Peters <mpeters at mac.com>

Marcus Moeller wrote:
> Hi all,
> I have again noticed that the wiki does not really show up in search
> results and wonder if it has any impact that robots.txt on
> wiki.centos.org is empty.
> Perhaps it should at least contain User-agent: * ?

There should be a sitemap.xml file and robots.txt should point to it.
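
As a rough sketch, a minimal robots.txt that lets every crawler in and points to the sitemap could look like the lines below. The sitemap URL is an assumption on my part; adjust it to wherever the file actually ends up. The empty Disallow line means nothing is blocked.

User-agent: *
Disallow:

Sitemap: http://wiki.centos.org/sitemap.xml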

sitemap.xml can be generated from the wiki database with something like this:

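function sitemapurl($document,$loc,$priority,$changefreq,$lastmod='') {
    // build one <url> entry and hand it back to the caller
    $mapurl = $document->createElement('url');
    $maploc = $document->createElement('loc',$loc);
    $mapurl->appendChild($maploc);
    if (strlen($lastmod) > 0) {
       // lastmod must be a W3C datetime, e.g. YYYY-MM-DD
       $maplastmod = $document->createElement('lastmod',$lastmod);
       $mapurl->appendChild($maplastmod);
    }
    $mappri = $document->createElement('priority',$priority);
    $mapurl->appendChild($mappri);
    $mapcha = $document->createElement('changefreq',$changefreq);
    $mapurl->appendChild($mapcha);
    return $mapurl;
}

$smap = new DOMDocument("1.0","UTF-8");
$smap->preserveWhiteSpace = false;
$smap->formatOutput = true;

// root element, with the sitemap protocol namespace
$urlset = $smap->createElement('urlset');
$urlset->setAttribute('xmlns','http://www.sitemaps.org/schemas/sitemap/0.9');
$smap->appendChild($urlset);

$sql = "SELECT wikipage,lastmod FROM blah WHERE blah";
$rs=$mdb2->query($sql); // I use pear::mdb2
while($row = $rs->fetchRow(MDB2_FETCHMODE_OBJECT)) {
    $loc = 'http://www.domain.tld/' . $row->wikipage;
    $url = sitemapurl($smap,$loc,'0.50','daily',$row->lastmod);
    $urlset->appendChild($url);
}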

header('Content-type: application/xml');
print ($smap->saveXML());

I think that requires php 5.2.x but I'm fairly sure mod_python/mod_perl 
have similar functionality.

That's what I do: I call the file sitemap.php and make sitemap.xml point 
to it via mod_rewrite. It works very well. New pages are added right 
away, my sitemaps get slurped up, and my search engine rankings are decent.
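
The rewrite itself can be quite small. A sketch, assuming an .htaccess file in the document root, mod_rewrite enabled, and sitemap.php sitting in the same directory (in the main server config the pattern would need a leading slash):

RewriteEngine On
RewriteRule ^sitemap\.xml$ sitemap.php [L]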