[CentOS-docs] robots.txt

Thu Aug 6 12:31:32 UTC 2009
Michael A. Peters <mpeters at mac.com>

Marcus Moeller wrote:
> Hi all,
> 
> I have again noticed that the wiki does not really show up in search
> results and wonder if it has any impact that robots.txt on
> wiki.centos.org is empty.
> 
> Perhaps it should at least contain User-agent: * ?

There should be a sitemap.xml file, and robots.txt should point to it with a Sitemap: line.
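A minimal robots.txt along those lines might look like the following. The sitemap URL is an assumption based on the wiki's hostname. The Sitemap: directive takes an absolute URL, and an empty Disallow: line allows crawlers to fetch everything.

```text
User-agent: *
Disallow:

Sitemap: http://wiki.centos.org/sitemap.xml
```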

sitemap.xml can be generated from the wiki database with PHP's DOMDocument
class, something like this:

<?php
require('db_connect.php'); // assumed to create the PEAR::MDB2 connection object $mdb2

// Build one <url> entry; <lastmod> is optional and only added when non-empty
function sitemapurl($document,$loc,$priority,$changefreq,$lastmod='') {
    $mapurl = $document->createElement('url');
    // createElement() does not escape '&', so escape the URL before passing it in
    $maploc = $document->createElement('loc',htmlspecialchars($loc));
    $mapurl->appendChild($maploc);
    if (strlen($lastmod) > 0) {
       $maplastmod = $document->createElement('lastmod',$lastmod);
       $mapurl->appendChild($maplastmod);
       }
    $mappri = $document->createElement('priority',$priority);
    $mapurl->appendChild($mappri);
    $mapcha = $document->createElement('changefreq',$changefreq);
    $mapurl->appendChild($mapcha);
    return($mapurl);
    }

$smap = new DOMDocument("1.0","UTF-8");
$smap->preserveWhiteSpace = false;
$smap->formatOutput = true;

$urlset = $smap->createElement('urlset');
$urlset->setAttribute('xmlns','http://www.sitemaps.org/schemas/sitemap/0.9');
$urlset->setAttribute('xmlns:xsi','http://www.w3.org/2001/XMLSchema-instance');
$urlset->setAttribute('xsi:schemaLocation','http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd');

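// Placeholder query: substitute your wiki's page table and conditions
$sql = "SELECT wikipage,lastmod FROM blah WHERE blah";
$rs=$mdb2->query($sql); // I use PEAR::MDB2
if (PEAR::isError($rs)) {
    die($rs->getMessage());
    }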
while($row = $rs->fetchRow(MDB2_FETCHMODE_OBJECT)) {
    $loc = 'http://www.domain.tld/' . $row->wikipage;
    $url = sitemapurl($smap,$loc,'0.50','daily',$row->lastmod);
    $urlset->appendChild($url);
    }

$smap->appendChild($urlset);
header('Content-type: application/xml');
print ($smap->saveXML());
?>
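For reference, each database row becomes one <url> entry, so the output should look roughly like this (the page name and date are made-up placeholders). The sitemaps.org protocol expects lastmod in W3C Datetime format, such as 2009-08-06, so the lastmod column may need converting first.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.domain.tld/SomePage</loc>
    <lastmod>2009-08-06</lastmod>
    <priority>0.50</priority>
    <changefreq>daily</changefreq>
  </url>
</urlset>
```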

I think that requires PHP 5.2.x, but I'm fairly sure the same can be done
under mod_python or mod_perl, since Python and Perl have comparable XML
libraries.

That's what I do: I call the file sitemap.php and map sitemap.xml to it
with mod_rewrite. It works very well: new pages are added right away, my
sitemaps get slurped up, and my search engine rankings are decent.
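A minimal sketch of that rewrite, assuming Apache with mod_rewrite enabled and both files in the document root, with the directives placed in that directory's .htaccess. The rewrite is internal, so crawlers still request and see sitemap.xml.

```apache
# Hypothetical .htaccess in the document root, next to sitemap.php
RewriteEngine On
# Internally serve /sitemap.xml from sitemap.php (no redirect is sent)
RewriteRule ^sitemap\.xml$ /sitemap.php [L]
```

In a <VirtualHost> or main server config the pattern is matched against the path with its leading slash, so it would be written ^/sitemap\.xml$ there.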