[CentOS-docs] robots.txt
Michael A. Peters
mpeters at mac.com
Thu Aug 6 13:31:32 UTC 2009
Marcus Moeller wrote:
> Hi all,
>
> I have again noticed that the wiki does not really show up in search
> results and wonder if it has any impact that robots.txt on
> wiki.centos.org is empty.
>
> Perhaps it should at least contain User-agent: * ?
There should be a sitemap.xml file and robots.txt should point to it.
sitemap.xml can be generated from the wiki database with something like
DOMDocument:
<?php
require('db_connect.php');
// Build one <url> entry; <lastmod> is optional and is skipped when empty.
function sitemapurl($document,$loc,$priority,$changefreq,$lastmod='') {
    $mapurl = $document->createElement('url');
    $maploc = $document->createElement('loc',$loc);
    $mapurl->appendChild($maploc);
    // Test the string length, not the length of a boolean comparison
    if (strlen($lastmod) > 0) {
        $maplastmod = $document->createElement('lastmod',$lastmod);
        $mapurl->appendChild($maplastmod);
    }
    $mappri = $document->createElement('priority',$priority);
    $mapurl->appendChild($mappri);
    $mapcha = $document->createElement('changefreq',$changefreq);
    $mapurl->appendChild($mapcha);
    return($mapurl);
}
$smap = new DOMDocument("1.0","UTF-8");
$smap->preserveWhiteSpace = false;
$smap->formatOutput = true;

// Root element carrying the sitemap namespace and schema location
$urlset = $smap->createElement('urlset');
$urlset->setAttribute('xmlns','http://www.sitemaps.org/schemas/sitemap/0.9');
$urlset->setAttribute('xmlns:xsi','http://www.w3.org/2001/XMLSchema-instance');
$urlset->setAttribute('xsi:schemaLocation','http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd');

// Placeholder query: substitute the real table and conditions
$sql = "SELECT wikipage,lastmod FROM blah WHERE blah";
$rs=$mdb2->query($sql); // I use pear::mdb2
while($row = $rs->fetchRow(MDB2_FETCHMODE_OBJECT)) {
    $loc = 'http://www.domain.tld/' . $row->wikipage;
    // Pass the value itself; '$row->lastmod' in single quotes is a literal string
    $url = sitemapurl($smap,$loc,'0.50','daily',$row->lastmod);
    $urlset->appendChild($url);
}
$smap->appendChild($urlset);

header('Content-type: application/xml');
print ($smap->saveXML());
?>
I think that requires PHP 5.2.x, but I'm fairly sure mod_python/mod_perl
have similar functionality.
That's what I do: I call the file sitemap.php and make sitemap.xml point
to it via mod_rewrite. It works very well: new pages are added right
away, my sitemaps get slurped up, and my search engine rankings are decent.
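
The two configuration pieces described above could be sketched roughly
as follows. Both are illustrative assumptions, not the files from any
real site: the host name is the same www.domain.tld placeholder used in
the script, and the rewrite rule assumes Apache with mod_rewrite enabled
and a .htaccess file in the document root (in a server or virtual host
context the pattern and target would need a leading slash).

A robots.txt that lets all crawlers in and points them at the sitemap:

```
User-agent: *
Disallow:

Sitemap: http://www.domain.tld/sitemap.xml
```

A .htaccess rule that serves the output of sitemap.php when sitemap.xml
is requested:

```
RewriteEngine On
RewriteRule ^sitemap\.xml$ sitemap.php [L]
```

An empty Disallow line permits everything, and the Sitemap line takes
the full URL of the sitemap rather than a relative path.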