Would anyone out there care to share their experience using CentOS as a web server, along with their robots.txt files?
I realize this is a somewhat simple exercise, yet I am sure there are both large and small hosters out there, and possibly those with high traffic tune their robots.txt files differently than others?
Please share if you can or care to.
For years we have just done a * (allow all) and a Disallow on things like /cgi-bin.
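Concretely, that minimal file looks something like this (with /cgi-bin, from the example above, as the only blocked path):

User-agent: *
Disallow: /cgi-bin/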
Some examples of places to visit, for those in (or out of) the know:
http://en.wikipedia.org/wiki/Robots_exclusion_standard
http://www.google.com/robots.txt
and others...
Quite frankly, there are many orgs out there that don't follow this anyway, right?
anyone?
tia
- rh
On Sat, Jan 16, 2010 at 10:18 PM, R-Elists lists07@abbacomm.net wrote:
Quite frankly, there are many orgs out there that don't follow this anyway, right?
Right: http://blogs.perl.org/users/cpan_testers/2010/01/msnbot-must-die.html
On Sat, 2010-01-16 at 14:18 -0800, R-Elists wrote:
Quite frankly, there are many orgs out there that don't follow this anyway, right?
Since robots.txt is a "suggestion" and .htaccess is actually enforced, I use a simple robots.txt like this:
User-agent: *
Disallow:
and put the bad guys into .htaccess.
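As a rough sketch of the .htaccess side, assuming Apache 2.2-style access control with mod_setenvif (the "BadBot" User-Agent string here is just a hypothetical placeholder for whatever offender you want to block):

BrowserMatchNoCase "BadBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot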
Add

User-agent: Slurp
Crawl-delay: 86400

to stop misbehaving Yahoo bots. Slurp often misbehaves, but at least it follows these rules. Something you can't say of Googlebot, for instance.
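Combined with the catch-all stanza above, the whole robots.txt would look roughly like this; per the exclusion standard, a bot that matches a specific User-agent group ignores the * group:

User-agent: Slurp
Crawl-delay: 86400

User-agent: *
Disallow: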
Kai
The best way is to request removal of the directory through Google Webmaster Tools. Also, some bots don't listen, so in addition to robots.txt, use Webmaster Central.
James