Hello, I'm running an Apache 2.2 web server on CentOS 5.3. In the logs I'm seeing frequent requests for robots.txt and favicon.ico, and it looks like those files should be in the document root. What are these files? Is this something the RPM installs, or do I have to retrieve or generate them? Thanks. Dave.
favicon.ico: http://lmgtfy.com/?q=favicon.ico
robots.txt: http://lmgtfy.com/?q=robots.txt
On Fri, Aug 28, 2009 at 11:46 AM, Dave <dave.mehler@gmail.com> wrote:
Hello, I'm running an apache 2.2 webserver on centos 5.3. I'm seeing frequent requests for robots.txt and favicon.ico from the logs those files should be in the document root area. What are these files, is this something the rpm installs, or do i have to retrieve or generate them? Thanks. Dave.
Robots.txt is a file that allows or denies robots from indexing or crawling the site if they behave as they should. Favicon.ico is an icon image that shows up in the address bar of a browser, generally to the left of the URI. Neither is strictly necessary, and both are files you would create and store in the public HTML directory, as you noted.
Cheers, Chad
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Fri, 28 Aug 2009 08:54:02 -0700 Taproot webmaster@taproothosting.com wrote:
Robots.txt is a file that allows or denies robots from indexing or crawling the site if they behave as they should.
It's a common misconception. Robots.txt does NOT allow or deny anything. Robots.txt only SUGGESTS what robots should or should not crawl. It's up to the crawler to respect the robots.txt file.
The big ones like Google, Yahoo, and Microsoft do follow the instructions in the robots.txt file, but many others, especially the ones harvesting emails, photos, and so on, do not.
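To make the "it's only a suggestion" point concrete, here is a rough sketch of what a well-behaved crawler does on its own side before fetching a page. It uses Python's standard urllib.robotparser module. The sample rules, the "ExampleBot" user agent, and the www.example.com URLs are made up for illustration and are not from anyone's real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask all robots to stay out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if a crawler that honours robots.txt would fetch this URL.

    The check happens entirely inside the crawler. The web server never
    enforces it, so a badly behaved robot can simply skip this step.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/list.html"):
        url = "http://www.example.com" + path
        print(path, polite_crawler_may_fetch(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

With these sample rules the check comes back True for /index.html and False for /private/list.html. Passing an empty string as the rules gives True for everything, which is how this parser treats a site with no rules at all.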
At Fri, 28 Aug 2009 11:46:43 -0400 CentOS mailing list centos@centos.org wrote:
Hello, I'm running an apache 2.2 webserver on centos 5.3. I'm seeing frequent requests for robots.txt and favicon.ico from the logs those files should be in the document root area. What are these files, is this something the rpm installs, or do i have to retrieve or generate them?
Both files are *optional*.
robots.txt is a per-virtual-host file that tells 'friendly' robots (e.g. Googlebot or Yahoo's Slurp and what not) what you want them to spider or not spider in your web site. If robots.txt is missing, the spiders spider everything they have a link to. Google for this file. There are lots of web pages that explain what this file should contain.
favicon.ico is something IE (and later Firefox, etc.) looks for to put next to the URL in the location field and to save with the URL in the client's bookmarks. If the file is missing, the browser just shows some default icon. This is mostly a vanity thing. You just create a little 16x16 pixel version of your logo or service mark or something, bundle it into a .ico file, and drop it at the root of your web pages.
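If you just want a placeholder icon without hunting for a tool, something like the following might do. It is only a sketch: it builds a single-colour 16x16, 32-bit .ico in memory using the classic layout (icon directory, one directory entry, then a BMP-style header, the pixel rows, and a transparency mask). The function name build_ico and the colour are my own choices, and a real logo would need a proper image editor or converter.

```python
import struct


def build_ico(width=16, height=16, bgra=(0x99, 0x33, 0x00, 0xFF)):
    """Return the bytes of a one-image, 32-bit .ico filled with a single colour.

    width and height must be below 256 because the directory entry
    stores each of them in a single byte.
    """
    # Colour bitmap: 4 bytes per pixel in blue, green, red, alpha order.
    # Rows are stored bottom-up, which does not matter for a solid fill.
    pixels = bytes(bgra) * (width * height)

    # Transparency (AND) mask: 1 bit per pixel, rows padded to 4 bytes.
    # All zero bits mean every pixel is drawn.
    mask_row_bytes = ((width + 31) // 32) * 4
    and_mask = bytes(mask_row_bytes * height)

    # 40-byte BITMAPINFOHEADER. The height field is doubled because it
    # covers both the colour bitmap and the mask.
    dib_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height * 2, 1, 32, 0,
        len(pixels) + len(and_mask), 0, 0, 0, 0,
    )
    image = dib_header + pixels + and_mask

    # ICONDIR: reserved, type 1 (icon), one image in the file.
    icon_dir = struct.pack("<HHH", 0, 1, 1)
    # ICONDIRENTRY: size, palette info, planes, bits per pixel,
    # image length, and the offset of the image data in the file.
    entry = struct.pack(
        "<BBBBHHII", width, height, 0, 0, 1, 32,
        len(image), len(icon_dir) + 16,
    )
    return icon_dir + entry + image
```

To use it, write the result of build_ico() to a file named favicon.ico in binary mode and copy that file into your document root.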
On Fri, Aug 28, 2009 at 10:46 AM, Dave <dave.mehler@gmail.com> wrote:
Hello, I'm running an apache 2.2 webserver on centos 5.3. I'm seeing frequent requests for robots.txt and favicon.ico from the logs those files should be in the document root area. What are these files, is this something the rpm installs, or do i have to retrieve or generate them?
My experience is that if you do not have these files, requests for them will generate errors (404 Not Found entries in your logs) and possibly use more bandwidth than if you had the files. I made one favicon.ico file from a photo with a tool on the web (sorry, I don't remember the URL), and I got the other favicon.ico file somewhere on the web where they can be legally downloaded (I also do not remember that URL). As previous posters pointed out, the robots.txt file will be obeyed by crawlers that follow the rules and ignored by the others, but if you don't have a robots.txt file I think you will see errors in your logs.
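If you want to see how often this is happening, a small script along these lines could count the misses. It is a sketch that assumes your access log uses Apache's common or combined log format. The sample lines below are made up (the addresses are documentation-only IPs), and the names count_missing and WATCHED are just for illustration.

```python
import re
from collections import Counter

# Common/combined log format: host ident user [time] "request" status bytes ...
# Group 1 is the requested path, group 2 is the HTTP status code.
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')

WATCHED = ("/robots.txt", "/favicon.ico")


def count_missing(lines):
    """Count 404 responses for robots.txt and favicon.ico in log lines."""
    misses = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "404" and match.group(1) in WATCHED:
            misses[match.group(1)] += 1
    return misses


# Made-up sample lines, not taken from a real log.
SAMPLE_LINES = [
    '192.0.2.10 - - [28/Aug/2009:11:40:01 -0400] "GET /robots.txt HTTP/1.1" 404 288',
    '192.0.2.11 - - [28/Aug/2009:11:41:17 -0400] "GET /favicon.ico HTTP/1.1" 404 289',
    '192.0.2.11 - - [28/Aug/2009:11:41:18 -0400] "GET /index.html HTTP/1.1" 200 1024',
    '192.0.2.12 - - [28/Aug/2009:11:42:30 -0400] "GET /favicon.ico HTTP/1.1" 404 289',
]

if __name__ == "__main__":
    for path, hits in count_missing(SAMPLE_LINES).items():
        print(path, hits)
```

To run it against a real log, pass an open file object instead of SAMPLE_LINES. Where the access log lives depends on your configuration.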