
Is your site in danger of being de-indexed? Validate your robots.txt file

In a WebProNews Newsletter article, David A. Utter wrote that we need to validate our robots.txt files. If we don’t, the Googlebot crawler may find a forgotten line in robots.txt that causes it to de-index the site from the search engine.

This is according to the writer of the Sebastians-Pamphlets blog, who said that Google confirmed it recognizes experimental syntax such as Noindex in the robots.txt file. That means that forgotten statements in your robots.txt, which were ignored until recently, might suddenly change the crawler’s behavior without notice.

Sebastian is unsure which experimental crawler directives Google has implemented, but goes on to give an example: a line like “Noindex: /” in your robots.txt will now de-index your entire website.

“Noindex:” is not defined in the Robots Exclusion Protocol from 1994, and not mentioned in Google’s official documents.

Sebastian recommended checking your file with Google’s robots.txt analyzer, part of Google’s Webmaster Tools, and using only the Disallow, Allow, and Sitemap crawler directives in the Googlebot section of robots.txt.
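If you want a quick local check before turning to Google’s analyzer, something like the following rough Python sketch could flag forgotten lines. It is not Google’s tool or its logic: the function name, the allowed-directive set (taken from the recommendation above, plus User-agent), and the sample file are all assumptions made for illustration.

```python
# Hypothetical helper: flags robots.txt lines whose directive is not one of
# the directives recommended above (plus User-agent, which opens a section).
ALLOWED_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def find_unexpected_directives(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines with an unexpected directive."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop comments and surrounding whitespace; skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, separator, _value = line.partition(":")
        if not separator or name.strip().lower() not in ALLOWED_DIRECTIVES:
            findings.append((number, raw.strip()))
    return findings


# Made-up sample file containing a forgotten experimental line.
sample = """\
User-agent: Googlebot
Disallow: /private/
Noindex: /
Sitemap: https://example.com/sitemap.xml
"""

for number, line in find_unexpected_directives(sample):
    print(f"line {number}: unexpected directive: {line}")
```

Run against the sample, this reports the “Noindex: /” line and nothing else. A hit only means the line deserves a second look; it says nothing about how any particular crawler will treat it.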

Note, for those who don’t know what a robots.txt file is and what it does:
Google defines a robots.txt file as a file that provides restrictions to search engine robots (known as “bots”) that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
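To see that check in action, here is a small sketch using the robots.txt parser from Python’s standard library. The rules, the bot name “ExampleBot”, and the example.com URLs are made up for illustration, and real crawlers such as Googlebot may interpret a file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules: every bot is asked to stay out of /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" is a hypothetical crawler name.
can_fetch_public = parser.can_fetch("ExampleBot", "https://example.com/index.html")
can_fetch_private = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")

print("public page allowed:", can_fetch_public)
print("private page allowed:", can_fetch_private)
```

With these rules the public page is allowed and the private one is not, which is the decision a well-behaved bot makes before requesting a page.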

You need a robots.txt file only if your site includes content that you don’t want search engines to index. If you want search engines to index everything in your site, you don’t need a robots.txt file (not even an empty one).

Personally, I like the idea of having a robots.txt file even if I want all the pages of the site indexed, simply because I don’t want my site stats to show 404 errors for a missing robots.txt file. But that is just me…
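For anyone who wants to do the same, the usual “allow everything” file is just a User-agent line followed by an empty Disallow line. The sketch below feeds that file to Python’s standard-library parser to confirm that nothing is blocked; the bot name and URL are made up, and this shows the parser’s reading of the file, not a guarantee about any specific search engine.

```python
from urllib.robotparser import RobotFileParser

# An "allow everything" robots.txt: an empty Disallow value blocks nothing.
allow_all = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(allow_all.splitlines())

# Hypothetical bot name and URL, used only to query the parsed rules.
everything_allowed = parser.can_fetch("ExampleBot", "https://example.com/any/page.html")
print("any page allowed:", everything_allowed)
```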
