Robots.txt Checker
Validate your robots.txt file, test crawl rules by bot, and catch technical SEO mistakes before they hurt indexing.
Read-only check. CodeAva fetches your robots.txt file to inspect it and does not modify your site. Only publicly accessible robots.txt files can be fetched. Note that robots.txt rules affect crawling, not indexing: a blocked page can still appear in search results.
Overview
The Robots.txt Checker validates your robots.txt file, parses its directives, and surfaces issues that can silently harm search crawlability. It checks for syntax problems, dangerous blocking rules, missing Sitemap directives, and common configuration mistakes that even experienced developers overlook.
robots.txt is a plain-text file that tells compliant web crawlers which paths they are and are not allowed to access. It controls crawling — not indexing. A page blocked in robots.txt cannot be crawled, but it can still appear in search results if it is linked from other pages. Understanding this distinction is important when deciding whether to use robots.txt, noindex tags, or both.
The tool supports two input modes: fetch by URL (the checker appends /robots.txt to the root domain and fetches it directly) and paste mode for reviewing a file before deploying. It also includes a URL access tester that lets you check whether a specific path would be allowed or blocked for a given crawler — entirely client-side, using the parsed rules already returned.
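The URL access test can be approximated with Python's standard-library robots.txt parser; the rules and domain below are illustrative. One caveat: Python applies rules in file order rather than Google's longest-match precedence, so the Allow rule is listed before the broader Disallow to keep both interpretations in agreement.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: a general group plus a stricter group for an AI crawler.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot has no dedicated group, so the * group applies.
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/admin/public/page"))  # True

# GPTBot matches its own group and is blocked everywhere.
print(parser.can_fetch("GPTBot", "https://example.com/about"))                 # False
```

This mirrors what the in-browser tester reports: which path is allowed or blocked for which user-agent, based solely on the parsed rules.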
Use cases
When to use it
- Pre-launch review: validate robots.txt before a site goes live to catch full-site blocks, missing Sitemap directives, or malformed syntax.
- Post-migration check: after migrating a site, confirm that staging-era blocking rules have not been carried into production.
- Crawl issue diagnosis: when pages are not appearing in search results, check whether they are blocked in robots.txt before investigating other causes.
- AI bot management: check how rules apply to GPTBot, CCBot, and other AI crawlers, which use different user-agent strings from Googlebot.
- URL access testing: test specific paths against each user-agent group to confirm that important pages are allowed and sensitive paths are correctly restricted.
When it's not enough
- Preventing indexing: robots.txt blocks crawling but does not prevent indexing. Use a noindex meta tag or HTTP header to keep pages out of search results.
- Protecting private content: robots.txt is publicly readable and only respected by compliant bots. Use authentication to restrict access to sensitive content.
- Blocking non-compliant scrapers: malicious scrapers do not respect robots.txt. Rate limiting, authentication, and WAF rules are the appropriate tools for that.
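To keep a page out of search results, a noindex signal is the right tool; crucially, the page must remain crawlable in robots.txt, since a crawler has to fetch it to see the directive. A minimal example:

```html
<!-- Place inside the page's <head>; compliant crawlers will drop it from results -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the `X-Robots-Tag: noindex` HTTP response header.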
How to use it
1. Choose fetch or paste mode: use "Fetch URL" to enter your domain and let the checker retrieve /robots.txt automatically, or use "Paste Content" to review a file before deploying.
2. Run the check: click Check Robots.txt. The tool fetches and parses the file, then returns issues grouped by severity, user-agent rules, and extracted Sitemap URLs.
3. Review critical issues first: a Disallow: / rule under the * user-agent is the most damaging configuration, blocking all compliant crawlers from your entire site. Fix critical issues before reviewing warnings.
4. Test URL access: in the URL Access Tester, enter a path (e.g. /admin/settings) and select a user-agent. The tester shows whether that path is allowed or blocked and which rule matched.
5. Verify Sitemap directives: check the Sitemap Directives panel to confirm your sitemap URL is present and correctly formatted. Add one if missing.
Common errors and fixes
Disallow: / blocking entire site for all bots
Remove or replace 'Disallow: /' under 'User-agent: *'. If you need to block specific sections, target those paths: 'Disallow: /admin/'. A full-site block prevents every page from being crawled.
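A minimal before-and-after sketch (paths and domain are placeholders):

```
# Harmful: blocks every compliant crawler from the entire site
# User-agent: *
# Disallow: /

# Fixed: restrict only the sections that should stay uncrawled
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```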
CSS, JS, or asset paths are blocked
Search engines need to render your pages to understand them. Remove Disallow rules covering /css, /js, /static, /assets, /_next, or /wp-content paths. Block only paths where restricting crawler access is intentional.
Malformed lines or unknown directives
Each line must follow the format 'Directive: value'. Check for missing colons, extra spaces in directive names, or typos. Unknown directives are ignored by crawlers but may indicate an intent that was not implemented correctly.
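A few malformed lines alongside their corrected form (paths are placeholders):

```
# Malformed: missing colon
Disallow /private/
# Malformed: space instead of hyphen in the directive name
User agent: Googlebot
# Malformed: typo makes this an unknown directive, silently ignored
Dissalow: /tmp/

# Correct:
User-agent: Googlebot
Disallow: /private/
```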
No Sitemap directive
Add 'Sitemap: https://yourdomain.com/sitemap.xml' to help search engines find your content. This is separate from submitting in Search Console — both are useful.
Rules defined before any User-agent
Allow and Disallow rules must follow a User-agent directive. Move any orphaned rules into a proper User-agent group, or add 'User-agent: *' before them.
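For example (the path is a placeholder):

```
# Orphaned: appears before any User-agent line, so crawlers may discard it
Disallow: /admin/

# Fixed: the same rule inside a proper group
User-agent: *
Disallow: /admin/
```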
Sitemap URL is relative or malformed
Sitemap URLs must be absolute, starting with https:// or http://. Use 'Sitemap: https://yourdomain.com/sitemap.xml' — not '/sitemap.xml'.