Robots.txt Checker

Validate your robots.txt file, test crawl rules by bot, and catch technical SEO mistakes before they hurt indexing.


Read-only check. CodeAva fetches your robots.txt file to inspect it and does not modify your site. Only publicly accessible robots.txt files can be fetched. robots.txt rules affect crawling; they do not guarantee indexing behavior.

Overview

The Robots.txt Checker validates your robots.txt file, parses its directives, and surfaces issues that can silently harm search crawlability. It checks for syntax problems, dangerous blocking rules, missing Sitemap directives, and common configuration mistakes that even experienced developers overlook.

robots.txt is a plain-text file that tells compliant web crawlers which paths they are and are not allowed to access. It controls crawling — not indexing. A page blocked in robots.txt cannot be crawled, but it can still appear in search results if it is linked from other pages. Understanding this distinction is important when deciding whether to use robots.txt, noindex tags, or both.
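
For illustration, here is a minimal robots.txt (the path is hypothetical) that blocks crawling without guaranteeing removal from search results:

```text
# Compliant crawlers will not fetch anything under /drafts/,
# but a /drafts/ URL linked from another site can still be indexed.
User-agent: *
Disallow: /drafts/
```

To remove such a page from results, it must remain crawlable and serve a noindex signal; blocking it in robots.txt hides that signal from the crawler.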

The tool supports two input modes: fetch by URL (the checker appends /robots.txt to the root domain and fetches it directly) and paste mode for reviewing a file before deploying. It also includes a URL access tester that lets you check whether a specific path would be allowed or blocked for a given crawler — entirely client-side, using the parsed rules already returned.
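
The tester's allow/block logic can be approximated with Python's standard-library urllib.robotparser; the rules below are a made-up sample, not CodeAva's implementation:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; the checker parses your real file instead.
rules = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Test specific paths against each user-agent group.
print(parser.can_fetch("Googlebot", "/admin/settings"))  # False
print(parser.can_fetch("Googlebot", "/blog/post"))       # True
print(parser.can_fetch("GPTBot", "/blog/post"))          # False
```

Googlebot falls through to the `*` group, while GPTBot matches its own group and is blocked everywhere.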

Use cases

When to use it

  • Pre-launch review: validate robots.txt before a site goes live to catch full-site blocks, missing Sitemap directives, or malformed syntax.
  • Post-migration check: after migrating a site, confirm that staging-era blocking rules have not been carried into production.
  • Crawl issue diagnosis: when pages are not appearing in search results, check whether they are blocked in robots.txt before investigating other causes.
  • AI bot management: check how rules apply to GPTBot, CCBot, and other AI crawlers, which use different user-agent strings from Googlebot.
  • URL access testing: test specific paths against each user-agent group to confirm that important pages are allowed and sensitive paths are correctly restricted.
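
As a sketch, AI training crawlers are addressed with their own user-agent groups; the bot names below are their published tokens, and the paths are illustrative:

```text
# Opt AI training crawlers out of the entire site
User-agent: GPTBot
User-agent: CCBot
Disallow: /

# Search crawlers keep normal access
User-agent: *
Disallow: /admin/
```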

When it's not enough

  • Preventing indexing: robots.txt blocks crawling but does not prevent indexing. Use a noindex meta tag or HTTP header to keep pages out of search results.
  • Protecting private content: robots.txt is publicly readable and only respected by compliant bots. Use authentication to restrict access to sensitive content.
  • Blocking non-compliant scrapers: malicious scrapers do not respect robots.txt. Rate limiting, authentication, and WAF rules are the appropriate tools for that.
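
For reference, a noindex signal can be delivered either in the page markup or as a response header; in both cases the page must remain crawlable for the signal to be seen:

```text
<!-- In the HTML head -->
<meta name="robots" content="noindex">

# Or as an HTTP response header (also works for PDFs and other non-HTML files)
X-Robots-Tag: noindex
```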

How to use it

  1. Choose fetch or paste mode

    Use "Fetch URL" to enter your domain and let the checker retrieve /robots.txt automatically. Use "Paste Content" to review a file before deploying.

  2. Run the check

    Click Check Robots.txt. The tool fetches and parses the file, then returns issues grouped by severity, user-agent rules, and extracted Sitemap URLs.

  3. Review critical issues first

    A Disallow: / on the * user-agent is the most damaging configuration possible — it blocks all crawlers from your entire site. Fix critical issues before reviewing warnings.

  4. Test URL access

    In the URL Access Tester, enter a path (e.g. /admin/settings) and select a user-agent. The tester shows whether that path is allowed or blocked and which rule matched.

  5. Verify Sitemap directives

    Check the Sitemap Directives panel to confirm your sitemap URL is present and correctly formatted. Add one if missing.
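
For reference, extracting and sanity-checking Sitemap directives is straightforward; a minimal sketch (the sample content is illustrative, not CodeAva's internals):

```python
from urllib.parse import urlparse

robots = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: /sitemap-news.xml
"""

# Sitemap lines are matched case-insensitively; the directive may
# appear anywhere in the file, outside user-agent groups.
sitemaps = [line.split(":", 1)[1].strip()
            for line in robots.splitlines()
            if line.lower().startswith("sitemap:")]

for url in sitemaps:
    scheme = urlparse(url).scheme
    status = "ok" if scheme in ("http", "https") else "must be absolute"
    print(f"{url}: {status}")
```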

Common errors and fixes

Disallow: / blocking entire site for all bots

Remove or replace 'Disallow: /' under 'User-agent: *'. If you need to block specific sections, target those paths: 'Disallow: /admin/'. A full-site block prevents every page from being crawled.

CSS, JS, or asset paths are blocked

Search engines need to render your pages to understand them. Remove Disallow rules covering /css, /js, /static, /assets, /_next, or /wp-content paths. Block only paths where restricting crawler access is intentional.

Malformed lines or unknown directives

Each line must follow the format 'Directive: value'. Check for missing colons, extra spaces in directive names, or typos. Unknown directives are ignored by crawlers but may indicate an intent that was not implemented correctly.
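
A minimal line-level check along these lines (the directive set and helper name are my own for illustration, not the tool's internals):

```python
import re

KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}
LINE_RE = re.compile(r"^([A-Za-z-]+)\s*:\s*(.*)$")

def lint_line(line: str):
    """Return a problem description for one robots.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # comments and blank lines are ignored
    if not line:
        return None
    match = LINE_RE.match(line)
    if match is None:
        return f"malformed line (missing colon?): {line!r}"
    if match.group(1).lower() not in KNOWN_DIRECTIVES:
        return f"unknown directive: {match.group(1)!r}"
    return None

print(lint_line("Disallow /admin"))   # malformed line
print(lint_line("Dissalow: /admin"))  # unknown directive
print(lint_line("Allow: /public/"))   # None
```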

No Sitemap directive

Add 'Sitemap: https://yourdomain.com/sitemap.xml' to help search engines find your content. This is separate from submitting in Search Console — both are useful.

Rules defined before any User-agent

Allow and Disallow rules must follow a User-agent directive. Move any orphaned rules into a proper User-agent group, or add 'User-agent: *' before them.

Sitemap URL is relative or malformed

Sitemap URLs must be absolute, starting with https:// or http://. Use 'Sitemap: https://yourdomain.com/sitemap.xml' — not '/sitemap.xml'.

Frequently asked questions