Robots.txt Checker
Validate your robots.txt file, test crawl rules by bot, and catch technical SEO mistakes before they hurt indexing.
Read-only check. CodeAva fetches your robots.txt file to inspect it and does not modify your site. Only publicly accessible robots.txt files can be fetched. Note that robots.txt rules affect crawling, not indexing: a blocked page can still appear in search results.
Overview
The Robots.txt Checker validates your robots.txt file, parses its directives, and surfaces issues that can silently harm search crawlability. It checks for syntax problems, dangerous blocking rules, missing Sitemap directives, and common configuration mistakes that even experienced developers overlook.
robots.txt is a plain-text file that tells compliant web crawlers which paths they are and are not allowed to access. It controls crawling — not indexing. A page blocked in robots.txt cannot be crawled, but it can still appear in search results if it is linked from other pages. Understanding this distinction is important when deciding whether to use robots.txt, noindex tags, or both.
The tool supports two input modes: fetch by URL (the checker appends /robots.txt to the root domain and fetches it directly) and paste mode for reviewing a file before deploying. It also includes a URL access tester that lets you check whether a specific path would be allowed or blocked for a given crawler — entirely client-side, using the parsed rules already returned.
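The URL access test can be approximated with Python's standard-library robots.txt parser; the rules and domain below are illustrative. One caveat: Python applies rules in file order rather than Google's longest-match precedence, so the Allow rule is listed before the broader Disallow to keep both interpretations in agreement.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: a general group plus a stricter group for an AI crawler.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot has no dedicated group, so the * group applies.
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/admin/public/page"))  # True

# GPTBot matches its own group and is blocked everywhere.
print(parser.can_fetch("GPTBot", "https://example.com/about"))                 # False
```

This mirrors what the in-browser tester reports: which path is allowed or blocked for which user-agent, based solely on the parsed rules.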
Use cases
When to use it
- Pre-launch review: validate robots.txt before a site goes live to catch full-site blocks, missing Sitemap directives, or malformed syntax.
- Post-migration check: after migrating a site, confirm that staging-era blocking rules have not been carried into production.
- Crawl issue diagnosis: when pages are not appearing in search results, check whether they are blocked in robots.txt before investigating other causes.
- AI bot management: check how rules apply to GPTBot, CCBot, and other AI crawlers, which use different user-agent strings from Googlebot.
- URL access testing: test specific paths against each user-agent group to confirm that important pages are allowed and sensitive paths are correctly restricted.
When it's not enough
- Preventing indexing: robots.txt blocks crawling but does not prevent indexing. Use a noindex meta tag or HTTP header to keep pages out of search results.
- Protecting private content: robots.txt is publicly readable and only respected by compliant bots. Use authentication to restrict access to sensitive content.
- Blocking non-compliant scrapers: malicious scrapers do not respect robots.txt. Rate limiting, authentication, and WAF rules are the appropriate tools for that.
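To keep a page out of search results, a noindex signal is the right tool; crucially, the page must remain crawlable in robots.txt, since a crawler has to fetch it to see the directive. A minimal example:

```html
<!-- Place inside the page's <head>; compliant crawlers will drop it from results -->
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is the `X-Robots-Tag: noindex` HTTP response header.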
How to use it
1. Choose fetch or paste mode: use "Fetch URL" to enter your domain and let the checker retrieve /robots.txt automatically, or use "Paste Content" to review a file before deploying.
2. Run the check: click Check Robots.txt. The tool fetches and parses the file, then returns issues grouped by severity, user-agent rules, and extracted Sitemap URLs.
3. Review critical issues first: a Disallow: / rule under the * user-agent is the most damaging configuration, blocking all compliant crawlers from your entire site. Fix critical issues before reviewing warnings.
4. Test URL access: in the URL Access Tester, enter a path (e.g. /admin/settings) and select a user-agent. The tester shows whether that path is allowed or blocked and which rule matched.
5. Verify Sitemap directives: check the Sitemap Directives panel to confirm your sitemap URL is present and correctly formatted. Add one if missing.
Common errors and fixes
Disallow: / blocking entire site for all bots
Remove or replace 'Disallow: /' under 'User-agent: *'. If you need to block specific sections, target those paths: 'Disallow: /admin/'. A full-site block prevents every page from being crawled.
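A minimal before-and-after sketch (paths and domain are placeholders):

```
# Harmful: blocks every compliant crawler from the entire site
# User-agent: *
# Disallow: /

# Fixed: restrict only the sections that should stay uncrawled
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```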
CSS, JS, or asset paths are blocked
Search engines need to render your pages to understand them. Remove Disallow rules covering /css, /js, /static, /assets, /_next, or /wp-content paths. Block only paths where restricting crawler access is intentional.
Malformed lines or unknown directives
Each line must follow the format 'Directive: value'. Check for missing colons, extra spaces in directive names, or typos. Unknown directives are ignored by crawlers but may indicate an intent that was not implemented correctly.
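A few malformed lines alongside their corrected form (paths are placeholders):

```
# Malformed: missing colon
Disallow /private/
# Malformed: space instead of hyphen in the directive name
User agent: Googlebot
# Malformed: typo makes this an unknown directive, silently ignored
Dissalow: /tmp/

# Correct:
User-agent: Googlebot
Disallow: /private/
```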
No Sitemap directive
Add 'Sitemap: https://yourdomain.com/sitemap.xml' to help search engines find your content. This is separate from submitting in Search Console — both are useful.
Rules defined before any User-agent
Allow and Disallow rules must follow a User-agent directive. Move any orphaned rules into a proper User-agent group, or add 'User-agent: *' before them.
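For example (the path is a placeholder):

```
# Orphaned: appears before any User-agent line, so crawlers may discard it
Disallow: /admin/

# Fixed: the same rule inside a proper group
User-agent: *
Disallow: /admin/
```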
Sitemap URL is relative or malformed
Sitemap URLs must be absolute, starting with https:// or http://. Use 'Sitemap: https://yourdomain.com/sitemap.xml' — not '/sitemap.xml'.