AI Crawler Rules Builder & robots.txt Generator
Generate SEO-safe robots.txt rules for AI crawlers, search bots, and archive bots. Block training bots without accidentally blocking search visibility.
Choose a strategy
- Traditional search
- AI search visible
- Training blocked
- Archive blocked
- User agents blocked
Search Engine — blocking impacts SEO
- Googlebot: Google's primary web crawler for indexing content in Google Search.

AI Search / AI Answers — blocking impacts AI visibility
- OAI-SearchBot: OpenAI's crawler for indexing content for ChatGPT search features.
- Claude-SearchBot: Anthropic's crawler for indexing content for Claude's web search and answer features.
- PerplexityBot: Perplexity AI's crawler for indexing content to use in Perplexity answers and search.

AI Training — safe to block
- Google-Extended: a separate robots.txt control token for certain Google AI training and Bard/Gemini grounding uses, independent of Google Search.
- GPTBot: OpenAI's crawler for collecting training data for GPT models.
- ClaudeBot: Anthropic's crawler for collecting training and model grounding data for Claude.

User-Triggered Fetch — advisory only
- ChatGPT-User: user-triggered fetcher used when a ChatGPT user requests a live URL to be fetched.
- Claude-User: user-triggered fetcher used when a Claude user causes a live URL to be fetched in a session.
- Perplexity-User: user-triggered fetcher used when a Perplexity user causes a live URL fetch.

Archive / Dataset — safe to block
- CCBot: Common Crawl's open web crawler that builds public datasets used by many AI labs for training.
Advanced settings (paths, wildcard group, merge mode)
- Block scope
- Add section comments: adds explanatory comments to the generated robots.txt
- Add User-agent: * group: adds a wildcard group covering unlisted crawlers
- Existing robots.txt merge: paste your current robots.txt to append or replace
robots.txt
```
# AI Training
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Archive / Dataset
User-agent: CCBot
Disallow: /

# All other crawlers not listed above are allowed by default
```
Minimal AI section only (paste into existing robots.txt)
```
# AI Training
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Archive / Dataset
User-agent: CCBot
Disallow: /
```
This tool runs entirely in your browser. No live robots.txt files are fetched, no URLs are visited, and no configuration is sent to a server. All output is generated locally from the registry of documented crawlers.
Overview
A few years ago, managing a robots.txt file was straightforward: allow Googlebot, maybe allow Bingbot, and optionally block a handful of known scrapers. Today, the decision is far more nuanced. Search engines, AI answer engines, AI training pipelines, user-triggered AI fetchers, and open-web archive crawlers all have distinct user-agents — and blocking the wrong one can silently remove your content from search results, AI citations, or both.
The most common mistake is treating "block AI bots" as a single action. In practice, there are at least four separate categories of crawler with AI-related user-agent tokens: traditional search (Googlebot), AI search/answer indexing (OAI-SearchBot, PerplexityBot, Claude-SearchBot), AI model training (GPTBot, ClaudeBot, Google-Extended, CCBot), and user-triggered retrieval (ChatGPT-User, Claude-User, Perplexity-User). Each has different implications for SEO, discoverability, and content governance.
Beyond robots.txt, enforcement is not uniform. Well-behaved crawlers respect Disallow directives. Some crawlers — particularly user-triggered ones — may not check robots.txt consistently before fetching a live URL on behalf of a human user. For those cases, Cloudflare AI Crawl Control or WAF rules provide a stronger enforcement layer. This tool helps you build the right robots.txt policy as a starting point, then explains where to layer on additional enforcement if needed.
Use cases
When to use it
- Blocking AI training crawlers while preserving search visibility: use the 'Search + No Training' preset to block GPTBot, ClaudeBot, Google-Extended, and CCBot while keeping Googlebot and AI search bots active.
- Deciding whether AI answer engines should access your content: OAI-SearchBot, Claude-SearchBot, and PerplexityBot are the crawlers that feed AI search and answer experiences. The tool lets you allow or block each independently.
- Understanding the difference between Googlebot and Google-Extended: Googlebot controls Google Search indexing. Google-Extended is a separate robots.txt token for certain AI training and grounding uses. They are completely independent — blocking Google-Extended does not affect Google Search.
- Generating clean robots.txt for deployment: copy or download the generated robots.txt. Use the minimal AI section to append just the AI rules to an existing file without replacing your current configuration.
- Understanding Cloudflare AI Crawl Control for stronger enforcement: the Deployment tab generates Cloudflare guidance and optional Nginx/Apache snippets for cases where robots.txt alone may not be sufficient.
- Configuring path-level blocking for mixed-content sites: use the custom path option to block specific folders (e.g. /private-docs/) rather than the whole site, which is useful for documentation sites and media platforms.
When it's not enough
- Using this to block private or sensitive content: robots.txt is a declaration to well-behaved crawlers. It cannot prevent non-compliant bots, scrapers, or direct URL requests. Do not use it as a security measure for private content.
- Blocking Googlebot when the goal was only AI training opt-out: blocking Googlebot removes your content from Google Search. If your goal is to opt out of Google AI training only, block Google-Extended instead. They are completely separate tokens.
- Expecting user-triggered bots to reliably respect robots.txt: ChatGPT-User, Claude-User, and Perplexity-User fire in response to human requests and may not check robots.txt before fetching. For these cases, WAF or Cloudflare enforcement is more reliable.
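Where robots.txt is only advisory, a server-level rule is one fallback. The following is a minimal Nginx sketch, not a vetted production config: the `map` block belongs in the `http` context, and the user-agent tokens should be checked against each vendor's current documentation before use.

```nginx
# Flag user-triggered AI fetchers by User-Agent substring (case-insensitive).
map $http_user_agent $ai_user_fetcher {
    default            0;
    ~*ChatGPT-User     1;
    ~*Claude-User      1;
    ~*Perplexity-User  1;
}

server {
    # ... existing server configuration ...

    # Deny flagged fetchers outright; legitimate crawlers are unaffected.
    if ($ai_user_fetcher) {
        return 403;
    }
}
```

User-Agent strings are trivially spoofable, so this blocks honest fetchers only; IP- or verified-bot-based enforcement is stricter.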
How to use it
1. Choose a preset or custom strategy: 'Search + No Training' is the recommended starting point for most publishers. It keeps traditional SEO and AI answer visibility intact while opting out of training and archive collection.
2. Review which crawlers are allowed and blocked: expand any bot row to see its purpose, operator, and the effect of blocking it. Toggle individual bots to fine-tune the policy beyond the preset defaults.
3. Check warnings about SEO and discoverability consequences: the Warnings tab surfaces critical issues (like Googlebot being accidentally blocked) alongside informational notes about advisory-only robots.txt enforcement.
4. Add your sitemap URL in the Advanced Settings panel: the Sitemap: line in robots.txt helps all crawlers discover your full content.
5. Copy or download the robots.txt: publish the file at https://yourdomain.com/robots.txt. Note that subdomains need their own robots.txt files.
6. Add Cloudflare AI Crawl Control for stronger enforcement: for user-triggered bots and non-compliant crawlers, Cloudflare AI Crawl Control provides verified-bot blocking at the edge — stronger and more reliable than robots.txt alone.
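Before publishing, you can sanity-check a generated file with Python's standard-library robots.txt parser. The policy and URL below are illustrative; note that `urllib.robotparser` implements the original exclusion protocol, so real crawlers may differ in edge-case precedence handling.

```python
from urllib.robotparser import RobotFileParser

# A generated policy: block GPTBot for training, keep Googlebot for search.
policy = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

# The training crawler is denied; the search crawler is allowed.
print(rp.can_fetch("GPTBot", "https://example.com/page"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
```

Crawlers with no matching group fall through to the default (allowed) when no `User-agent: *` group exists, which matches the "all other crawlers remain allowed" behavior of the generated file.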
Common errors and fixes
Blocking Googlebot when the goal was to block AI training
Googlebot controls Google Search indexing. If you want to opt out of Google AI training, block Google-Extended instead. These are completely separate tokens, and blocking Google-Extended does not affect your Google Search presence.
Assuming Google-Extended controls AI Overviews or Gemini answers
Google-Extended controls certain AI training and model grounding uses but does not currently control whether your content appears in AI Overviews or Gemini search results. Its scope is specific to training pipelines as documented by Google.
Thinking GPTBot and OAI-SearchBot do the same thing
GPTBot is a training crawler — blocking it opts your content out of OpenAI model training. OAI-SearchBot is a search/discoverability crawler — blocking it reduces your content's presence in ChatGPT search answers. They are independent and serve different purposes.
Blocking all AI bots with User-agent: * Disallow: / and then wondering why SEO disappeared
A wildcard Disallow: / blocks all crawlers not otherwise explicitly listed, including Googlebot. Always add an explicit User-agent: Googlebot / Allow: / section before deploying a wildcard block if you need search visibility.
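A safe wildcard pattern looks like the sketch below. It assumes you only rely on Googlebot and Bingbot; keep every crawler you depend on in its own explicit group.

```
# Named groups take precedence over the wildcard for matching crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Everything not named above is blocked
User-agent: *
Disallow: /
```

Under the Robots Exclusion Protocol, a crawler follows the most specific group matching its token, so the explicit `Allow: /` groups shield those bots from the wildcard block regardless of ordering.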
Relying on robots.txt alone to block user-triggered fetchers
ChatGPT-User, Claude-User, and Perplexity-User fire in response to human user actions and may not consistently check robots.txt before fetching. For reliable enforcement, use Cloudflare AI Crawl Control, WAF custom rules, or server-side user-agent matching.
Forgetting the Sitemap: line
Adding Sitemap: https://example.com/sitemap.xml to robots.txt helps all crawlers discover your full content index. This is a low-effort improvement that benefits both traditional search and AI search crawlers.
Treating robots.txt as a privacy control
robots.txt is a public declaration readable by anyone. Non-compliant bots can ignore it entirely. For private content, use authentication, access controls, and server-level restrictions.
Frequently asked questions
How do I block AI training bots without hurting my SEO?
The key is treating AI training crawlers and search crawlers as completely separate. The safe approach for most publishers is to explicitly block training-specific tokens — GPTBot, ClaudeBot, Google-Extended, and CCBot — while keeping Googlebot and traditional search crawlers allowed. AI search bots (OAI-SearchBot, PerplexityBot, Claude-SearchBot) are a separate decision: blocking them reduces your presence in AI answer engines but does not affect traditional search rankings.
The "Search + No Training" preset in this tool applies exactly this strategy and is the recommended starting point for most publishers.
What is the difference between Googlebot and Google-Extended?
Googlebot is Google's primary web crawling user-agent. It indexes content for Google Search. Blocking Googlebot removes your pages from Google Search results — almost never the right choice for sites that want search visibility.
Google-Extended is a separate robots.txt control token — not a crawling user-agent in the traditional HTTP sense. It lets publishers opt out of certain Google AI training and model grounding uses without affecting Googlebot or Google Search. Blocking Google-Extended does not affect your Google Search ranking, visibility, or appearance in AI Overviews. The two tokens are completely independent and must be addressed separately.
GPTBot vs OAI-SearchBot vs ChatGPT-User
| Token | Category | Purpose | Effect of blocking |
|---|---|---|---|
| GPTBot | AI Training | Collects data for OpenAI model training | Opts out of OpenAI model training — no effect on ChatGPT search |
| OAI-SearchBot | AI Search | Indexes content for ChatGPT search and answer features | Reduces/removes content from ChatGPT search answers — no effect on training |
| ChatGPT-User | User-Triggered Fetch | Fetches URLs when a human user requests them in ChatGPT | May prevent ChatGPT from fetching URLs in user sessions — advisory only |
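The distinctions in the table above map to a simple robots.txt pattern. This sketch opts out of OpenAI training while keeping ChatGPT search visibility:

```
# Opt out of OpenAI model training
User-agent: GPTBot
Disallow: /

# Stay indexed for ChatGPT search answers
User-agent: OAI-SearchBot
Allow: /

# Advisory only; pair with WAF rules if enforcement matters
User-agent: ChatGPT-User
Disallow: /
```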
robots.txt for AI crawler control: what it can and cannot do
- ✓ Good for declaring your crawling preferences to well-behaved bots
- ✓ Effective at opting out of training and archiving by compliant crawlers
- ✗ Not a security control — cannot prevent non-compliant bots or scrapers
- ✗ Not reliable for private content — any bot can read and ignore the file
- ~ Inconsistently enforced for user-triggered fetchers (ChatGPT-User, etc.)
- ~ WAF, Cloudflare AI Crawl Control, or server rules needed for stronger enforcement
AI crawler cheat sheet
| Bot / Token | Category | Main purpose | Reason to allow | Reason to block | Consequence of blocking |
|---|---|---|---|---|---|
| Googlebot | Search Engine | Google Search indexing | Essential for search visibility | Almost never — staging environments only | Content removed from Google Search |
| Google-Extended | AI Training | Google AI training opt-out token | Allow Google AI grounding uses | Opt out of Google AI training | Opted out of certain Google AI training — no SEO impact |
| GPTBot | AI Training | OpenAI model training | Consent to OpenAI training data use | Opt out of OpenAI model training | No more training use — ChatGPT search unaffected |
| OAI-SearchBot | AI Search | ChatGPT search indexing | Appear in ChatGPT search answers | Block ChatGPT search presence | Reduced visibility in ChatGPT answers |
| ClaudeBot | AI Training | Anthropic model training | Consent to Anthropic training data use | Opt out of Anthropic training | No more training use — Claude search unaffected |
| Claude-SearchBot | AI Search | Claude answer engine indexing | Appear in Claude's search answers | Block Claude search presence | Reduced visibility in Claude answers |
| Claude-User | User-Triggered | User-initiated live URL fetch in Claude | Allow per-user content retrieval | Limit live retrieval in Claude sessions | Advisory — WAF needed for reliable enforcement |
| PerplexityBot | AI Search | Perplexity search indexing | Appear in Perplexity answers | Block Perplexity presence | Reduced visibility in Perplexity results |
| Perplexity-User | User-Triggered | User-initiated fetch in Perplexity | Allow per-user retrieval | Limit live retrieval in Perplexity | Advisory — WAF needed for reliable enforcement |
| CCBot | Archive/Dataset | Common Crawl dataset building | Inclusion in academic/research archives | Opt out of Common Crawl and downstream AI training | Excluded from future Common Crawl snapshots |
Deployment guidance
Where robots.txt must live: the file must be at the root of your domain — https://example.com/robots.txt. Each subdomain requires its own robots.txt file. A robots.txt at example.com does not apply to docs.example.com.
Cloudflare AI Crawl Control: available in the Cloudflare dashboard under Security → Bots → AI Scrapers & Crawlers. This provides verified-bot enforcement at the edge — stronger and more reliable than robots.txt alone, and especially useful for user-triggered fetchers that may not check robots.txt. Cloudflare can also track and log robots.txt violations from listed crawlers.
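As a rough illustration, a WAF custom rule with a Block action might use an expression like the one below. This is a sketch in Cloudflare's Rules language; verify the tokens and current dashboard options against Cloudflare's documentation before deploying.

```
(http.user_agent contains "GPTBot")
or (http.user_agent contains "ChatGPT-User")
or (http.user_agent contains "Claude-User")
or (http.user_agent contains "Perplexity-User")
```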
Minimal robots.txt with sitemap (copy template):

```
# Generated by CodeAva AI Crawler Rules Builder
# AI Training — block training, keep search
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers (including Googlebot) remain allowed
Sitemap: https://example.com/sitemap.xml
```