URL Indexability Inspector
Inspect live HTTP headers, redirect chains, robots directives, canonical tags, and hreflang signals in one technical SEO report.
Live URL inspection runs through a lightweight serverless fetch layer. CodeAva does not require credentials, cookies, or private headers. URLs are fetched for analysis only and are not stored as page archives. Private and internal IP ranges are blocked at the fetch layer.
Why URL indexability is harder to diagnose than it looks
Most indexability problems are not caused by a single issue. They are caused by conflicting technical signals — a page that returns 200 but has a noindex in its X-Robots-Tag header, a canonical that points to a different URL, or a path that robots.txt blocks — preventing the crawler from ever reading the page-level directives in the first place. Diagnosing these situations requires looking at HTTP headers, HTML head content, and robots.txt together, not in isolation.
Raw HTTP headers and raw HTML head content must be checked together because they interact. An X-Robots-Tag header is set at the server or CDN level and overrides anything in the HTML head. A canonical tag in the HTML head may conflict with a canonical declared in an HTTP Link header. And a robots.txt disallow rule prevents the crawler from fetching the page at all — meaning page-level signals in the HTML are invisible to that crawler until the robots.txt block is removed.
A live inspector saves time versus manually jumping between browser DevTools, View Source, a robots.txt file, and a URL inspection API. This tool fetches the live page server-side, follows the redirect chain, parses both HTTP headers and raw HTML, checks robots.txt for the inspected path, and returns all signals in one structured report. The score is CodeAva's heuristic summary — not a Google score — and should be treated as a technical starting point for investigation.
Which takes priority: X-Robots-Tag or Meta Robots tag?
Search engines respect both. Neither is universally superior — both are valid ways to declare indexing and following directives. When they conflict, the most restrictive directive effectively wins. A page with X-Robots-Tag: noindex in the HTTP response header and <meta name="robots" content="index"> in the HTML head will still be treated as noindex by Google, because the header takes effect regardless of the HTML content. Importantly: if a page is blocked in robots.txt, the crawler may never fetch the page to read the HTML meta robots tag at all. In that case, page-level directives are effectively invisible — blocked pages can still appear in search results if they are linked from other pages, but Google will only show a URL without a snippet.
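The most-restrictive-wins rule can be sketched as a small merge function. This is a simplified illustration, not Google's actual resolution logic: real crawlers also handle user-agent-scoped values and directives such as noarchive and nosnippet.

```python
def resolve_robots(x_robots_tag, meta_robots):
    """Merge the X-Robots-Tag header value and the HTML meta robots content.
    Either source may be None (absent). The most restrictive directive wins:
    a noindex/nofollow/none in either source overrides index/follow in the other."""
    directives = []
    for source in (x_robots_tag, meta_robots):
        if source:
            directives += [d.strip().lower() for d in source.split(",")]
    return {
        "index": "noindex" not in directives and "none" not in directives,
        "follow": "nofollow" not in directives and "none" not in directives,
    }

# A header noindex beats an explicit meta "index":
resolve_robots("noindex", "index")  # {"index": False, "follow": True}
```

Note the asymmetry this models: the header and the meta tag are merged only when the crawler can fetch the page at all, which is why a robots.txt block makes both sources moot.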
Perfect indexable URL checklist
| Element | Ideal state | Why it matters |
|---|---|---|
| Status code | 200 OK | Only 200 responses are reliably indexed. 4xx and 5xx prevent indexing. |
| Redirect chain | 0–1 hops | Chains longer than 2 hops waste crawl budget and slow signal consolidation. |
| Meta robots | index, follow (or absent) | noindex prevents the page from appearing in search results. |
| X-Robots-Tag | Absent or index | Server-level noindex overrides HTML meta — often set accidentally by CDN or CMS config. |
| Canonical | Present and self-referencing | A self-referencing canonical helps consolidate signals. Missing or conflicting canonicals create ambiguity. |
| Hreflang | Consistent set with x-default | Malformed or missing return-links break the hreflang cluster and may lead to wrong-locale ranking. |
| robots.txt crawlability | Path not disallowed | Blocked paths may not be fetched — making all page-level signals invisible to that crawler. |
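The checklist above can be expressed as a rough heuristic pass over one page's collected signals. The field names here are illustrative, not CodeAva's actual report schema:

```python
def checklist_issues(signals: dict) -> list[str]:
    """Apply the indexable-URL checklist to one page's collected signals.
    Expected (illustrative) keys: status, redirect_hops, meta_robots,
    x_robots_tag, canonical, url, crawlable."""
    issues = []
    if signals.get("status") != 200:
        issues.append("critical: status code is not 200")
    if signals.get("redirect_hops", 0) > 1:
        issues.append("warning: redirect chain longer than 1 hop")
    if "noindex" in (signals.get("meta_robots") or ""):
        issues.append("critical: meta robots noindex")
    if "noindex" in (signals.get("x_robots_tag") or ""):
        issues.append("critical: X-Robots-Tag noindex")
    canonical = signals.get("canonical")
    if canonical and canonical != signals.get("url"):
        issues.append("warning: canonical points to a different URL")
    if not signals.get("crawlable", True):
        issues.append("critical: path disallowed in robots.txt")
    return issues

healthy = {
    "status": 200, "redirect_hops": 0, "meta_robots": "index, follow",
    "x_robots_tag": None, "canonical": "https://example.com/a",
    "url": "https://example.com/a", "crawlable": True,
}
checklist_issues(healthy)  # []
```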
What this tool helps with
Good uses
- Diagnosing why a page is not indexing: check all major technical signals in one report — status code, robots directives, canonical, and crawlability — without switching between tools.
- Checking redirect chain efficiency: see every hop in the redirect chain with status codes to identify chains that can be collapsed to reduce crawl overhead.
- Comparing HTML vs HTTP canonical signals: detect conflicts between the rel=canonical in the HTML head and a canonical declared in an HTTP Link header, which are easy to miss in DevTools.
- Catching hidden X-Robots-Tag noindex issues: identify CDN-injected or server-level X-Robots-Tag noindex directives that override HTML intent and are invisible in browser source view.
- Validating robots.txt crawlability before relying on page directives: confirm the inspected path is not blocked by a wildcard disallow rule — because if it is blocked, page-level meta robots and canonical tags may never be read.
- Checking hreflang setups for localised pages: extract all hreflang tags from the live HTML source, detect duplicate language codes, missing x-default, and malformed href values.
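A crawlability check like the one described above can be sketched with Python's standard library. One caveat worth labelling: the stdlib parser does not implement Google's wildcard (`*`) and end-anchor (`$`) matching, so a production checker needs a spec-compliant matcher; this is an approximation.

```python
from urllib.robotparser import RobotFileParser

def path_is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether a URL's path is allowed for the given user agent,
    using a robots.txt body already fetched as a string. Approximate:
    the stdlib parser ignores Google-style * and $ pattern matching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

rules = "User-agent: *\nDisallow: /private/\n"
path_is_crawlable(rules, "Googlebot", "https://example.com/private/page")  # False
path_is_crawlable(rules, "Googlebot", "https://example.com/public")        # True
```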
Limitations to know
- JavaScript-rendered content inspection: this tool inspects raw HTTP responses and raw HTML source. Content that requires JavaScript execution to render — including React, Next.js client components, and SPAs — will not appear in the body text analysis. The render-risk hint flags this pattern but does not confirm rendering failure.
- Private or staging URL inspection: the fetch layer blocks private IP ranges and internal hostnames. Staging environments behind VPNs or authentication cannot be inspected. The tool is designed for publicly accessible URLs only.
- Confirming Google has indexed a specific URL: this tool shows what CodeAva's fetch layer sees right now. Google may have a cached version from an earlier crawl. Use Google Search Console URL Inspection for authoritative indexing status from Google's perspective.
How to run an indexability inspection
1. Enter a live public URL
Paste any publicly accessible URL. The tool supports http:// and https:// — missing protocols are normalised to https://. Private IP ranges and internal hostnames are blocked at the fetch layer.
2. Click Inspect and wait for the report
The serverless fetch layer follows any redirects, reads the final response headers and HTML, and checks robots.txt for the domain. Most inspections complete in 2–10 seconds depending on the target server.
3. Review the score and issue list
The score (0–100) is CodeAva's heuristic summary. Critical issues are most likely to prevent indexing. Warnings are risky but not certain blockers. Informational items are worth knowing but unlikely to block indexing on their own.
4. Review individual signal sections
Check the redirect chain for unnecessary hops, the robots matrix for conflicts between meta robots and X-Robots-Tag, the canonical comparison, robots.txt crawlability, and hreflang if present.
5. Fix the issue and re-test
After deploying a fix, re-run the inspection to confirm the signal has changed. Note that Google may take time to recrawl the page after a fix — use Google Search Console URL Inspection to request indexing after confirming the signal is correct.
Common indexability issues and how to fix them
X-Robots-Tag noindex set by CDN or CMS, not the page template
X-Robots-Tag is sent as an HTTP response header, not in the HTML. It can be injected by a CDN (Cloudflare, Fastly), CMS staging environment config, or a server middleware rule. Check your CDN configuration, web server headers, and CMS publishing settings — not just the page template. This tool's robots matrix section will flag an X-Robots-Tag noindex alongside any HTML meta robots content.
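X-Robots-Tag values can carry several comma-separated directives and an optional user-agent prefix (for example `googlebot: noindex`), and a response may send the header more than once. A minimal parser sketch for auditing collected header values, assuming a hypothetical list of raw values:

```python
KNOWN_DIRECTIVES = {
    "all", "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "notranslate", "noimageindex", "unavailable_after",
    "max-snippet", "max-image-preview", "max-video-preview",
}

def parse_x_robots(header_values):
    """Map a user agent ('*' when unscoped) to the set of directives
    declared for it across one or more X-Robots-Tag header values."""
    rules = {}
    for value in header_values:
        agent = "*"
        head, _, rest = value.partition(":")
        # A leading token that is not a known directive is a user-agent scope.
        if rest and head.strip().lower() not in KNOWN_DIRECTIVES:
            agent, value = head.strip().lower(), rest
        tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
        rules.setdefault(agent, set()).update(tokens)
    return rules

parse_x_robots(["noindex, nofollow", "googlebot: noarchive"])
# {"*": {"noindex", "nofollow"}, "googlebot": {"noarchive"}}
```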
robots.txt blocks the path, making page-level signals invisible
If a page is blocked by robots.txt, some crawlers will not fetch it — and therefore will never read the meta robots tag, canonical, or hreflang in the HTML head. Fix the robots.txt rule first, then verify the page-level signals. A blocked page can still appear in search results as a URL-only listing if it is linked from other crawled pages.
HTML canonical and HTTP Link header canonical conflict
Some CMSs set a canonical in the HTML head while a server or CDN layer sets a different canonical in the HTTP Link header. When they conflict, search engines receive contradictory signals. Audit your CDN response headers, CMS output, and any middleware that injects Link headers — then align them to the same canonical URL.
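The conflict described above can be detected by extracting both canonicals and comparing them. This is a regex sketch only: it assumes `rel` appears before `href` in the HTML tag, and a production tool should use a real HTML parser and full Link-header parsing.

```python
import re

def canonical_from_html(html: str):
    """Extract the rel=canonical href from raw HTML. Sketch only:
    assumes rel precedes href in the <link> tag."""
    m = re.search(
        r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
    return m.group(1) if m else None

def canonical_from_link_header(link_header: str):
    """Extract a canonical URL from an HTTP Link header, e.g.
    Link: <https://example.com/page>; rel="canonical"."""
    for part in link_header.split(","):
        if re.search(r'rel="?canonical"?', part):
            target = re.search(r'<([^>]+)>', part)
            if target:
                return target.group(1)
    return None

def canonicals_conflict(html: str, link_header: str) -> bool:
    a, b = canonical_from_html(html), canonical_from_link_header(link_header)
    return bool(a and b and a != b)
```

Aligning the two sources means making both functions return the same URL for the live response.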
Redirect chain longer than expected
A redirect chain is created when each redirect target is itself another redirect. Common causes: HTTP to HTTPS redirect, then www to non-www redirect, then a trailing-slash redirect — three hops before the final page. Collapse these into a single 301 that goes directly to the final canonical URL.
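The collapse fix can be sanity-checked with a small resolver over a redirect map. The URLs below are hypothetical, and `max_hops` guards against redirect loops:

```python
def final_target(redirects: dict[str, str], start: str, max_hops: int = 10) -> str:
    """Follow a map of {url: redirect_target} to the final destination,
    i.e. the URL the first hop should 301 to directly."""
    url, hops = start, 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url

# The three-hop chain from the example above collapses to one target:
chain = {
    "http://example.com/page": "https://example.com/page",          # HTTP to HTTPS
    "https://example.com/page": "https://www.example.com/page",     # non-www to www
    "https://www.example.com/page": "https://www.example.com/page/",  # trailing slash
}
final_target(chain, "http://example.com/page")  # "https://www.example.com/page/"
```

Pointing the first URL's 301 straight at the resolver's output removes the intermediate hops.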
Hreflang tags present but x-default missing or incorrect return-links
A well-formed hreflang set requires every URL in the set to include a reciprocal hreflang pointing back to all other URLs in the set. Missing return-links or a missing x-default tag will cause search engines to partially or fully ignore the hreflang cluster. Fix the hreflang set to include all variants and an x-default.
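Return-link validation can be sketched as a reciprocity check over the cluster. The input shape is illustrative: each page URL mapped to the hreflang entries it declares.

```python
def hreflang_issues(cluster: dict[str, dict[str, str]]) -> list[str]:
    """cluster maps each page URL to its declared hreflang entries
    (language code -> href). Every page should declare an x-default
    and a return-link to every other URL in the cluster."""
    issues = []
    for url, tags in cluster.items():
        if "x-default" not in tags:
            issues.append(f"{url}: missing x-default")
        declared = set(tags.values())
        for other in cluster:
            if other != url and other not in declared:
                issues.append(f"{url}: no return-link to {other}")
    return issues

cluster = {
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
        "x-default": "https://example.com/en/",
    },
    # The de page declares only itself, so it breaks the cluster:
    "https://example.com/de/": {"de": "https://example.com/de/"},
}
hreflang_issues(cluster)
```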
Score is low but the page appears in search results
The CodeAva Indexability Score is a heuristic snapshot, not a Google score. Google may have indexed the page during a previous crawl when signals were cleaner, or it may be overriding a canonical hint based on its own signals. A low score means the current state of the live page has one or more signals that risk future deindexing or reduced visibility — it does not confirm the page was never indexed.