URL Indexability Inspector
Inspect live HTTP headers, redirect chains, robots directives, canonical tags, and hreflang signals in one technical SEO report.
Live URL inspection runs through a lightweight serverless fetch layer. CodeAva does not require credentials, cookies, or private headers. URLs are fetched for analysis only and are not stored as page archives. Private and internal IP ranges are blocked at the fetch layer.
Why URL indexability is harder to diagnose than it looks
Most indexability problems are not caused by a single issue. They are caused by conflicting technical signals — a page that returns 200 but has a noindex in its X-Robots-Tag header, a canonical that points to a different URL, or a path that robots.txt blocks — preventing the crawler from ever reading the page-level directives in the first place. Diagnosing these situations requires looking at HTTP headers, HTML head content, and robots.txt together, not in isolation.
Raw HTTP headers and raw HTML head content must be checked together because they interact. An X-Robots-Tag header is set at the server or CDN level and overrides anything in the HTML head. A canonical tag in the HTML head may conflict with a canonical declared in an HTTP Link header. And a robots.txt disallow rule prevents the crawler from fetching the page at all — meaning page-level signals in the HTML are invisible to that crawler until the robots.txt block is removed.
A live inspector saves time versus manually jumping between browser DevTools, View Source, a robots.txt file, and a URL inspection API. This tool fetches the live page server-side, follows the redirect chain, parses both HTTP headers and raw HTML, checks robots.txt for the inspected path, and returns all signals in one structured report. The score is CodeAva's heuristic summary — not a Google score — and should be treated as a technical starting point for investigation.
Which takes priority: X-Robots-Tag or Meta Robots tag?
Search engines respect both. Neither is universally superior — both are valid ways to declare indexing and following directives. When they conflict, the most restrictive directive effectively wins. A page with X-Robots-Tag: noindex in the HTTP response header and <meta name="robots" content="index"> in the HTML head will still be treated as noindex by Google, because the header takes effect regardless of the HTML content. Importantly: if a page is blocked in robots.txt, the crawler may never fetch the page to read the HTML meta robots tag at all. In that case, page-level directives are effectively invisible — blocked pages can still appear in search results if they are linked from other pages, but Google will only show a URL without a snippet.
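The most-restrictive-wins rule can be sketched as a small merge function. This is a simplified illustration, not Google's actual resolution logic: real crawlers also handle user-agent-scoped values and directives such as noarchive and nosnippet.

```python
def resolve_robots(x_robots_tag, meta_robots):
    """Merge the X-Robots-Tag header value and the HTML meta robots content.
    Either source may be None (absent). The most restrictive directive wins:
    a noindex/nofollow/none in either source overrides index/follow in the other."""
    directives = []
    for source in (x_robots_tag, meta_robots):
        if source:
            directives += [d.strip().lower() for d in source.split(",")]
    return {
        "index": "noindex" not in directives and "none" not in directives,
        "follow": "nofollow" not in directives and "none" not in directives,
    }

# A header noindex beats an explicit meta "index":
resolve_robots("noindex", "index")  # {"index": False, "follow": True}
```

Note the asymmetry this models: the header and the meta tag are merged only when the crawler can fetch the page at all, which is why a robots.txt block makes both sources moot.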
Perfect indexable URL checklist
| Element | Ideal state | Why it matters |
|---|---|---|
| Status code | 200 OK | Only 200 responses are reliably indexed. 4xx and 5xx prevent indexing. |
| Redirect chain | 0–1 hops | Chains longer than 2 hops waste crawl budget and slow signal consolidation. |
| Meta robots | index, follow (or absent) | noindex prevents the page from appearing in search results. |
| X-Robots-Tag | Absent or index | Server-level noindex overrides HTML meta — often set accidentally by CDN or CMS config. |
| Canonical | Present and self-referencing | A self-referencing canonical helps consolidate signals. Missing or conflicting canonicals create ambiguity. |
| Hreflang | Consistent set with x-default | Malformed or missing return-links break the hreflang cluster and may lead to wrong-locale ranking. |
| robots.txt crawlability | Path not disallowed | Blocked paths may not be fetched — making all page-level signals invisible to that crawler. |
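The checklist above can be expressed as a rough heuristic pass over one page's collected signals. The field names here are illustrative, not CodeAva's actual report schema:

```python
def checklist_issues(signals: dict) -> list[str]:
    """Apply the indexable-URL checklist to one page's collected signals.
    Expected (illustrative) keys: status, redirect_hops, meta_robots,
    x_robots_tag, canonical, url, crawlable."""
    issues = []
    if signals.get("status") != 200:
        issues.append("critical: status code is not 200")
    if signals.get("redirect_hops", 0) > 1:
        issues.append("warning: redirect chain longer than 1 hop")
    if "noindex" in (signals.get("meta_robots") or ""):
        issues.append("critical: meta robots noindex")
    if "noindex" in (signals.get("x_robots_tag") or ""):
        issues.append("critical: X-Robots-Tag noindex")
    canonical = signals.get("canonical")
    if canonical and canonical != signals.get("url"):
        issues.append("warning: canonical points to a different URL")
    if not signals.get("crawlable", True):
        issues.append("critical: path disallowed in robots.txt")
    return issues

healthy = {
    "status": 200, "redirect_hops": 0, "meta_robots": "index, follow",
    "x_robots_tag": None, "canonical": "https://example.com/a",
    "url": "https://example.com/a", "crawlable": True,
}
checklist_issues(healthy)  # []
```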
What this tool helps with
Good uses
- Diagnosing why a page is not indexing: check all major technical signals in one report — status code, robots directives, canonical, and crawlability — without switching between tools.
- Checking redirect chain efficiency: see every hop in the redirect chain with status codes to identify chains that can be collapsed to reduce crawl overhead.
- Comparing HTML vs HTTP canonical signals: detect conflicts between the rel=canonical in the HTML head and a canonical declared in an HTTP Link header, which are easy to miss in DevTools.
- Catching hidden X-Robots-Tag noindex issues: identify CDN-injected or server-level X-Robots-Tag noindex directives that override HTML intent and are invisible in browser source view.
- Validating robots.txt crawlability before relying on page directives: confirm the inspected path is not blocked by a wildcard disallow rule — because if it is blocked, page-level meta robots and canonical tags may never be read.
- Checking hreflang setups for localised pages: extract all hreflang tags from the live HTML source, detect duplicate language codes, missing x-default, and malformed href values.
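A crawlability check like the one described above can be sketched with Python's standard library. One caveat worth labelling: the stdlib parser does not implement Google's wildcard (`*`) and end-anchor (`$`) matching, so a production checker needs a spec-compliant matcher; this is an approximation.

```python
from urllib.robotparser import RobotFileParser

def path_is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether a URL's path is allowed for the given user agent,
    using a robots.txt body already fetched as a string. Approximate:
    the stdlib parser ignores Google-style * and $ pattern matching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

rules = "User-agent: *\nDisallow: /private/\n"
path_is_crawlable(rules, "Googlebot", "https://example.com/private/page")  # False
path_is_crawlable(rules, "Googlebot", "https://example.com/public")        # True
```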
Limitations to know
- JavaScript-rendered content inspection: this tool inspects raw HTTP responses and raw HTML source. Content that requires JavaScript execution to render — including React, Next.js client components, and SPAs — will not appear in the body text analysis. The render-risk hint flags this pattern but does not confirm rendering failure.
- Private or staging URL inspection: the fetch layer blocks private IP ranges and internal hostnames. Staging environments behind VPNs or authentication cannot be inspected. The tool is designed for publicly accessible URLs only.
- Confirming Google has indexed a specific URL: this tool shows what CodeAva's fetch layer sees right now. Google may have a cached version from an earlier crawl. Use Google Search Console URL Inspection for authoritative indexing status from Google's perspective.
How to run an indexability inspection
1. Enter a live public URL
Paste any publicly accessible URL. The tool supports http:// and https:// — missing protocols are normalised to https://. Private IP ranges and internal hostnames are blocked at the fetch layer.
2. Click Inspect and wait for the report
The serverless fetch layer follows any redirects, reads the final response headers and HTML, and checks robots.txt for the domain. Most inspections complete in 2–10 seconds depending on the target server.
3. Review the score and issue list
The score (0–100) is CodeAva's heuristic summary. Critical issues are most likely to prevent indexing. Warnings are risky but not certain blockers. Informational items are worth knowing but unlikely to block indexing on their own.
4. Review individual signal sections
Check the redirect chain for unnecessary hops, the robots matrix for conflicts between meta robots and X-Robots-Tag, the canonical comparison, robots.txt crawlability, and hreflang if present.
5. Fix the issue and re-test
After deploying a fix, re-run the inspection to confirm the signal has changed. Note that Google may take time to recrawl the page after a fix — use Google Search Console URL Inspection to request indexing after confirming the signal is correct.
Common indexability issues and how to fix them
X-Robots-Tag noindex set by CDN or CMS, not the page template
X-Robots-Tag is sent as an HTTP response header, not in the HTML. It can be injected by a CDN (Cloudflare, Fastly), CMS staging environment config, or a server middleware rule. Check your CDN configuration, web server headers, and CMS publishing settings — not just the page template. This tool's robots matrix section will flag an X-Robots-Tag noindex alongside any HTML meta robots content.
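X-Robots-Tag values can carry several comma-separated directives and an optional user-agent prefix (for example `googlebot: noindex`), and a response may send the header more than once. A minimal parser sketch for auditing collected header values, assuming a hypothetical list of raw values:

```python
KNOWN_DIRECTIVES = {
    "all", "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "notranslate", "noimageindex", "unavailable_after",
    "max-snippet", "max-image-preview", "max-video-preview",
}

def parse_x_robots(header_values):
    """Map a user agent ('*' when unscoped) to the set of directives
    declared for it across one or more X-Robots-Tag header values."""
    rules = {}
    for value in header_values:
        agent = "*"
        head, _, rest = value.partition(":")
        # A leading token that is not a known directive is a user-agent scope.
        if rest and head.strip().lower() not in KNOWN_DIRECTIVES:
            agent, value = head.strip().lower(), rest
        tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
        rules.setdefault(agent, set()).update(tokens)
    return rules

parse_x_robots(["noindex, nofollow", "googlebot: noarchive"])
# {"*": {"noindex", "nofollow"}, "googlebot": {"noarchive"}}
```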
robots.txt blocks the path, making page-level signals invisible
If a page is blocked by robots.txt, some crawlers will not fetch it — and therefore will never read the meta robots tag, canonical, or hreflang in the HTML head. Fix the robots.txt rule first, then verify the page-level signals. A blocked page can still appear in search results as a URL-only listing if it is linked from other crawled pages.
HTML canonical and HTTP Link header canonical conflict
Some CMSs set a canonical in the HTML head while a server or CDN layer sets a different canonical in the HTTP Link header. When they conflict, search engines receive contradictory signals. Audit your CDN response headers, CMS output, and any middleware that injects Link headers — then align them to the same canonical URL.
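The conflict described above can be detected by extracting both canonicals and comparing them. This is a regex sketch only: it assumes `rel` appears before `href` in the HTML tag, and a production tool should use a real HTML parser and full Link-header parsing.

```python
import re

def canonical_from_html(html: str):
    """Extract the rel=canonical href from raw HTML. Sketch only:
    assumes rel precedes href in the <link> tag."""
    m = re.search(
        r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
    return m.group(1) if m else None

def canonical_from_link_header(link_header: str):
    """Extract a canonical URL from an HTTP Link header, e.g.
    Link: <https://example.com/page>; rel="canonical"."""
    for part in link_header.split(","):
        if re.search(r'rel="?canonical"?', part):
            target = re.search(r'<([^>]+)>', part)
            if target:
                return target.group(1)
    return None

def canonicals_conflict(html: str, link_header: str) -> bool:
    a, b = canonical_from_html(html), canonical_from_link_header(link_header)
    return bool(a and b and a != b)
```

Aligning the two sources means making both functions return the same URL for the live response.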
Redirect chain longer than expected
A redirect chain is created when each redirect target is itself another redirect. Common causes: HTTP to HTTPS redirect, then www to non-www redirect, then a trailing-slash redirect — three hops before the final page. Collapse these into a single 301 that goes directly to the final canonical URL.
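The collapse fix can be sanity-checked with a small resolver over a redirect map. The URLs below are hypothetical, and `max_hops` guards against redirect loops:

```python
def final_target(redirects: dict[str, str], start: str, max_hops: int = 10) -> str:
    """Follow a map of {url: redirect_target} to the final destination,
    i.e. the URL the first hop should 301 to directly."""
    url, hops = start, 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url

# The three-hop chain from the example above collapses to one target:
chain = {
    "http://example.com/page": "https://example.com/page",          # HTTP to HTTPS
    "https://example.com/page": "https://www.example.com/page",     # non-www to www
    "https://www.example.com/page": "https://www.example.com/page/",  # trailing slash
}
final_target(chain, "http://example.com/page")  # "https://www.example.com/page/"
```

Pointing the first URL's 301 straight at the resolver's output removes the intermediate hops.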
Hreflang tags present but x-default missing or incorrect return-links
A well-formed hreflang set requires every URL in the set to include a reciprocal hreflang pointing back to all other URLs in the set. Missing return-links or a missing x-default tag will cause search engines to partially or fully ignore the hreflang cluster. Fix the hreflang set to include all variants and an x-default.
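Return-link validation can be sketched as a reciprocity check over the cluster. The input shape is illustrative: each page URL mapped to the hreflang entries it declares.

```python
def hreflang_issues(cluster: dict[str, dict[str, str]]) -> list[str]:
    """cluster maps each page URL to its declared hreflang entries
    (language code -> href). Every page should declare an x-default
    and a return-link to every other URL in the cluster."""
    issues = []
    for url, tags in cluster.items():
        if "x-default" not in tags:
            issues.append(f"{url}: missing x-default")
        declared = set(tags.values())
        for other in cluster:
            if other != url and other not in declared:
                issues.append(f"{url}: no return-link to {other}")
    return issues

cluster = {
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "de": "https://example.com/de/",
        "x-default": "https://example.com/en/",
    },
    # The de page declares only itself, so it breaks the cluster:
    "https://example.com/de/": {"de": "https://example.com/de/"},
}
hreflang_issues(cluster)
```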
Score is low but the page appears in search results
The CodeAva Indexability Score is a heuristic snapshot, not a Google score. Google may have indexed the page during a previous crawl when signals were cleaner, or it may be overriding a canonical hint based on its own signals. A low score means the current state of the live page has one or more signals that risk future deindexing or reduced visibility — it does not confirm the page was never indexed.