How do you audit crawlability after a site migration?

Treat it as four pillars plus monitoring. Verify that every old URL returns a permanent server-side redirect to its best new equivalent, confirm the new XML sitemap contains only canonical 200-status URLs, check that robots.txt actually allows the new structure to be crawled, and confirm that canonicals, internal links, sitemaps, and redirects all point to the same preferred URLs. Then watch Search Console for not-found spikes, soft 404s, duplicate-without-user-selected-canonical entries, and sitemap submission issues in the weeks after launch.

Should I use 301 or 302 redirects after a migration?

Use 301 (or 308) for permanent moves. A 302 or 307 tells search engines the move is temporary and can delay consolidation of ranking signals onto the new URL. If the old URL will never come back, the redirect must be permanent. The only legitimate reasons to use a temporary redirect during a migration are short-term rollback plans or A/B tests — neither of which describes the steady-state post-launch configuration.

Why are homepage redirects a bad migration strategy?

Redirecting large numbers of old URLs to the homepage looks convenient but is one of the most damaging post-migration patterns. The homepage is almost never a meaningful replacement for a product page, article, or category. Google can treat irrelevant redirects like soft 404s, which weakens consolidation of ranking signals and slows recovery. If a retired URL has no meaningful replacement, a real 404 or 410 is usually a cleaner answer than a misleading redirect.

What should be in the sitemap after a migration?

Only canonical, indexable URLs that return a 200 status. Remove anything that redirects, 404s, is blocked by robots.txt, has noindex set, or is a duplicate of another canonical URL. The new sitemap should be a clean list of the destinations you want search engines to index. A dirty sitemap full of redirected or non-canonical URLs slows discovery and adds noise to Search Console coverage reports.

Can robots.txt break crawlability after launch?

Yes, and it is one of the most common post-migration incidents. Teams ship with staging rules still active (Disallow: /), add overly broad Disallow patterns for new faceted paths that also match important templates, forget to update the Sitemap directive, or move the file and break the path. Always validate the live production robots.txt against the intended crawl policy immediately after launch and whenever infrastructure changes.

Why are old-domain canonicals dangerous after a migration?

If the new URLs canonicalize to the old domain or old URL format, you are telling search engines that the old URLs are the preferred ones. That conflicts with the redirects from old to new and creates mixed signals that delay consolidation. On the new site, canonicals should usually self-reference the preferred new URL, reinforcing the same destination declared by redirects, internal links, and the sitemap.

What is a soft 404 in a migration context?

A soft 404 is a URL that returns a 200 (or redirects to a 200) but does not meaningfully satisfy the request — typically an old, removed product redirected to the homepage or a generic category. Google classifies these as effectively not-found because the response is not a real replacement for the original intent. Soft 404s are a common side effect of lazy 'redirect everything to the homepage' migration plans.

What should I monitor in Search Console after a migration?

Verify both old and new properties where relevant. Watch the Page Indexing report for spikes in not-found (404), soft 404, server errors, redirect errors, and 'duplicate without user-selected canonical'. Check sitemap submission status and the number of submitted vs indexed URLs. Use URL Inspection to live-test representative templates. Watch the Crawl Stats report for unusual response-time patterns during the transition. Keep this monitoring active for several weeks — consolidation is not instant.

How to Audit Crawlability After a Site Migration

The new site is live. The design system looks great. Launch day was uneventful. Two weeks later, organic traffic is down 30 percent, Search Console is flagging redirect errors, and a third of the old URL inventory has been classified as soft 404. The team thought the migration was done. Search engines disagree.

A migration is not finished when the new site is live. It is finished when crawlers can reliably discover, fetch, and consolidate the new URLs — and when the signals the new site sends are consistent with the signals the old site sent. Crawlability after a migration is an engineering problem with a short list of failure modes: redirect integrity, discovery mapping, robots access, canonical alignment, and monitoring.

This guide is a forensic post-migration audit. The order matters: broken redirects and blocked crawl paths make every other diagnosis unreliable, so they come first.

TL;DR

A post-migration crawlability audit focuses on four pillars: redirect integrity, discovery (sitemaps and internal links), crawl access (robots.txt), and canonical alignment.
Old URLs should return permanent server-side redirects directly to the best new equivalent, not a chain and not the homepage.
The new XML sitemap must contain only canonical, indexable, 200-status URLs.
Robots.txt must not accidentally block the new structure, especially through leftover staging rules or overly broad disallow patterns.
Canonicals, internal links, redirects, and sitemaps must all agree on the same preferred URLs.
Monitor Search Console and crawl behavior for weeks after launch; consolidation is not instant.

Audit the live site the way a crawler sees it, not the way the design team sees it. The CodeAva URL Indexability Inspector follows the full redirect chain, reads the final response headers and HTML, extracts meta robots, X-Robots-Tag, rel=canonical, and hreflang, and checks robots.txt — which covers most of the per-URL signals this audit depends on. Pair it with the broader CodeAva Website Audit for a quick technical-hygiene snapshot of the homepage and key templates.

Why post-migration crawlability fails

A site move introduces uncertainty for crawlers. URLs change, structures change, sometimes domains change. Search engines need strong, consistent signals to map old URLs to new destinations and to reassign the authority those old URLs had accumulated.

Most post-migration crawl problems are not “the site is uncrawlable everywhere”. They are specific, recurring failures that cluster around a handful of templates or URL patterns:

A redirect rule that forgot one URL pattern, so 8 percent of old inventory returns 404.
A canonical template that still hardcodes the old domain on some pages.
A staging-mode robots.txt that shipped to production on launch day.
A sitemap generator that still lists the pre-migration URL set.
A redirect chain that hops through two intermediate URLs before reaching the new destination.

The fix is not to guess. It is to audit each pillar in order, one template and one URL sample at a time, and remove the mixed signals.

Pillar 1: redirect integrity

Permanent moves need permanent server-side redirects. That means 301 (or 308, which preserves the HTTP method) — not 302, not 307, not client-side JavaScript redirects. Old URLs should resolve to their final new destination in a single hop whenever the architecture allows.

Pass vs. fail patterns

A clean migration redirect map looks like this:

# Pass: direct 1:1 map to the best new equivalent
GET /old/product/blue-widget-v2  →  301  →  /products/blue-widget

# Pass: consolidated move when the old page is genuinely merged
GET /old/category/blue-widgets   →  301  →  /category/widgets?color=blue

# Fail: redirect chain adds latency and auditing noise
GET /old/product/blue-widget-v2  →  301  →  /products/blue-widget-v2
                                    →  301  →  /products/blue-widget

# Fail: irrelevant destination creates soft-404 risk
GET /old/product/discontinued    →  301  →  /          (homepage)
GET /old/product/discontinued    →  301  →  /products  (unrelated)

# Fail: temporary redirect for a permanent move
GET /old/product/blue-widget-v2  →  302  →  /products/blue-widget

Google can follow redirect chains up to a reasonable limit and will usually consolidate signals eventually, but direct redirects resolve faster, transfer fewer bytes, and are much easier to audit. Every unnecessary hop is a place something can go wrong: a regex mistake, a canonicalization mismatch, a missed query string.

Validate redirects at the HTTP layer, not in a browser. Browsers hide the chain. A tool that follows and prints the hop list, such as the URL Indexability Inspector or curl -IL on each representative old URL, gives you the truth about what search engines see.
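The same hop-by-hop check can be scripted. The sketch below is one possible approach using only the Python standard library, not a reference implementation: fetch_hop, trace, and classify are hypothetical names, the hostnames in the usage note are placeholders, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def fetch_hop(url, timeout=10.0):
    """Send one HEAD request without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": "migration-audit-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=fetch_hop, max_hops=10):
    """Follow redirects manually and return the hop list as (url, status) pairs."""
    hops = []
    seen = set()
    while len(hops) < max_hops and url not in seen:
        seen.add(url)
        status, location = fetch(url)
        hops.append((url, status))
        if status not in PERMANENT | TEMPORARY or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return hops


def classify(hops):
    """Label a hop list with the pass/fail patterns described in this section."""
    redirects, final = hops[:-1], hops[-1]
    issues = []
    if final[1] != 200:
        issues.append(f"final status {final[1]}")
    if len(redirects) > 1:
        issues.append(f"chain of {len(redirects)} hops")
    if any(status in TEMPORARY for _, status in redirects):
        issues.append("temporary redirect")
    if redirects and urlsplit(final[0]).path in ("", "/"):
        issues.append("lands on homepage")
    return issues or ["pass"]
```

Calling classify(trace("https://old.example/old/product/blue-widget-v2")) against a real old URL returns either ["pass"] or a list of the problems found. A loop or an exhausted hop limit shows up as a non-200 final status. The fetch parameter exists so the tracing logic can be exercised offline with a stub.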

Pillar 2: discovery mapping with sitemaps and internal links

Once redirects are clean, the next question is: can crawlers find the new URLs efficiently? Discovery happens through two main channels — the XML sitemap and the site’s own internal link graph.

Sitemap hygiene

The new sitemap must contain only URLs that meet all of the following:

Return HTTP 200 directly (no redirects, no errors).
Are allowed by robots.txt.
Are not noindex via meta robots or X-Robots-Tag.
Self-reference via canonical, or are the canonical target of another URL.
Use absolute URLs, not relative paths.

Anything that fails those checks pollutes the sitemap and adds noise to Search Console’s coverage reports. A common post-migration mistake is carrying over the old sitemap generator verbatim, so the file still contains URLs that now redirect or 404.
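Some of these checks can be run statically against the sitemap file before any URL is fetched. The following is a minimal sketch, assuming a standard urlset sitemap (not a sitemap index) in the sitemaps.org namespace. audit_sitemap and the example hosts are hypothetical, and status, robots, and noindex checks still need a live request per URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(xml_text, expected_host):
    """Return a list of (loc, problem) pairs for entries that fail static checks."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()
    for loc_el in root.findall("sm:url/sm:loc", NS):
        loc = (loc_el.text or "").strip()
        parts = urlsplit(loc)
        if not parts.scheme or not parts.netloc:
            problems.append((loc, "relative or malformed URL"))
        elif parts.netloc != expected_host:
            problems.append((loc, "unexpected host"))  # e.g. a leftover old-domain entry
        if loc in seen:
            problems.append((loc, "duplicate entry"))
        seen.add(loc)
    return problems
```

An empty result only means the file passes these static checks. Each listed URL still has to return a direct 200 and carry a matching canonical on the live site.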

For deeper sitemap guidance, see XML Sitemap Best Practices. For a quick live validation of the production file, the CodeAva Sitemap Checker parses the XML, flags duplicates, relative URLs, and malformed entries, and shows a per-URL list you can scan for post-migration noise.

Internal links

Internal links are the most-trusted discovery signal a site controls directly. After a migration, every internal link that still points to an old URL forces crawlers through an extra redirect hop and sends a weak signal about which URL is preferred.

Audit internal links for:

Navigation menus (header, footer, mega menus)
Breadcrumbs and related-content rails
Pagination and facet links
In-body links inside CMS content
Hero CTAs and marketing module links
Structured data references (item URLs, breadcrumb lists)

Update the templates and the CMS content. Template links update instantly. Hardcoded in-body links in thousands of legacy articles usually do not, and they are often the last holdouts in a migration cleanup.
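Finding those holdouts in rendered HTML is easy to script. Below is a rough sketch using the standard-library HTML parser. LinkCollector, stale_links, the old host, and the /old/ prefix are illustrative assumptions, and it only inspects anchor tags in the HTML it is given, not links injected later by JavaScript or URLs inside structured data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def stale_links(html, page_url, old_host=None, old_prefixes=()):
    """Return absolute URLs of links that still point at the old host or old path patterns."""
    collector = LinkCollector()
    collector.feed(html)
    stale = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if parts.netloc == old_host or parts.path.startswith(tuple(old_prefixes)):
            stale.append(absolute)
    return stale
```

Running this over a sample of legacy articles gives a quick count of how much in-body cleanup remains after the templates are fixed.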

Transitional tactics

During a migration, some teams temporarily keep the old URL inventory accessible in monitoring tools or a separate file to validate that every old URL has a working redirect. That is a legitimate operational tactic, not a universal requirement. The production sitemap shipped at the new URL should stay clean and canonical-first. Do not confuse internal migration monitoring with public discovery signals.

Pillar 3: robots.txt and crawl access

The single highest-impact post-launch incident is a production robots.txt that blocks the new site. It happens more often than teams admit: staging rules ship with the deploy, a new CMS inserts its default disallow list, an overly broad pattern matches more than intended, or the file is served from the wrong path after an infrastructure change.

Post-migration, verify at minimum:

https://new-domain.com/robots.txt returns 200 with text/plain content-type.
The homepage is allowed (Allow: / or the absence of a blocking rule).
Each critical template is allowed: product pages, category pages, article pages, landing pages, pagination.
New path patterns introduced by the migration (for example, /shop/, /blog/, localized folders) are not accidentally matched by a generic disallow.
The file contains the correct Sitemap: directive pointing at the new sitemap URL.
Staging-specific rules such as Disallow: / or Disallow: /admin are not leaking into production.
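Several of these checks can be approximated with Python's built-in urllib.robotparser. The sketch below is a first-pass check under stated assumptions: check_robots and the example paths are hypothetical, and the standard-library parser uses simple prefix matching in file order, so it does not reproduce Google's wildcard or longest-match behavior. A clean result here does not replace testing the live file against the real crawl policy.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, base_url, critical_paths, user_agent="Googlebot"):
    """Return (blocked_paths, declared_sitemaps) for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [
        path for path in critical_paths
        if not parser.can_fetch(user_agent, base_url + path)
    ]
    # site_maps() returns None when the file declares no Sitemap: lines
    return blocked, parser.site_maps() or []
```

Feed it the body of the production robots.txt plus one representative path per critical template. Any path in the blocked list, or a missing or outdated sitemap URL, is worth investigating on launch day.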

Robots.txt is not the only crawl-control layer. Meta robots, X-Robots-Tag HTTP headers, server authentication, and CDN rules can all block crawlers too. The Robots.txt vs. Meta Robots vs. X-Robots-Tag comparison explains which signal to use where. For the most common robots.txt mistakes that ship in migrations, see Robots.txt Mistakes Blocking SEO. To validate the live file, run it through the CodeAva Robots.txt Checker, which parses the rules, tests URL access per user-agent, and extracts the declared sitemap.

Pillar 4: canonical alignment

Canonical tags exist to resolve duplicate-URL ambiguity. On the new site, they must point to the new preferred URLs — not to the old domain, not to the old URL format, not to a stale template default from the previous stack.

The migration-critical checks:

Self-referencing canonicals on the new preferred URLs by default.
No old-domain canonicals. After a domain move, canonicals pointing to the old host directly contradict the redirects.
Absolute URLs in canonical tags, with the correct protocol and host.
Consistency across the stack. The canonical, the internal link, the sitemap entry, and the redirect destination must all agree on the same URL.
No canonical to a redirected URL. A canonical pointing at a URL that 301-redirects names a preferred URL that immediately hands crawlers elsewhere, so the canonical and the redirect contradict each other.

Server-rendered canonicals are easier to audit than client-injected ones. If your stack renders canonical tags after hydration, verify that the value delivered in the initial HTML matches the final DOM. For JavaScript and headless stacks specifically, see Canonical Tags for JavaScript & Headless Websites.
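The checks on the initial HTML can be sketched as follows. This is an illustrative example only: CanonicalParser and check_canonical are hypothetical names, it reads the link rel="canonical" element from the raw HTML (not the hydrated DOM or an HTTP Link header), and a non-self-referencing canonical is flagged for review rather than treated as an error, since it can be legitimate on duplicate URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalParser(HTMLParser):
    """Collect every <link rel="canonical"> href found in the HTML."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"].strip())


def check_canonical(html, page_url, new_host):
    """Return a list of problems with the canonical signal on one page."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return ["no canonical tag"]
    if len(set(parser.canonicals)) > 1:
        return ["multiple conflicting canonicals"]
    href = parser.canonicals[0]
    parts = urlsplit(href)
    problems = []
    if not parts.scheme or not parts.netloc:
        problems.append("canonical is not absolute")
    elif parts.netloc != new_host:
        problems.append("canonical points at another host")
    if not problems and href != page_url:
        problems.append("canonical is not self-referencing")
    return problems
```

Running the same function on the server response and on a rendered snapshot of the page is one way to spot a mismatch between the initial HTML and the final DOM.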

Soft 404 clusters: the migration mistake that looks “safe”

The most damaging migration shortcut is catching every orphan redirect with a fallback to the homepage or a generic category page. Operationally it looks safe: no URL ever returns a 404, the redirect map is trivially complete, and the rollback story is easy.

Technically, it is not safe. If the redirect destination is not a meaningful replacement for the original URL, Google can classify it as a soft 404. When that happens, the old URL does not consolidate onto the new destination, the new destination picks up noisy signals, and the migration recovers more slowly.

A practical framework:

Old product → equivalent new product: good. Direct 1:1 replacement of intent.
Old product → parent category with that product consolidated into it: acceptable, provided the category really is the intended replacement (same brand, family, intent).
Old product → unrelated category: poor. The destination does not satisfy the original intent.
Old product → homepage: almost always soft-404 territory. Use only when the URL had essentially no useful content or traffic to preserve.
Old URL with no replacement: return a real 404 or 410. An honest “this is gone” is better for crawl efficiency and signal clarity than a fake redirect.
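One way to spot fallback-style mappings before launch is to look at how many old URLs share a destination in the redirect map. The sketch below assumes the map is available as a dictionary of old URL to new URL. fallback_report and the 5 percent threshold are arbitrary illustrative choices, and a flagged destination is a prompt for manual review, not proof of a soft 404.

```python
from collections import Counter
from urllib.parse import urlsplit


def fallback_report(redirect_map, max_share=0.05):
    """Flag destinations that absorb a suspiciously large share of old URLs."""
    if not redirect_map:
        return []
    counts = Counter(redirect_map.values())
    total = len(redirect_map)
    report = []
    for dest, n in counts.most_common():
        is_home = urlsplit(dest).path in ("", "/")
        if is_home or n / total > max_share:
            report.append((dest, n, "homepage" if is_home else "high fan-in"))
    return report
```

A legitimate consolidation, such as many color variants merging into one product page, will also show high fan-in, so each flagged destination needs a human decision using the framework above.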

The most dangerous migration shortcut

The most dangerous post-migration shortcut is “redirect everything to the homepage and fix it later.” It may look operationally safe, but it produces weak relevance signals, soft-404 clusters at scale, and a much harder recovery path. Map old URLs to genuinely equivalent destinations, or return a real 404/410 — never pretend the homepage is a replacement for something it is not.

Response quality and crawl efficiency

Correctness is the first gate; response quality is the second. A redirect that works but takes two seconds to resolve is still a problem during a migration. Crawlers budget time per host, and site moves often increase crawl demand while the new architecture is being mapped — so poor response behavior becomes more visible, not less, during transition periods.

Check for:

Redirect latency. 301s should be near-instant at the edge. Slow redirects usually mean the app is booting or a database lookup is happening inside the redirect path.
Redirect chains. Three hops take three network round trips. Collapse them where possible.
App-boot delays. Cold serverless functions, slow CMS rendering, or full-page server components with uncached dependencies can make the final 200 response slow even when the redirect chain is clean.
Unstable infrastructure. 5xx spikes during the migration window are far more visible to crawlers because they are visiting more URLs more often.
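Collapsing chains is mostly a data problem when the redirect rules live in a lookup table. A minimal sketch, assuming the rules are exact-match source to target pairs (flatten_redirects is a hypothetical helper; pattern or regex rules would need to be expanded first):

```python
def flatten_redirects(rules):
    """Rewrite a source -> target map so every source points at its final target."""
    flat = {}
    for source in rules:
        target = rules[source]
        seen = {source}
        # Keep following while the target is itself a redirect source
        while target in rules:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = rules[target]
        flat[source] = target
    return flat
```

Applied to the chain from the fail pattern earlier in this guide, both /old/product/blue-widget-v2 and /products/blue-widget-v2 end up pointing straight at /products/blue-widget, and a loop raises an error instead of shipping.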

For small sites, “crawl budget” is rarely the limiting factor — most pages get crawled often enough. For large sites or large migrations, crawl efficiency is a first-class concern. Tight, fast, predictable responses shorten the consolidation window regardless of site size.

Search Console checks after migration

Verify every relevant property. If you changed domain or protocol, you likely need a new Search Console property for the new host, and you should keep the old property verified for at least the monitoring window. If a full domain move applies and the Change of Address workflow is still relevant in your configuration, use it.

What to inspect in the weeks after launch:

Page Indexing report on the new property. Watch for spikes in “Not found (404)”, “Soft 404”, “Redirect error”, “Server error (5xx)”, and “Duplicate without user-selected canonical”.
Sitemap submission status. Submit the new sitemap. Confirm it is read successfully. Track submitted vs indexed counts over time.
URL Inspection on representative templates: homepage, a product, a category, an article, a paginated page. Confirm Google sees the correct canonical, meta robots, and rendered content.
Crawl Stats (Settings → Crawl Stats). Watch response-time and response-code distributions. Sudden shifts toward slow or error responses indicate an infrastructure regression.
Performance report. Track impressions and clicks per URL group. A well-executed migration will show the old URLs trailing off and the new URLs picking up equivalent impressions within weeks.

For the “discovered but not indexed” and “crawled but not indexed” statuses that often spike during migrations, see the diagnosis workflows in Discovered – Currently Not Indexed fixes and Crawled – Currently Not Indexed fixes. Both are common post-migration signals, and their causes overlap heavily with the pillars in this guide.

The post-migration crawlability checklist

Work through these in order. Each step assumes the previous one is clean; fixing canonicals before fixing redirects is almost always wasted effort.

Confirm every old URL redirects permanently to the best new equivalent. Sample at least the top 1–5 percent of old URLs by traffic, and at least one URL from every template. Use a tool that follows the full HTTP chain.
Remove redirect chains wherever possible. Rewrite rules so old URLs land on the final destination in a single hop.
Spot-check old high-value URLs manually to confirm the new destination is a meaningful replacement, not a soft-404-prone fallback.
Confirm the new sitemap contains only canonical 200 URLs, served with absolute URLs and a valid lastmod.
Confirm robots.txt allows crawling of the new structure, with no leftover staging rules and a correct Sitemap: directive.
Confirm canonicals self-reference the preferred new URLs and never point to the old domain or a redirected URL.
Confirm internal links (navigation, breadcrumbs, pagination, in-body content) point directly to the new preferred URLs.
Replace homepage redirects with real mappings or real 404/410 responses wherever the current destination is not a meaningful replacement.
Inspect Search Console indexing, sitemap, URL Inspection, and Crawl Stats reports. Investigate every spike.
Re-crawl key templates and legacy URL samples after fixes to confirm the live output matches the intent. Repeat until the sampled URLs are clean.
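The sampling rule in the first step (top URLs by traffic, plus at least one URL per template) can be sketched like this. The input shape, the audit_sample name, and the default 5 percent fraction are assumptions for illustration; the traffic figures would come from whatever analytics export the team already has.

```python
import math


def audit_sample(pages, top_fraction=0.05):
    """Pick the top URLs by traffic plus at least one URL per template.

    `pages` is a list of (url, template, visits) tuples.
    """
    ranked = sorted(pages, key=lambda p: p[2], reverse=True)
    top_n = max(1, math.ceil(len(ranked) * top_fraction)) if ranked else 0
    sample = [url for url, _, _ in ranked[:top_n]]
    covered = {template for _, template, _ in ranked[:top_n]}
    for url, template, _ in ranked:
        if template not in covered:
            sample.append(url)  # highest-traffic URL of a template not yet covered
            covered.add(template)
    return sample
```

The resulting list is the input for the redirect, canonical, and robots checks, and the same list can be reused for the final re-crawl so results are comparable before and after fixes.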

Validate in live HTTP, not just in staging

Staging environments lie. Edge caches, CDN rules, WAFs, auth middleware, and header rewriters live in production only. A migration audit is only trustworthy when it inspects the public, live production response for each URL — redirect chain, final status, headers, and rendered HTML.

Common migration mistakes teams make

Using 302s for a permanent migration. Signals the move is temporary and delays consolidation.
Keeping redirect chains in place because “Google can follow them anyway.” It can, but every hop adds latency and auditing noise.
Redirecting large numbers of old URLs to the homepage. Soft-404 risk at scale; weak consolidation of old ranking signals.
Launching with staging robots.txt rules. The single highest-impact post-launch incident.
Shipping stale canonicals that still point to the old domain or the old URL format.
Submitting dirty sitemaps full of redirected, 404, or non-canonical URLs.
Updating templates but forgetting internal links in CMS content, structured data, and marketing modules.
Ignoring Search Console after launch. Problems that are obvious at week one often take weeks or months to resolve once they compound.
Treating migration as “done” at launch. The migration is only done when new URLs are consolidated and indexed at a level consistent with the old ones.

Post-migration audit cheat sheet

One row per audit area. Use it as a reviewer’s checklist during a launch readiness pass or a post-launch incident triage.

Audit area	What good looks like	Common failure	Impact on crawlability
Redirects	Single-hop permanent server-side 301/308 to the best new equivalent.	302s, JS redirects, long chains, irrelevant destinations.	Slower consolidation of signals; higher soft-404 risk.
Sitemap	Only canonical, indexable 200 URLs, listed as absolute URLs. Valid `lastmod`.	Old URLs, redirected URLs, noindex URLs, relative paths.	Noisy discovery; misleading coverage reports.
Robots.txt	Allows all public templates; declares the new sitemap; no staging rules.	`Disallow: /`, overly broad patterns, missing sitemap directive, wrong file path.	Can block entire sections of the new site from being crawled.
Canonical tags	Self-referencing, absolute, new-domain, consistent with redirects and sitemap.	Old-domain canonicals, canonicals to redirected URLs, client-injected mismatches.	Conflicting signals delay index consolidation.
Internal links	All navigation, breadcrumbs, and in-body links point at the new preferred URLs directly.	Links still hit old URLs and redirect; CMS content not updated.	Extra hops; weaker discovery of the new URLs.
Old URL handling	Direct redirects to meaningful equivalents; honest 404/410 for retired URLs with no replacement.	Blanket homepage redirects; “catch-all” category fallbacks.	Soft-404 clusters; weak signal transfer.
Search Console monitoring	Old and new properties verified; indexing, sitemaps, crawl stats reviewed weekly.	No verification on new host; no one watches indexing anomalies post-launch.	Regressions compound invisibly; recovery takes months instead of weeks.

Automate the forensic checks

A good post-migration audit blends manual review with targeted automated checks on the live production response. The CodeAva tooling most relevant to this workflow:

URL Indexability Inspector — follows the live redirect chain, parses final response headers, extracts meta robots, X-Robots-Tag, canonical, and hreflang, and checks robots.txt crawlability hints for the host. Use it per URL for deep diagnostics on representative old and new URLs.
Robots.txt Checker — validates the live robots.txt, tests URL access per user-agent, and extracts the declared sitemap URL. First thing to run on launch day.
Sitemap Checker — parses the new sitemap XML, flags duplicates, relative URLs, malformed entries, and surfaces fixes before Search Console sees the noise.
Website Audit — quick technical-hygiene snapshot of a URL: HTTP status, metadata, canonical, H1, Open Graph, Twitter Card, security headers, robots.txt and sitemap.xml reachability. Useful for the homepage and a handful of key templates as part of a launch-readiness pass.

These tools do not replace a site-wide crawler or Search Console. They complement them by giving fast, reliable per-URL and per-file diagnostics on the live public response, which is what search engines actually consume.

A migration is done when crawlers agree it is done

Shipping the new site is not the end of the migration; it is the start of the consolidation window. Work is finished only when search engines can reliably discover, fetch, and consolidate the new URLs — and that requires every signal the site sends to agree on the same set of preferred destinations.

The core audit areas do not change: redirect integrity, discovery via sitemaps and internal links, robots-level crawl access, and canonical alignment, with disciplined Search Console monitoring for at least several weeks after launch. Teams that verify the live output quickly and fix inconsistencies before they compound recover in weeks. Teams that do not usually recover in quarters.

When you are ready to audit a specific page, run it through the CodeAva URL Indexability Inspector for a full per-URL signal report, validate the production robots.txt and sitemap, and use the CodeAva Website Audit for a broader hygiene snapshot of the homepage and core templates. For deeper reading on the supporting pillars, see Robots.txt Mistakes Blocking SEO, XML Sitemap Best Practices, and Canonical Tags for JavaScript & Headless Websites.


More from Rohit Trivedi
