
XML Sitemap Best Practices for Modern, Dynamic Websites

Still using a static sitemap export? Learn the best practices for dynamic XML sitemaps — clean URL selection, sitemap indexes at scale, accurate lastmod, hreflang for international sites, and a validation workflow that catches problems before Google does.

Websites are no longer static directories of HTML files. Modern production sites are CMS-backed, API-driven, edge-rendered, or composed from headless layers — with URLs that appear and disappear based on database state, user content, inventory, and localization rules. A sitemap that does not reflect this reality is worse than useless: it wastes crawl budget on stale URLs and fails to surface new ones.

Most sitemap problems are not caused by ignorance of the standard. They are caused by a mismatch between how sitemaps are generated and how the underlying site actually works. A sitemap generated at deployment time on a site that publishes ten articles a day will be out of date within hours. A sitemap that includes every URL the database can produce — including draft pages, noindex pages, and redirected legacy URLs — actively misleads crawlers about what is worth indexing.

TL;DR — the essential rules

  • A modern XML sitemap should be dynamically generated, reflecting the current state of production content, not a stale export.
  • It should contain only canonical, indexable, 200-status URLs — no redirects, broken pages, noindex pages, or non-canonical alternates.
  • Sites with many URLs should use sitemap index files, segmenting by content type for crawl clarity and operational maintainability.
  • Stale or junk-filled sitemaps reduce crawler trust, waste crawl budget, and create indexation drift for both Google and Bing.

If you have not yet audited your crawl configuration, common robots.txt mistakes are often the companion problem to sitemap issues — both affect discovery and both are easy to get wrong silently. The CodeAva Sitemap Checker validates your sitemap structure and surfaces per-URL issues, and a full Website Audit covers the broader technical SEO picture.

Static vs dynamic sitemaps: the architectural shift

A static sitemap.xml file — manually created or exported once from a tool — made sense when websites were relatively stable and small. It still makes sense for genuinely static sites: a personal portfolio, a documentation site built from a fixed set of Markdown files, a landing page that rarely changes. For those sites, a committed XML file is perfectly adequate.

For most modern sites, static sitemaps have a structural problem: they drift out of sync with reality. Here is why:

  • CMS-driven sites publish, unpublish, and revise content continuously. A sitemap exported at deployment time will miss anything published after that point and may still list pages that have since been deleted or redirected.
  • Headless storefronts pull product data from a commerce platform at render time. The live product catalog is not fully known until runtime. A static sitemap generated from a one-time inventory dump is immediately unreliable.
  • SSR and edge-rendered applications often generate pages dynamically from query parameters, database records, or API responses. The full URL space may not be enumerable without querying the data layer.
  • Deployment pipelines are a hidden risk. Even a dynamically generated sitemap can become static if the generation step runs once at build time and outputs a committed file that is never refreshed between deploys.

Modern implementation patterns

The right architecture depends on your stack, but the goal is always the same: sitemap generation should be a near-real-time function of your actual content inventory, not an afterthought.

  • Next.js: the app/sitemap.ts convention exports a function that runs at request time (or at build time for static export), querying your data layer to produce accurate URL lists. This is the pattern used on this site.
  • Nuxt / Astro: both support programmatic sitemap generation through first-party or ecosystem modules that hook into the content collection or routing layer.
  • WordPress: plugins like Yoast SEO or Rank Math generate sitemaps dynamically from the posts database. For custom post types or complex sites, validate that all URL types you want indexed are actually included — plugin defaults often exclude custom types.
  • Shopify: provides a managed sitemap at /sitemap.xml that reflects live products, collections, and pages. For custom storefronts built on Shopify's APIs (Hydrogen, custom headless), sitemap generation is your responsibility and must be tied to the Storefront API.
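
The Next.js pattern above can be sketched in a few lines. This is an illustrative sketch, not a drop-in file: getPublishedPosts is a hypothetical stand-in for whatever data layer your site actually uses (CMS client, database query), and the return shape mirrors what a sitemap entry needs.

```typescript
// Sketch of a Next.js app/sitemap.ts-style generator. getPublishedPosts is a
// hypothetical stand-in for a real data layer (CMS client, database query).
type Post = { slug: string; updatedAt: Date };

async function getPublishedPosts(): Promise<Post[]> {
  // A real implementation would query the CMS or database for
  // published, indexable posts only.
  return [{ slug: "hello-world", updatedAt: new Date("2026-03-15") }];
}

export default async function sitemap(): Promise<{ url: string; lastModified: Date }[]> {
  const posts = await getPublishedPosts();
  return posts.map((post) => ({
    url: `https://yourdomain.com/blog/${post.slug}`,
    lastModified: post.updatedAt, // real content date, not deploy time
  }));
}
```

Because the function runs against the live data layer at request time, the sitemap cannot drift from the content inventory the way a committed file can.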

The architectural principle

Your sitemap should be a real-time or near-real-time reflection of your actual production URLs. If it is generated once and committed to a repository, treat it like any other configuration artifact that can go stale — and build a refresh mechanism into your workflow.

The golden rules of modern sitemaps

Rule 1: Only include canonical, indexable, 200-status URLs

Why it matters. A sitemap is a signal of intent: you are telling crawlers which URLs are worth their time. Every low-quality, redirected, broken, or non-canonical URL you include dilutes that signal. Over time, a sitemap full of junk trains crawlers to trust it less — and may cause them to deprioritize even the good URLs listed alongside the bad ones.

Implementation. Before any URL enters your sitemap generation logic, apply these four filters:

  • Status: include only URLs that return HTTP 200. Remove anything returning 3xx, 4xx, or 5xx.
  • Canonical: include only the canonical version of each URL. If /product/widget?color=blue is canonicalized to /product/widget, include only the canonical. If a paginated page points to a root canonical, exclude the paginated variant.
  • Indexability: exclude any URL carrying a noindex meta tag or X-Robots-Tag: noindex response header.
  • Crawlability: exclude any URL blocked by your robots.txt. A URL that cannot be crawled cannot be indexed, so including it in the sitemap only creates a misleading signal.
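
The four filters collapse into a single predicate over metadata your data layer likely already holds. A sketch — the PageRecord shape and its field names are assumptions to adapt to your own model:

```typescript
// Hypothetical page metadata shape — adapt field names to your data model.
type PageRecord = {
  url: string;              // the URL as it would appear in the sitemap
  status: number;           // last known HTTP status
  canonicalUrl: string;     // resolved canonical target
  noindex: boolean;         // meta robots or X-Robots-Tag noindex
  blockedByRobots: boolean; // matches a robots.txt Disallow rule
};

// True only for URLs that belong in the sitemap:
// 200 status, self-canonical, indexable, and crawlable.
function isSitemapEligible(page: PageRecord): boolean {
  return (
    page.status === 200 &&
    page.canonicalUrl === page.url &&
    !page.noindex &&
    !page.blockedByRobots
  );
}

const candidates: PageRecord[] = [
  { url: "https://yourdomain.com/product/widget", status: 200, canonicalUrl: "https://yourdomain.com/product/widget", noindex: false, blockedByRobots: false },
  { url: "https://yourdomain.com/product/widget?color=blue", status: 200, canonicalUrl: "https://yourdomain.com/product/widget", noindex: false, blockedByRobots: false },
  { url: "https://yourdomain.com/old-page", status: 301, canonicalUrl: "https://yourdomain.com/old-page", noindex: false, blockedByRobots: false },
];

// Only the canonical, 200-status URL survives the filter.
const eligible = candidates.filter(isSitemapEligible);
```

Running the filter at generation time — rather than auditing the output afterwards — means junk URLs never enter the sitemap in the first place.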

Never include these in a sitemap

Redirected URLs (3xx), broken URLs (4xx / 5xx), noindex pages, non-canonical alternate URLs, and robots.txt-blocked paths. Each one wastes a crawl budget slot and reduces sitemap quality over time.

Common mistake. Including all URLs the CMS knows about — including draft states, archived posts, login-required pages, and faceted navigation variants — because the data model makes it easy to query everything. Filter intentionally; do not export exhaustively.

Rule 2: Respect sitemap limits and use sitemap indexes properly

Why it matters. The XML Sitemap Protocol defines hard limits: a single sitemap file may contain a maximum of 50,000 URLs and must be no larger than 50 MB uncompressed. Exceeding these limits means the file will be partially or fully ignored by crawlers. Beyond the hard limit, even a valid 40,000-URL monolithic sitemap is harder to maintain and debug than a well-partitioned sitemap index.

Implementation. Use a sitemap index file at /sitemap.xml that references child sitemaps segmented by content type:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-blog.xml</loc>
    <lastmod>2026-03-23</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-products.xml</loc>
    <lastmod>2026-03-22</lastmod>
  </sitemap>
</sitemapindex>

Segmenting by content type has operational advantages: you can regenerate only the affected child sitemap when that content type changes, you can monitor coverage per type in Search Console and Bing Webmaster Tools separately, and errors are easier to isolate.
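
The 50,000-URL limit can be enforced mechanically rather than hoped for: chunk each content type's URL list before writing child sitemaps. A minimal sketch:

```typescript
// Split a URL list into chunks that respect the protocol's
// 50,000-URLs-per-file limit.
function chunkUrls(urls: string[], maxPerSitemap = 50_000): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += maxPerSitemap) {
    chunks.push(urls.slice(i, i + maxPerSitemap));
  }
  return chunks;
}

// Example: 120,000 product URLs become three child sitemaps
// (sitemap-products-1.xml … sitemap-products-3.xml in the index).
const productUrls = Array.from(
  { length: 120_000 },
  (_, i) => `https://yourdomain.com/product/${i}`
);
const childSitemaps = chunkUrls(productUrls);
```

Note the 50 MB uncompressed size limit is a separate constraint: image or hreflang extensions inflate per-entry size, so very verbose sitemaps may need smaller chunks than 50,000.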

Common mistake. Generating one enormous flat sitemap file for every URL the site has ever had, never pruning it, and wondering why Search Console reports large numbers of "Discovered — currently not indexed" URLs alongside "Submitted and indexed" ones.

Rule 3: Treat lastmod as a trust signal, not a timestamp

Why it matters. Both Google and Bing use <lastmod> as a hint for crawl freshness prioritization. A URL with a recent, accurate lastmod is more likely to be recrawled promptly when it changes. A sitemap where every URL has today's date — regardless of whether anything actually changed — quickly loses credibility. Crawlers learn from the accuracy of your timestamps; inaccurate data trains them to discount the field entirely.

Implementation. Populate <lastmod> from the actual last-modified date of the underlying content:

  • Blog posts: use the post's last-edited or last-published date
  • Product pages: use the last time the product record was meaningfully updated (price, description, availability)
  • Programmatic landing pages: use the most recent update to the source data that drives the page
  • Static pages: use the actual date they were last meaningfully changed — not the deployment date

Format values in W3C Date format: YYYY-MM-DD or full ISO 8601 with time and timezone (2026-03-23T09:00:00Z).
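
A small helper can normalize stored timestamps into either accepted shape. A sketch — whether you emit date-only or full ISO 8601 is a per-site choice; date-only is sufficient when you do not track intraday changes:

```typescript
// Format a Date as a W3C-valid <lastmod> value.
// dateOnly = true  -> YYYY-MM-DD
// dateOnly = false -> full ISO 8601 with time, in UTC
function toLastmod(date: Date, dateOnly = true): string {
  const iso = date.toISOString(); // e.g. "2026-03-23T09:00:00.000Z"
  return dateOnly ? iso.slice(0, 10) : iso;
}

toLastmod(new Date("2026-03-23T09:00:00Z"));        // "2026-03-23"
toLastmod(new Date("2026-03-23T09:00:00Z"), false); // "2026-03-23T09:00:00.000Z"
```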

Common mistake. Setting <lastmod> to the current timestamp on every deploy because it is easy to automate. This is one of the most common sitemap quality problems and one of the easiest to introduce accidentally in a CI/CD pipeline.

Rule 4: Omit priority and changefreq

Why it matters. The <priority> and <changefreq> fields are part of the XML Sitemap specification, but Google ignores both. They do not influence crawl scheduling, indexing priority, or ranking. Setting <priority>1.0</priority> on every URL — the most common pattern — is the equivalent of writing "URGENT" on every email in your outbox.

Implementation. Simply omit them. A clean, lean sitemap with accurate <loc> and <lastmod> values is more useful than a verbose one padded with fields that are ignored:

<!-- Lean — recommended -->
<url>
  <loc>https://yourdomain.com/blog/post-title</loc>
  <lastmod>2026-03-15</lastmod>
</url>

<!-- Bloated — unnecessary fields -->
<url>
  <loc>https://yourdomain.com/blog/post-title</loc>
  <lastmod>2026-03-15</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>

Common mistake. Mass-assigning priority tiers — 1.0 for the homepage, 0.8 for category pages, 0.5 for posts — because it looks systematic and professional. It adds file size and maintenance overhead with no effect on how major search engines process the sitemap.

Exception: downstream consumers

If you use a sitemap in a non-search-engine context — for example, feeding a custom crawler, a content migration tool, or a monitoring system that reads priority values — keeping these fields may make sense for that specific pipeline. For general SEO purposes, omit them.

Rule 5: Use image and video extensions where they genuinely help discovery

Why it matters. Standard sitemaps list page URLs. Image and video sitemap extensions allow you to annotate those URLs with references to media that may not be easily discoverable from rendered HTML alone — for example, images loaded via JavaScript, video embeds, or media in single-page application views that require rendering to expose.

Implementation. Add image or video extensions only for content where discovery is genuinely at risk:

  • Image-heavy catalogs, portfolios, or galleries where visual search discovery matters
  • Video content libraries where the video URL and metadata are worth surfacing for rich results
  • JavaScript-rendered media that Googlebot may not reliably extract during rendering

Common mistake. Adding image extensions to every page URL in the sitemap to "improve image SEO" without checking whether those images are actually at risk of being undiscovered. For standard server-rendered HTML with inline <img> tags, Googlebot will find images during normal crawling without sitemap extensions.

Rule 6: Tie sitemap generation to your real publishing workflow

Why it matters. A sitemap is only as accurate as the process that generates it. If new content is published but the sitemap is only regenerated on the next deploy, there is a discovery gap. If deleted content is removed from the site but the sitemap is not updated, crawlers will repeatedly attempt to fetch 404 URLs — wasting budget and eroding trust.

Implementation. Build sitemap generation into the events that change your URL inventory:

  • CMS publish and unpublish hooks: trigger a sitemap regeneration (or incremental update) when content is published, revised, or removed
  • Product inventory changes: regenerate product sitemaps when SKUs are added, deactivated, or redirected
  • Programmatic landing page changes: sync sitemap generation with the data layer that drives those pages
  • Deployment validation: include a post-deploy check that confirms the live sitemap is well-formed, references the correct URLs, and is linked from robots.txt
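
The event-driven pattern above can be sketched as a small invalidation layer: content events mark the affected child sitemap stale, and a regeneration step rebuilds only stale ones. The event names, content types, and regenerate callback here are illustrative assumptions, not a specific framework's API:

```typescript
// Minimal event-driven invalidation sketch. Content events mark the
// affected child sitemap stale; regeneration rebuilds only stale ones.
type ContentEvent = {
  type: "publish" | "unpublish" | "update";
  contentType: "blog" | "products" | "pages";
};

const staleSitemaps = new Set<string>();

function onContentEvent(event: ContentEvent): void {
  // Any inventory-changing event invalidates that content type's sitemap.
  staleSitemaps.add(`sitemap-${event.contentType}.xml`);
}

function regenerateStale(regenerate: (name: string) => void): string[] {
  const rebuilt = [...staleSitemaps];
  rebuilt.forEach(regenerate);
  staleSitemaps.clear();
  return rebuilt;
}

// Two blog events cause exactly one rebuild: sitemap-blog.xml.
onContentEvent({ type: "publish", contentType: "blog" });
onContentEvent({ type: "update", contentType: "blog" });
const rebuilt = regenerateStale(() => { /* write XML to storage/CDN here */ });
```

The key property is that regeneration cost scales with what changed, not with total site size — which is what makes near-real-time sitemaps affordable on large sites.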

Common mistake. Generating the sitemap once during the initial build and treating it as a static artifact. This works for genuinely static sites. For dynamic sites, it creates invisible indexation drift that compounds over time.

GEO and localization: hreflang in XML sitemaps

For sites serving multiple languages or regions, declaring hreflang alternate relationships helps search engines serve the right language version to the right audience. The two supported implementation methods are HTML <link rel="alternate"> tags in the page head, and hreflang entries in the XML sitemap. For large international sites, the sitemap approach is often operationally cleaner.

The sitemap approach uses xhtml:link elements inside each URL entry. Every URL entry must declare all of its alternates — including itself:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <url>
    <loc>https://yourdomain.com/en/about</loc>
    <xhtml:link rel="alternate" hreflang="en"
      href="https://yourdomain.com/en/about"/>
    <xhtml:link rel="alternate" hreflang="fr"
      href="https://yourdomain.com/fr/about"/>
    <xhtml:link rel="alternate" hreflang="de"
      href="https://yourdomain.com/de/about"/>
    <xhtml:link rel="alternate" hreflang="x-default"
      href="https://yourdomain.com/en/about"/>
  </url>

  <url>
    <loc>https://yourdomain.com/fr/about</loc>
    <xhtml:link rel="alternate" hreflang="en"
      href="https://yourdomain.com/en/about"/>
    <xhtml:link rel="alternate" hreflang="fr"
      href="https://yourdomain.com/fr/about"/>
    <xhtml:link rel="alternate" hreflang="de"
      href="https://yourdomain.com/de/about"/>
    <xhtml:link rel="alternate" hreflang="x-default"
      href="https://yourdomain.com/en/about"/>
  </url>

</urlset>

When sitemap-based hreflang is the right choice

  • Large international catalogs: adding dozens of <link> alternates to each page's HTML head creates significant markup overhead. The sitemap approach keeps the page HTML clean.
  • Many language-region combinations: for sites serving 20+ locale combinations, the sitemap approach is far easier to maintain than per-page head tags.
  • Headless or API-driven architectures: where the HTML rendering layer is decoupled from content management, sitemap generation is often easier to control than injecting per-page head elements for every locale.

hreflang must be bidirectionally consistent

Every alternate URL listed in a hreflang group must also reference all other URLs in that group. If the English page lists the French alternate but the French page does not list the English alternate, the relationship is invalid. Incomplete hreflang is treated as if the declarations were not present. Always generate these from the same data model so consistency is enforced programmatically, not manually.
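
Generating every locale's <url> entry from a single alternates map makes that reciprocity automatic: each entry emits the full group, itself included, so the declarations cannot drift apart. A sketch (XML escaping of URLs is omitted for brevity — a real generator should escape characters like & in <loc> and href values):

```typescript
// One alternates group, keyed by hreflang code. Generating every locale's
// <url> entry from this single map guarantees each entry lists the full
// group (itself included), so bidirectional consistency cannot drift.
const aboutAlternates: Record<string, string> = {
  en: "https://yourdomain.com/en/about",
  fr: "https://yourdomain.com/fr/about",
  de: "https://yourdomain.com/de/about",
  "x-default": "https://yourdomain.com/en/about",
};

function urlEntry(loc: string, alternates: Record<string, string>): string {
  const links = Object.entries(alternates)
    .map(([lang, href]) => `    <xhtml:link rel="alternate" hreflang="${lang}" href="${href}"/>`)
    .join("\n");
  return `  <url>\n    <loc>${loc}</loc>\n${links}\n  </url>`;
}

// One <url> block per distinct locale URL in the group
// (x-default shares the en URL, so three entries, not four).
const urlEntries = [...new Set(Object.values(aboutAlternates))]
  .map((loc) => urlEntry(loc, aboutAlternates));
```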

How to validate your sitemap pipeline

Sitemap errors are not loud. There are no 500 responses, no broken build badges, and no user-facing symptoms until you notice pages dropping out of the index. The only reliable way to catch problems is to validate proactively — before issues reach Search Console.

A practical validation workflow:

  1. Review the live sitemap. Fetch https://yourdomain.com/sitemap.xml directly in a browser. Confirm the root file exists and is well-formed XML. If using a sitemap index, check that all child sitemap URLs resolve correctly.
  2. Validate structure and URL quality. Use the CodeAva Sitemap Checker to parse the XML, identify structural errors, detect duplicate entries, validate lastmod formats, spot relative URLs, and flag entries that fall outside the protocol.
  3. Verify URL responses. A subset check of listed URLs should confirm they return HTTP 200, serve the correct content type, and are not carrying noindex headers. Use the HTTP Headers Checker to inspect individual URLs, or include automated URL-response checks in your post-deploy smoke test suite.
  4. Check robots.txt consistency. Confirm that your sitemap URL is referenced in robots.txt via a Sitemap: directive, and that no listed sitemap URLs are blocked by Disallow rules. An automated robots.txt check catches this in seconds.
  5. Monitor Search Console and Bing Webmaster Tools. Submit your sitemap index to both. Review coverage reports for "Discovered — currently not indexed," "Crawled — currently not indexed," and "Excluded" categories. These reports surface the crawl quality of your listed URLs over time.
  6. Run a full technical health check. The CodeAva Website Audit checks canonical tags, meta robots, response codes, Open Graph, security headers, and crawlability in one pass — confirming that your sitemap's URL set is backed by pages that are technically healthy.
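
Parts of step 2 can run inside a post-deploy smoke test with no external service: fetch the sitemap and flag duplicate <loc> entries, relative URLs, and malformed <lastmod> values. This sketch uses regex extraction as a simplification — production checks should parse the XML properly:

```typescript
// Naive checks over a sitemap XML string: duplicate <loc> entries,
// relative URLs, and <lastmod> values that are not W3C-date shaped.
// Regex extraction is a simplification — use a real XML parser in production.
function findSitemapIssues(xml: string): string[] {
  const issues: string[] = [];
  const locs = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);
  const seen = new Set<string>();
  for (const loc of locs) {
    if (seen.has(loc)) issues.push(`duplicate <loc>: ${loc}`);
    seen.add(loc);
    if (!/^https?:\/\//.test(loc)) issues.push(`relative or invalid URL: ${loc}`);
  }
  const lastmods = [...xml.matchAll(/<lastmod>(.*?)<\/lastmod>/g)].map((m) => m[1]);
  // YYYY-MM-DD, optionally followed by an ISO 8601 time with offset.
  const w3c = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2}))?$/;
  for (const value of lastmods) {
    if (!w3c.test(value)) issues.push(`malformed <lastmod>: ${value}`);
  }
  return issues;
}

const sample = `<url><loc>https://yourdomain.com/a</loc><lastmod>2026-03-15</lastmod></url>
<url><loc>https://yourdomain.com/a</loc><lastmod>15/03/2026</lastmod></url>`;
// findSitemapIssues(sample) flags the duplicate loc and the non-W3C date.
```

Wiring this into CI means a broken generator fails the deploy instead of silently degrading index coverage.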

Don't wait for Search Console to tell you

Google Search Console and Bing Webmaster Tools report sitemap errors after the fact — sometimes days after a broken deploy. Validate your sitemap and key URLs before and after every deployment, not reactively after rankings have already been affected.

Pro tip: audit-first sitemap operations

The teams with the fewest sitemap problems share a consistent habit: they treat sitemap quality as an operational metric, not a one-time setup task.

  • Generate dynamically. Never commit a static sitemap for a dynamic site. Tie generation to your data layer and content events.
  • Validate on every deploy. Include a sitemap check in your post-deploy smoke tests. Assert that the root URL resolves, the XML is valid, and no full-site crawl blocks are present in robots.txt.
  • Keep sitemap quality in your SEO KPIs. Track the ratio of submitted-to-indexed URLs in Search Console. A declining index rate on submitted URLs is an early signal of sitemap quality degradation.
  • Treat sitemap and robots.txt together. robots.txt controls crawl access and sitemaps communicate content intent. They should be reviewed together as a coherent system, not managed independently.

Conclusion

A sitemap should be a trustworthy map of live, indexable content — not a dump of every URL your stack can produce. The difference between those two things is the difference between a sitemap that helps crawlers work efficiently and one that quietly wastes their time.

Modern sites need modern sitemap practices: dynamic generation, clean URL filtering, sitemap indexes at scale, accurate timestamps, and a validation workflow that catches problems before they show up in coverage reports. None of these are difficult to implement — but they do require treating the sitemap as a first-class engineering artifact rather than an SEO checkbox.

Ready to find out where yours stands? Run a Website Audit for a full technical health check, validate your XML sitemap for structural and URL quality issues, or use the HTTP Headers Checker to confirm clean 200 responses on the URLs that matter most.

#technical-seo #xml-sitemap #crawl-budget #hreflang #sitemap-index #dynamic-sites #web-quality

More from Sophia DuToit
