Sitemaps

A sitemap is a structured file, typically in XML format, that lists the URLs of a website's pages along with optional metadata such as the last modification date, change frequency, and relative priority, to help search engines discover, crawl, and index site content more efficiently. The Sitemaps protocol was introduced in 2005 by Google to address challenges in crawling large or dynamically generated websites, and it gained broader adoption in 2006 when Yahoo! and Microsoft announced joint support, leading to the establishment of sitemaps.org as the official collaborative resource. Sitemaps conform to a specific XML schema that requires elements like <urlset> and <loc> for each URL (limited to 2,048 characters and drawn from a single host), while optional tags such as <lastmod> (in W3C datetime format), <changefreq> (values like "always," "hourly," "daily," "weekly," "monthly," "yearly," or "never"), and <priority> (a decimal from 0.0 to 1.0, defaulting to 0.5) provide additional guidance for crawlers. Each sitemap file is limited to 50,000 URLs or 50 megabytes (uncompressed), with support for gzip compression; for larger sites, a separate sitemap index file can reference up to 50,000 individual sitemaps. Website owners submit sitemaps to search engines via tools like Google Search Console, by adding a Sitemap directive to the site's robots.txt file, or through HTTP requests, enabling faster discovery of new or updated pages that might lack internal links. Benefits include improved indexing for sites with over 500 pages, those featuring rich media like images or videos, or versions in multiple languages, though small, well-linked sites may not require them. Specialized sitemap variants exist for images, videos, and news, extending the protocol's utility beyond basic URL lists. All sitemaps must be UTF-8 encoded and entity-escaped to ensure compatibility with parsers.

Fundamentals

Definition and Purpose

A sitemap is a file or other structured source that lists the URLs of a website's pages, videos, images, and other files to inform search engines about content available for crawling and indexing. This enables webmasters to provide structured information about site organization and relationships between resources, supplementing traditional link-based discovery methods. The XML format serves as the standard under the official Sitemaps protocol, supported by major search engines including Google, Bing, and Yahoo!. The core purpose of sitemaps is to assist search engines in discovering new or updated content that might otherwise be overlooked, especially on large, dynamic, or poorly linked sites. They achieve this by including metadata such as the last modification date (<lastmod>), expected change frequency (<changefreq> values like "daily" or "monthly"), and relative priority (<priority> on a 0.0-1.0 scale) for each URL. This guidance helps optimize crawling efficiency, allowing search engines to prioritize high-value pages and allocate resources more effectively. Key benefits include minimizing crawl budget waste (the limited resources search engines dedicate to site exploration) by directing bots toward important content and away from irrelevant paths. Sitemaps help with discovery of new content, potentially accelerating indexing, though indexing times can vary from hours to weeks depending on factors like site size and crawl budget. They boost overall visibility in search results without depending on internal hyperlinks alone. In contrast to robots.txt files, which specify access permissions to block or allow crawling of certain directories, sitemaps emphasize content suggestion and metadata to enhance discovery and indexing.

History

The concept of sitemaps first emerged in the late 1990s as part of early web design practices aimed at improving user navigation on increasingly complex websites. Publishers and design guides, such as the Web Style Guide, recommended including hierarchical site maps, often as simple HTML pages or diagrams, to help visitors understand site structure and locate content efficiently. By the early 2000s, with the rapid growth of search engines, these user-focused maps began evolving toward machine-readable formats to assist automated crawling and indexing, addressing inefficiencies in discovering new or updated pages across large sites. A key milestone came in June 2005, when Google introduced the initial Sitemaps protocol (version 0.84) in XML format, enabling webmasters to submit lists of URLs along with metadata like last modification dates and change frequencies to guide crawlers more effectively. This addressed challenges that followed the search engine boom, such as incomplete crawling of dynamic or poorly linked content. In November 2006, Google, Yahoo!, and Microsoft jointly announced support for the protocol, formalizing it under version 0.9 and establishing sitemaps.org as the central documentation site managed by a working group of representatives from these companies. The protocol saw rapid extensions to support specialized content: a news extension was added in November 2006 to prioritize timely articles with publication timestamps, followed by video extensions in December 2007 to include details like duration and thumbnails, and image extensions in April 2010 for enhanced media discovery. These developments were driven by Google engineers, notably Vanessa Fox, who contributed to launching sitemaps.org and building the associated Webmaster Central tools to facilitate adoption. In recent years, the protocol has remained stable with ongoing maintenance by major search engines, though without significant overhauls.
A notable change occurred in June 2023, when Google deprecated the sitemap ping endpoint, a mechanism for notifying engines of updates, which ceased functioning by December 2023; this encouraged reliance on direct sitemap submissions via tools like Search Console and on accurate lastmod tags for discovery.

Core Formats

XML Sitemap Protocol

The XML Sitemap Protocol defines a standardized XML format for listing URLs to facilitate discovery by crawlers. It specifies a root <urlset> element that encapsulates all entries, with each individual URL represented as a child <url> element. The protocol mandates inclusion of the namespace declaration xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" in the <urlset> tag to ensure compatibility and validation. Within each <url> element, the <loc> tag is required and contains the canonical URL of the page, limited to 2,048 characters. Optional elements include <lastmod>, which records the last modification date in W3C Datetime format (a profile of ISO 8601); <changefreq>, indicating update frequency with values such as "always", "hourly", "daily", "weekly", "monthly", "yearly", or "never"; and <priority>, a floating-point value from 0.0 to 1.0 that suggests relative importance within the site (defaulting to 0.5 if omitted). These components provide hints to crawlers without guaranteeing specific crawling behavior. Sitemap files following this protocol are typically named sitemap.xml and placed at the website's root directory for easy access. They must be encoded in UTF-8 and adhere to XML 1.0 specifications, with a maximum uncompressed size of 50 megabytes (52,428,800 bytes) and no more than 50,000 URLs per file. Validation against the official schema at http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd ensures conformance, as demonstrated in this basic example for listing URLs:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
   </url>
   <url>
      <loc>http://www.example.com/page1.html</loc>
   </url>
</urlset>
```

Unlike HTML sitemaps designed for human navigation, the XML format is machine-readable and optimized exclusively for search engine processing, omitting any presentational elements. Detailed specifications for individual elements, such as the precise usage of <loc>, are covered in the element definitions section.

Element Definitions

The XML Sitemap protocol defines a structured set of elements to describe URLs on a website, enabling search engines to understand the site's content more efficiently. The root element, <urlset>, serves as the container for all URL entries in the file and must include the namespace attribute to reference the protocol standard. Specifically, it is declared as <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">, ensuring compliance with the schema for validation. This element encapsulates the entire sitemap and must be the outermost tag, with the file encoded in UTF-8 to handle international characters properly.

Each individual URL is represented by the <url> element, which acts as a wrapper for the details of a single page or resource. This element is required for every entry and must contain exactly one child <loc> element, though it may also include optional sub-elements like <lastmod>, <changefreq>, and <priority>. The <url> tag provides a logical grouping, allowing search engines to parse the sitemap as a list of discrete entries without ambiguity. Multiple <url> elements are nested within the <urlset>, forming the core body of the file.

The <loc> element is the mandatory core of each <url> entry, specifying the absolute URL of the page being referenced. It must be a fully qualified URL, starting with a protocol such as HTTP or HTTPS, limited to 2,048 characters in length, and excluding fragment identifiers (e.g., no "#section" parts). For instance, a valid entry might be <loc>https://www.example.com/products/widget</loc>, and all values within the sitemap must be entity-escaped, such as replacing "&" with "&amp;". Relative URLs are not permitted, as they prevent universal accessibility across search engine crawlers.

Optionally, the <lastmod> element indicates the date and time of the last significant modification to the page, helping search engines prioritize recrawling. It follows the W3C datetime format, such as <lastmod>2025-11-09T14:30:00+00:00</lastmod> for a precise timestamp or a simpler <lastmod>2025-11-09</lastmod> for just the date (YYYY-MM-DD). This value should reflect content changes rather than metadata updates or sitemap generation times, and it is distinct from HTTP headers like If-Modified-Since, which search engines may use independently.

The <changefreq> element provides a hint about the expected update frequency of the page, using one of the predefined enumeration values: always, hourly, daily, weekly, monthly, yearly, or never. For example, <changefreq>weekly</changefreq> suggests moderate change, guiding crawlers on scheduling but serving only as a non-binding suggestion, as search engines may adjust based on other factors. This element is optional and should be used judiciously, to avoid presenting infrequently updated pages as frequently changing ones.

Similarly optional, the <priority> element assigns a relative importance score to the URL within the context of the same website, expressed as a decimal value from 0.0 (lowest) to 1.0 (highest), with a default of 0.5 if omitted. An example is <priority>0.8</priority>, indicating higher priority than the site average but not implying global ranking influence across different sites. Priorities are site-relative only, and setting all entries to 1.0 negates any useful differentiation.

A complete example of a <url> entry incorporating all elements for a hypothetical page might appear as follows:

```xml
<url>
   <loc>https://www.example.com/products/widget</loc>
   <lastmod>2025-11-09T14:30:00+00:00</lastmod>
   <changefreq>weekly</changefreq>
   <priority>0.8</priority>
</url>
```

This snippet would be nested within a <urlset> for the full file. Common errors in implementing these elements include using invalid date formats in <lastmod>, such as non-W3C-compliant strings like "11/09/2025", which may cause search engines to ignore the value; providing relative URLs in <loc>, like "/products/" instead of a full absolute URL; or exceeding the 2,048-character limit for <loc>, leading to truncation or rejection of the entry. Additionally, failing to entity-escape special characters or omitting the required <loc> within a <url> can render the sitemap unparseable.
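The common errors above can be caught before submission. Below is a minimal pre-submission checker in Python, using only the standard library; the function name and message strings are illustrative, not part of the protocol.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# W3C datetime subset: YYYY-MM-DD, optionally with time and timezone offset.
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)

def check_sitemap(xml_text):
    """Return a list of human-readable problems found in a sitemap string."""
    problems = []
    root = ET.fromstring(xml_text)
    for i, url in enumerate(root.findall(f"{NS}url")):
        loc = url.find(f"{NS}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"entry {i}: missing required <loc>")
            continue
        value = loc.text.strip()
        if not value.startswith(("http://", "https://")):
            problems.append(f"entry {i}: <loc> is not an absolute URL: {value}")
        if len(value) > 2048:
            problems.append(f"entry {i}: <loc> exceeds 2,048 characters")
        lastmod = url.find(f"{NS}lastmod")
        if lastmod is not None and not W3C_DATE.match((lastmod.text or "").strip()):
            problems.append(f"entry {i}: <lastmod> not in W3C datetime format")
    return problems
```

Running this against a sitemap before submission surfaces the relative-URL and date-format mistakes described above; schema validation against sitemap.xsd remains the authoritative check.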

Alternative Formats

Plain Text Sitemaps

Plain text sitemaps provide a basic method for listing website URLs in a non-structured format, consisting of a single text file with one absolute URL per line and no accompanying metadata such as last modification dates, change frequencies, or priorities. These files must use the .txt extension and be encoded in UTF-8 to ensure proper parsing by crawlers. This format is particularly suitable for small websites or legacy systems requiring minimal maintenance, as it avoids the complexity of XML tagging while still enabling basic URL discovery. Both Google and Bing officially support plain text sitemaps for crawling and indexing purposes, allowing webmasters to notify search engines of site content without advanced features. To create one, webmasters can use any standard text editor to compile a list of absolute URLs, ensuring the file does not exceed 50,000 URLs or 50 MB in uncompressed size; for larger sites, multiple files can be generated and referenced accordingly. For instance, a simple three-page site might use the following content in its sitemap.txt file:
```
https://www.example.com/
https://www.example.com/about.html
https://www.example.com/contact.html
```
This approach emphasizes straightforward compilation, often via manual entry or basic scripting tools. The primary advantage of plain text sitemaps lies in their simplicity, enabling quick creation and deployment even in resource-constrained environments without the need for XML validation or specialized generators. However, the format lacks the rich metadata available in the XML sitemap protocol, which limits its ability to guide crawlers on update priorities or frequencies, potentially reducing overall crawl efficiency. Plain text sitemaps pre-date the XML sitemap protocol, which was jointly standardized by Google, Yahoo!, and Microsoft in 2006, and were commonly used for early URL submissions to Yahoo's search index.
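The scripted compilation mentioned above can be sketched in a few lines of Python; the function name and file-naming scheme are illustrative assumptions, and the helper splits the list when the 50,000-URL-per-file limit would be exceeded.

```python
def write_text_sitemaps(urls, prefix="sitemap", max_urls=50_000):
    """Write one absolute URL per line, splitting into numbered files
    when the 50,000-URL-per-file limit would be exceeded."""
    filenames = []
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        chunk = urls[start:start + max_urls]
        # Single file keeps the plain name; overflow produces sitemap-1.txt, -2.txt, ...
        name = f"{prefix}.txt" if len(urls) <= max_urls else f"{prefix}-{n}.txt"
        with open(name, "w", encoding="utf-8") as f:  # plain text sitemaps must be UTF-8
            f.write("\n".join(chunk) + "\n")
        filenames.append(name)
    return filenames
```

For larger sites, the returned file names can then be listed in robots.txt or submitted individually to webmaster consoles.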

RSS and Atom Feeds

RSS and Atom feeds, originally designed for content syndication, can be adapted to function as sitemaps by search engines when they include elements pointing to site URLs. This adaptation allows feeds in RSS 2.0 or Atom 0.3/1.0 formats to notify crawlers of available pages, which is particularly useful for sites already generating such feeds for content distribution. Google began supporting RSS and Atom feeds as sitemaps in September 2005, enabling publishers to leverage existing infrastructure for improved discoverability. Key requirements for using these feeds as sitemaps include embedding full, absolute URLs to site pages via the <link> element in items or entries, rather than relying solely on feed item descriptions or relative paths. Additionally, including a modification timestamp, such as <pubDate> in RSS or <updated> in Atom, helps search engines prioritize crawling based on recency. Feeds should be placed in the site's root directory to facilitate easy discovery by crawlers, and they must adhere to the respective feed standards while serving this dual purpose. One primary advantage of RSS and Atom feeds as sitemaps is their ability to provide automatic updates for dynamic content, such as blog posts or articles, ensuring search engines receive notifications of changes without manual intervention. This dual-purpose functionality benefits both end-users subscribing to content updates and crawlers seeking fresh URLs, making it ideal for frequently updated sites like blogs or news portals. However, RSS and Atom feeds have notable limitations when used as sitemaps, as they typically only encompass recent content (often the last 10 to 500 items) rather than an exhaustive list of all site pages. Unlike dedicated XML sitemaps, they lack support for priority levels or change frequency indicators, which can reduce their effectiveness for comprehensive site mapping. For instance, a basic RSS feed adapted for sitemap use might resemble the following snippet, where <link> elements point to full URLs and <pubDate> provides timestamps:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>https://www.example.com/</link>
    <description>Site description</description>
    <pubDate>Wed, 01 Jan 2025 00:00:00 GMT</pubDate>
    <item>
      <title>Article Title</title>
      <link>https://www.example.com/article1</link>
      <pubDate>Wed, 01 Jan 2025 12:00:00 GMT</pubDate>
      <description>Article summary</description>
    </item>
  </channel>
</rss>
```
This structure allows discovery of linked pages but does not extend to older or static content. Compatibility varies across search engines, with full support in Google and Bing, where RSS 2.0 and Atom 0.3/1.0 feeds are processed similarly to XML sitemaps for URL discovery and crawling prioritization. Google explicitly accepts these formats alongside XML and plain text, treating them as valid sitemap submissions. Other engines may offer partial support, but RSS and Atom feeds are not intended as a complete replacement for full XML sitemaps, especially for large or static sites requiring broad coverage.
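The passage above cites Atom's <updated> element but shows only an RSS example; a minimal Atom 1.0 equivalent (URLs hypothetical) might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Site</title>
  <link href="https://www.example.com/"/>
  <updated>2025-01-01T00:00:00Z</updated>
  <id>https://www.example.com/</id>
  <entry>
    <title>Article Title</title>
    <link href="https://www.example.com/article1"/>
    <id>https://www.example.com/article1</id>
    <updated>2025-01-01T12:00:00Z</updated>
  </entry>
</feed>
```

As with RSS, each <link href> must be a full, absolute URL, and <updated> supplies the recency signal crawlers use for prioritization.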

Submission and Indexing

Submitting to Search Engines

Sitemaps can be submitted to search engines through two primary methods: automatic discovery, by placing the file at the website's root directory or referencing it in the robots.txt file, and direct submission via dedicated webmaster tools. Automatic discovery allows search engine crawlers to locate the sitemap without manual intervention; for instance, adding a line like Sitemap: https://example.com/sitemap.xml to the robots.txt file enables major engines to find and process it during routine crawls. Direct submission provides more control and immediate notification, typically through web-based consoles where site owners verify ownership before adding the sitemap URL. For Google, sitemaps are submitted via Google Search Console by navigating to the Sitemaps section, entering the sitemap URL (or sitemap index file), and clicking submit; this method is recommended over deprecated alternatives. Bing accepts submissions through Bing Webmaster Tools under the Sitemaps tool, where users paste the sitemap URL and submit it after site verification. Yandex uses its Webmaster tools, where users select Indexing > Sitemap files to enter and submit the sitemap URL. These consoles support sitemap index files, which consolidate multiple sitemaps into a single reference for easier management of large sites; engines process the index to access individual sitemaps. Sitemaps must be accessible via HTTP or HTTPS, ensuring crawlers can fetch them without authentication or redirection issues. A notable change occurred with the retirement of Google's sitemap ping endpoint in late 2023: notifications via http://www.google.com/ping?sitemap=URL ceased to function, shifting emphasis to console submissions and auto-discovery for efficient crawling signals. Plugins facilitate submission for non-technical users; for example, the Yoast SEO plugin for WordPress automatically generates and enables XML sitemaps, integrating submission options directly within the dashboard for seamless delivery to search engines.
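The robots.txt discovery method described above can be illustrated with a minimal file (URLs hypothetical); the Sitemap directive may appear anywhere in the file, and multiple directives are permitted:

```
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
```

Because crawlers fetch robots.txt routinely, this is often the lowest-maintenance way to advertise a sitemap across all engines at once.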
Online generators like XML-Sitemaps.com allow users to create and download sitemaps, which can then be uploaded to the site's root directory or submitted manually. Verification of submission occurs through console reports, which display processing status, last access dates, discovered URLs, and any errors such as invalid formats or access issues. For dynamic sites with frequent content updates, such as news platforms, resubmitting the sitemap daily ensures timely crawling of new pages, while static sites may require updates only after significant changes. Multi-engine support follows unified guidelines from sitemaps.org, which outline compatible formats and encourage cross-submission to engines like Google, Bing, and Yandex for broader indexing coverage.

Indexing Limitations

Sitemaps serve as suggestions to search engines about URLs available for crawling and potential indexing, but they do not guarantee that any listed pages will be included in search results. Search engines like Google evaluate each URL based on factors such as content quality, duplication, and adherence to guidelines, often prioritizing high-value pages within limited crawl budgets. For instance, Google allocates crawling resources based on site size, update frequency, and server performance, meaning even sitemap-submitted URLs may remain unvisited if resources are constrained. Several key constraints can prevent indexing despite sitemap inclusion. Pages marked with a noindex meta tag or HTTP header directive will not be indexed, as this explicitly signals search engines to exclude them from results, overriding any sitemap recommendation. Similarly, resources blocked by robots.txt directives remain inaccessible for crawling, and sitemaps cannot bypass these restrictions: search engines respect disallow rules and will not fetch or index such content. Low-value or thin content, such as duplicate pages or those lacking substantial user benefit, is also frequently ignored, as engines apply content-quality policies to maintain result standards. In terms of effectiveness, sitemaps primarily accelerate discovery for new or orphaned pages that lack strong internal or external links, potentially reducing the time to indexing compared to reliance on natural crawling alone. However, for sites with robust linking structures, the impact on overall indexing rates is often minimal, as search engines already efficiently traverse well-connected content. Common pitfalls further limit sitemap utility. Including non-canonical URLs or pages with noindex directives can trigger warnings or rejection of the sitemap file, wasting processing resources and potentially harming crawl efficiency.
Over-submission of unchanged sitemaps consumes unnecessary quota in webmaster tools and may dilute focus on truly updated content, indirectly straining crawl budgets. Engine-specific behaviors highlight varying reliance on sitemaps. Bing places greater emphasis on sitemaps for comprehensive coverage of large or deep sites, using them to ensure full coverage amid AI-powered search demands. As of 2025, major engines like Google have intensified the focus on quality over URL quantity, with core updates penalizing low-value content and rewarding signals of authoritative, user-focused pages.

Specifications and Limits

Size and URL Constraints

Sitemaps adhere to strict size and content constraints to ensure efficient processing by crawlers. According to the Sitemaps protocol, each individual sitemap file is limited to a maximum of 50,000 URLs and must not exceed 50 MB (52,428,800 bytes) in uncompressed size. These limits apply to the XML content before any compression, helping to prevent overload on crawler resources. Additionally, each URL specified in the <loc> element must be fewer than 2,048 characters in length, and all URLs within a sitemap must belong to the same host as the sitemap file itself. For sites exceeding these per-file limits, the protocol recommends using a sitemap index file, which employs the <sitemapindex> root element to reference up to 50,000 individual sitemap files, each conforming to the standard constraints. The index file itself is also capped at 50 MB uncompressed. Sitemap indexes must only link to sitemaps on the same site, enabling scalable organization without violating core limits. Major search engines like Google and Bing enforce the 50,000-URL and 50 MB thresholds strictly to maintain crawling efficiency, and Yandex applies the same per-file limits, recommending sitemap index files for larger sites. To stay within these bounds, large-scale sites can compress sitemaps using gzip, which typically reduces file sizes by 60-90% for XML content, aiding efficient transmission, and divide them into logical subsets such as dated archives (e.g., sitemap-2025-11.xml) or categorized collections (e.g., sitemap-products.xml). The protocol also advises against including redirecting URLs or those with excessive parameters in sitemaps, as they may lead to processing errors; canonical, direct links should be used instead.
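The index structure described above can be illustrated with a short sketch; the file names follow the dated and categorized naming patterns mentioned, and example.com is a placeholder:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>https://www.example.com/sitemap-products.xml</loc>
      <lastmod>2025-11-01</lastmod>
   </sitemap>
   <sitemap>
      <loc>https://www.example.com/sitemap-2025-11.xml.gz</loc>
      <lastmod>2025-11-09</lastmod>
   </sitemap>
</sitemapindex>
```

Each referenced file must itself respect the 50,000-URL and 50 MB limits; the optional <lastmod> on a <sitemap> entry tells crawlers which child sitemaps changed.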

Best Practices

To create effective sitemaps, automate their generation using content management system (CMS) plugins like Yoast SEO for WordPress or tools such as Screaming Frog for broader sites, ensuring dynamic updates for large inventories without manual intervention. Include only canonical, indexable URLs, such as primary versions of pages with absolute paths like https://www.example.com/product-page.html, while excluding duplicates, redirects, or non-public content to guide crawlers efficiently. Always update the <lastmod> element with precise, verifiable dates in ISO 8601 format (e.g., 2025-11-09) to signal recent changes and prioritize recrawling. For maintenance, resubmit sitemaps to search engines via Search Console or robots.txt references after significant site updates, such as adding new content or restructuring, to prompt fresh crawling. Regularly monitor for errors in Search Console's Sitemaps report, addressing issues like fetch failures or invalid URLs promptly to maintain crawl efficiency. Avoid including pages marked with noindex directives, as this can confuse crawlers and dilute the sitemap's value. Optimization involves using <priority> and <changefreq> elements judiciously, though Google ignores them in favor of other signals; reserve higher priorities (e.g., 0.8-1.0) for high-value pages like homepages or key landing pages if targeting engines beyond Google. Prioritize inclusion of revenue-driving or user-critical pages to focus crawler budget on impactful content. Integrate with structured data markup on individual pages, such as Product or Article schemas, to enhance rich result eligibility, as sitemaps alone do not embed structured data. In 2025, ensure sitemap compatibility with mobile-first indexing by listing a single preferred URL version (mobile or responsive) per entry, avoiding separate desktop/mobile variants to align with Google's primary rendering focus. Test sitemap URLs using Google Search Console's URL Inspection tool to verify crawlability and indexing status before submission.
Track key metrics like indexing rates and error percentages through Search Console, aiming to keep error rates below 10% by resolving issues such as malformed XML or inaccessible files, which directly correlates with improved indexing coverage. For e-commerce sites, create separate sitemaps for product catalogs to manage large volumes (e.g., one for active products, another for images), respecting size limits while highlighting seasonal or high-traffic items. News sites should refresh sitemaps weekly, or more frequently for breaking content, to include recent articles, ensuring timely indexing without exceeding per-file URL caps.
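Pre-submission checks against the protocol limits can be automated. The following Python sketch (the function name is an illustrative assumption) handles gzip-compressed files and counts <url> entries before a sitemap is submitted:

```python
import gzip
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50 MB uncompressed

def sitemap_stats(raw_bytes):
    """Report URL count and uncompressed size for a sitemap,
    flagging protocol-limit violations before submission."""
    if raw_bytes[:2] == b"\x1f\x8b":          # gzip magic number
        raw_bytes = gzip.decompress(raw_bytes)
    root = ET.fromstring(raw_bytes)
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    count = len(root.findall(f"{ns}url"))
    return {
        "urls": count,
        "bytes": len(raw_bytes),
        "within_limits": count <= MAX_URLS and len(raw_bytes) <= MAX_BYTES,
    }
```

Running this in a build pipeline catches oversized files before a console submission fails, complementing the error reports surfaced in Search Console.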

Specialized Types

Image and Video Sitemaps

Image and video sitemaps extend the standard XML sitemap protocol to provide search engines with detailed information about media content on a website, facilitating better discovery and indexing of images and videos. These extensions use dedicated namespaces and elements that can be embedded directly within the <url> tags of a conventional sitemap or housed in separate files, such as sitemap-images.xml or sitemap-videos.xml. By including media-specific metadata, these sitemaps help prioritize content for rich search features, such as thumbnails and enhanced previews, improving visibility in image and video search results. For images, the extensions are defined in the namespace http://www.google.com/schemas/sitemap-image/1.1. The core structure involves the <image:image> element, which encapsulates details for a single image and can appear multiple times under each <url>. The required <image:loc> element specifies the absolute URL of the image file itself. Historically, additional elements like <image:title> for a short descriptive title, <image:caption> for contextual text, and <image:geo_location> for latitude and longitude coordinates were supported to enrich image understanding; however, these have been deprecated since August 2022 in favor of simpler structures and alternative best practices like descriptive alt text in HTML. Up to 1,000 <image:image> entries are permitted per <url>, allowing sites with image galleries to associate multiple assets with a single page. The following XML snippet illustrates an embedded image extension for a page featuring a gallery:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/gallery-page.html</loc>
    <lastmod>2025-11-09</lastmod>
    <image:image>
      <image:loc>https://example.com/images/photo1.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/photo2.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```
When the deprecated elements were in use, titles and captions were recommended to be concise, ideally under 100 characters, to maintain efficiency in processing. Today, focusing on <image:loc> ensures compatibility while aiding in discovering images that might be loaded dynamically via JavaScript or otherwise hidden from standard crawling. This approach enhances the potential for images to appear as thumbnails in search results, driving more targeted traffic to media-rich pages. Video sitemaps similarly leverage the namespace http://www.google.com/schemas/sitemap-video/1.1 and wrap content in the <video:video> element, which supports up to 1,000 instances per <url>. Essential tags include <video:content_loc>, which points to the direct URL of the video file in supported formats like MP4 or WebM; <video:thumbnail_loc> for a representative image preview; <video:title> for a brief, engaging name; <video:description> for a summary of the content; and <video:duration>, specified as an integer value in seconds representing the video's length. These elements provide context that helps search engines evaluate relevance and quality for video-specific queries. An example of a video extension within a standard sitemap entry for a page hosting a tutorial video is shown below:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/video-tutorial.html</loc>
    <lastmod>2025-11-09</lastmod>
    <video:video>
      <video:content_loc>https://example.com/videos/tutorial.mp4</video:content_loc>
      <video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
      <video:title>Tutorial on Web Development</video:title>
      <video:description>A beginner's guide to building websites with HTML and CSS.</video:description>
      <video:duration>300</video:duration>
    </video:video>
  </url>
</urlset>
```
Titles and descriptions should be kept succinct (titles ideally under 100 characters) to optimize for display in search interfaces without truncation. The benefits of video sitemaps are particularly pronounced for video publishers, as they enable videos to surface in rich results like video carousels, especially following Google's 2006 acquisition of YouTube, which expanded video indexing capabilities across hosted and embedded content. This integration has made explicit video metadata crucial for competing in unified video search ecosystems. Google has provided full support for image sitemaps since April 2010 and video sitemaps since December 2007, allowing webmasters to submit them via tools like Search Console for prioritized crawling. Bing offers partial compatibility, accepting standard XML sitemaps that may include these extensions but without dedicated processing for image- or video-specific tags, relying instead on general discovery. For optimal results, sites should validate sitemaps against official schemas and monitor indexing status through respective tools.

News Sitemaps

News sitemaps are a specialized extension of the standard XML sitemap protocol designed for news publishers to accelerate the discovery and indexing of timely articles by search engines such as Google. They utilize the namespace http://www.google.com/schemas/sitemap-news/0.9 to incorporate news-specific metadata within each <url> entry, enabling faster crawling of fresh content that meets strict timeliness criteria. This format helps ensure that news content appears promptly in search results and news aggregators, prioritizing recency and relevance over general web pages. The core structure of a news sitemap embeds a <news:news> parent element inside each <url> tag, which contains required sub-elements for publication details and article metadata. The <news:publication> element is mandatory and includes <news:name>, specifying the exact publication name as recognized on news.google.com (without parentheses or variations), and <news:language>, using an ISO 639-1 or ISO 639-2 code such as "en" or "zh-cn". Additionally, <news:publication_date> must be provided in W3C datetime format (e.g., "2025-11-09" or "2025-11-09T12:00:00-08:00") to indicate the article's publication time, while <news:title> captures the article's headline in plain text. Optional elements enhance discoverability, such as <news:keywords> for up to five comma-separated terms relevant to the content (e.g., "election, politics, results"), and <news:geo_targeting> using ISO 3166-1 alpha-2 codes like "US" for location-specific targeting. To qualify for inclusion, news sitemaps must adhere to stringent requirements: articles can only be listed if published within the last 48 hours. Approval in the Google Publisher Center is recommended for publishers seeking full inclusion in Google News features, where they can verify ownership and manage their publications. Keywords should be limited to fewer than five terms to maintain focus, avoiding overly broad or unrelated phrases.
Sitemaps are capped at 1,000 <news:news> entries each, with no support for <priority> or <changefreq> tags, as these are irrelevant for ephemeral news content; exceeding the limit requires splitting into multiple files via a sitemap index. Publishers are encouraged to update sitemaps hourly or as new articles publish to reflect the flow of fresh content, removing outdated entries promptly. The primary purpose of news sitemaps is to fast-track indexing in Google News, signaling high-priority content for immediate crawling and reducing latency in surfacing breaking stories. They also support Accelerated Mobile Pages (AMP) through the optional <news:amp> tag, which points to a mobile-optimized AMP version of the article, improving load times on mobile devices. For a breaking news article, a representative XML snippet might appear as follows, incorporating keywords and geo-targeting for a U.S. election story:
xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/2025-election-results</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2025-11-09T08:00:00-05:00</news:publication_date>
      <news:title>2025 Election: Key Results and Analysis</news:title>
      <news:keywords>election, results, politics, vote</news:keywords>
      <news:geo_targeting>US</news:geo_targeting>
      <news:amp>https://example.com/amp/2025-election-results</news:amp>
    </news:news>
  </url>
</urlset>
This example ensures compliance with schema requirements while highlighting timely metadata for efficient indexing.
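The freshness and size constraints above lend themselves to automation. The sketch below, a minimal illustration rather than an official tool, emits a news sitemap while enforcing the 48-hour window and the 1,000-entry cap; the `build_news_sitemap` helper and the sample article data are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SM)
ET.register_namespace("news", NEWS)

def build_news_sitemap(articles, publication="Example News", language="en", now=None):
    """Emit a news sitemap, keeping only articles from the last 48 hours (max 1,000)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=48)
    fresh = [a for a in articles if a["published"] >= cutoff][:1000]
    urlset = ET.Element(f"{{{SM}}}urlset")
    for a in fresh:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = a["loc"]
        news = ET.SubElement(url, f"{{{NEWS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS}}}publication")
        ET.SubElement(pub, f"{{{NEWS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS}}}language").text = language
        # W3C datetime format, including the UTC offset
        ET.SubElement(news, f"{{{NEWS}}}publication_date").text = a["published"].isoformat()
        ET.SubElement(news, f"{{{NEWS}}}title").text = a["title"]
    return ET.tostring(urlset, encoding="unicode")

now = datetime(2025, 11, 9, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": "https://example.com/2025-election-results",
     "title": "2025 Election: Key Results and Analysis",
     "published": now - timedelta(hours=4)},   # fresh: included
    {"loc": "https://example.com/old-story",
     "title": "Last Week's Story",
     "published": now - timedelta(days=7)},    # stale: dropped
]
print(build_news_sitemap(articles, now=now))
```

Running the filter at generation time keeps stale entries out of the file, which matches the guidance to remove outdated articles promptly.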

Advanced Configurations

Multilingual Support

Sitemaps support multilingual websites through the integration of hreflang annotations, which allow webmasters to specify alternate language and regional versions of pages directly within the XML structure. This is achieved by including <xhtml:link> elements as children of each <url> entry, using the rel="alternate" attribute paired with hreflang to indicate the language or locale (e.g., hreflang="en" for English or hreflang="es" for Spanish). These annotations must be bidirectional, meaning each variant page links to all others in the set, including a self-referential link to its own URL. The sitemap namespace must include the XHTML extension: xmlns:xhtml="http://www.w3.org/1999/xhtml". Webmasters can approach multilingual sitemaps in two primary ways: using a single sitemap file that encompasses all language variants or creating separate sitemap files for each language, which are then linked together via a sitemap index file. The single-file method consolidates all <url> entries with their respective <xhtml:link> annotations, making it suitable for smaller sites, while separate files improve organization for larger, language-diverse sites and can reference the index for submission to search engines. Best practices include always adding self-referential hreflang tags (e.g., pointing back to the page's own <loc>), supporting region-specific codes like en-US for American English versus en-GB for British English, and incorporating a default variant with hreflang="x-default" for users whose language or region does not match any specified alternate. Fully qualified absolute URLs should be used in all <loc> and <xhtml:link href> attributes to avoid resolution issues. Key challenges in implementing multilingual sitemaps involve ensuring consistency and avoiding errors that could lead search engines to ignore the annotations. 
For instance, languages must not be mixed within a single <url> entry; each entry should represent one primary language version with links to its alternates. Incorrect language codes (hreflang expects ISO 639-1 codes for languages and ISO 3166-1 Alpha-2 codes for regions) or missing bidirectional links can invalidate the entire cluster. Validation is essential and can be performed using tools like Google's URL Inspection tool in Search Console, which reports whether hreflang signals are recognized during crawling, or third-party validators such as the hreflang Tags Testing Tool from TechnicalSEO.com. An example XML snippet for a sitemap entry supporting English and Spanish variants of a page might look like this:
xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/article/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
    <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/article/" />
  </url>
  <url>
    <loc>https://example.com/es/article/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
    <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/article/" />
  </url>
</urlset>
This structure ensures all variants are discoverable and properly annotated. Search engines such as Google and Yandex utilize these hreflang annotations in sitemaps to deliver results matched to the user's language and region preferences, enhancing relevance for international audiences.
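The reciprocity errors described earlier can be detected mechanically before submission. The following sketch parses a sitemap string and flags missing self-references and non-reciprocal pairs; the `hreflang_errors` helper is an illustrative assumption, not a standard tool.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
XHTML = "{http://www.w3.org/1999/xhtml}"

def hreflang_errors(sitemap_xml):
    """Return a list of problems: missing self-references or non-reciprocal links."""
    root = ET.fromstring(sitemap_xml)
    alternates = {}  # page URL -> set of alternate hrefs it declares
    for url in root.findall(f"{SM}url"):
        loc = url.find(f"{SM}loc").text.strip()
        alternates[loc] = {l.get("href") for l in url.findall(f"{XHTML}link")
                           if l.get("rel") == "alternate"}
    errors = []
    for loc, hrefs in alternates.items():
        if loc not in hrefs:
            errors.append(f"{loc}: missing self-referential hreflang link")
        for href in hrefs:
            # every alternate present in the sitemap must link back
            if href != loc and href in alternates and loc not in alternates[href]:
                errors.append(f"{href}: does not link back to {loc}")
    return errors

GOOD = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/article/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
  </url>
  <url>
    <loc>https://example.com/es/article/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
  </url>
</urlset>"""
print(hreflang_errors(GOOD))  # []
```

A cluster that passes this check still needs the codes themselves verified, since the script only tests link topology, not code validity.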

Sitemap Indexes

Sitemap indexes enable large-scale websites to organize and reference multiple individual sitemap files, addressing the protocol's constraints on file size and URL count. They serve as a central hub for managing extensive URL inventories, such as those exceeding 50,000 URLs, by linking to category-specific or segmented sitemaps like those for products, blog posts, or images. This approach facilitates efficient crawling and indexing for search engines, particularly on enterprise sites with millions of pages. A sitemap index file uses the XML root element <sitemapindex> with the namespace http://www.sitemaps.org/schemas/sitemap/0.9, containing one or more <sitemap> child elements. Each <sitemap> must include a <loc> specifying the URL of an individual sitemap file, and may optionally include a <lastmod> in W3C datetime format to indicate when that sitemap was last modified. All files must be UTF-8 encoded, and the referenced sitemaps must belong to the same site as the index. This format has been supported since the initial protocol version 0.9. Implementation involves naming the index file conventionally as sitemap_index.xml (or similar, such as sitemap-index.xml) and placing it in the website's root directory for automatic discoverability by search engines, which commonly check standard locations like /sitemap.xml or /sitemap_index.xml. Sitemaps referenced in the index should reside in the same directory as the index file or a subdirectory of it to ensure proper URL scoping. For submission, the index file is provided to search engines, which then process the linked sitemaps. Limits for sitemap indexes include a maximum of 50,000 <sitemap> entries per index file and a total uncompressed file size of 50 MB (or equivalent when gzipped). Google limits the number of sitemap index files that can be submitted per site to 500 via Search Console.
Recursive indexing, where an index links to another index, is permitted by the protocol but supported only to limited depths by major search engines; Google, for instance, processes at most one level of nesting (index to index to sitemaps) and recommends against structures deeper than two levels to avoid processing inefficiencies. The following example illustrates a basic sitemap index file linking to three sub-sitemaps for products, blog posts, and images:
xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemaps/products.xml</loc>
    <lastmod>2025-11-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemaps/blog.xml</loc>
    <lastmod>2025-11-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemaps/images.xml</loc>
    <lastmod>2025-11-09</lastmod>
  </sitemap>
</sitemapindex>
This structure simplifies maintenance for large sites by allowing modular updates to individual sitemaps without regenerating a single massive file, improving crawl efficiency and reducing server load during updates.
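The split-and-index workflow can be sketched in a few lines of Python; the `build_index` helper, the sitemap-N.xml naming scheme, and the base URL are illustrative assumptions, not protocol requirements.

```python
import xml.etree.ElementTree as ET
from datetime import date

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SM)

def build_index(urls, base="https://www.example.com/sitemaps/", max_urls=50_000):
    """Split urls into sub-sitemaps of at most max_urls entries.

    Returns (index_xml, [sub_sitemap_xml, ...]).
    """
    index = ET.Element(f"{{{SM}}}sitemapindex")
    subs = []
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        urlset = ET.Element(f"{{{SM}}}urlset")
        for u in urls[start:start + max_urls]:
            url = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(url, f"{{{SM}}}loc").text = u
        subs.append(ET.tostring(urlset, encoding="unicode"))
        entry = ET.SubElement(index, f"{{{SM}}}sitemap")
        ET.SubElement(entry, f"{{{SM}}}loc").text = f"{base}sitemap-{n}.xml"
        ET.SubElement(entry, f"{{{SM}}}lastmod").text = date.today().isoformat()
    return ET.tostring(index, encoding="unicode"), subs

# Five URLs with a cap of two per file yields three sub-sitemaps.
index_xml, subs = build_index(
    [f"https://www.example.com/p/{i}" for i in range(5)], max_urls=2)
print(index_xml)
```

In production the cap would stay at the protocol's 50,000-URL limit; it is lowered here only so the example produces multiple files.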
