You’ve created a sitemap and submitted it to Google, but its status is unexpectedly not Success? Or maybe your Sitemap report looks good, but when you checked the Index Coverage report, it turned out that Google ignored your polite request and didn’t index many of the pages from your sitemap. Now you wonder whether there’s something you can do to improve your indexing stats. Either way, look no further, because you’ll find the answers in this post.
If you don’t have a sitemap yet and want to learn what makes a good one, take a look at our introductory sitemapping crash course. There, you’ll learn about the benefits of having a sitemap and sitemap best practices. You should also consult that guide if you don’t yet know what the <loc> and <lastmod> tags are used for, or what a video sitemap or a sitemap index file is.
The first part of this post lists all the errors you may encounter in your GSC Sitemap report. So, if you’re looking for a way to fix specific issues, use the table of contents to navigate to the errors you are interested in.
The second part features insights that will help you get the most out of your sitemap. You’ll learn how to find trash pages on your sitemap, where to look for pages you may have failed to include in your sitemap file, and how to encourage Google to index more of your sitemap pages. I highly recommend that everyone study the second chapter of this post carefully.
Fixing Sitemap report errors
Once you submit your sitemap to Google, you’ll see whether it has managed to process the file in the Status column. If your file follows all the rules, the status should be Success. In this chapter, we’ll discuss two other status codes, namely Couldn’t fetch and Has errors.
Google has issues crawling your sitemap file
Let’s start with the worst-case scenario, when Google couldn’t fetch your sitemap file. In this case, you’ll have to use the URL Inspection tool to find out what could be causing the problem.
In the URL Inspection tool, click the Live test button and check the Page fetch status. If it says Successful, the problem is most likely a bug on Google’s side.
If your sitemap indeed can’t be fetched, make sure nothing is blocking Google from accessing it, be it robots.txt directives or CMS plugins (yes, sometimes they are to blame!). Also, make sure you’ve entered the proper sitemap URL: pay attention to the protocol and www.
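To quickly rule out robots.txt as the culprit, you could run a check along these lines. This is only a minimal sketch: the function name is_blocked and the sample rules are made up for illustration, and Python’s built-in parser does not reproduce every detail of how Googlebot matches rules (wildcards, for instance), so treat the result as a hint rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for this example.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_blocked(sample_rules, "https://example.com/sitemap.xml"))          # False
print(is_blocked(sample_rules, "https://example.com/private/sitemap.xml"))  # True
```

If the check returns True for your sitemap URL, the Disallow rule that matches it is the first thing to look at.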
A Couldn’t fetch error may also occur if you submitted a sitemap index file rather than a single sitemap to GSC. The problem should be resolved in the same way as with a single sitemap.
Sitemap index file errors
Now, let’s move on to cases where Google fetched your submitted file but detected some errors.
With a sitemap index file, Google needs to further process all separate sitemaps you listed to be able to finally get to your website URLs. If Google fails to process URLs listed on the sitemap index file, you’ll get an Invalid URL in sitemap index file error. It normally means that Google can’t find one or several of your sitemaps at designated locations because you used incomplete URLs. All the URLs pointing to individual sitemaps in your sitemap index file should be fully-qualified—otherwise, Google may fail to find them.
Besides, your sitemap index file shouldn’t list other sitemap index files, only sitemaps. If it does list another index file, you’ll get an Incorrect sitemap index format: Nested sitemap indexes error.
The last error is Too many sitemaps in sitemap index file. It may occur with huge websites that listed more than 50,000 sitemaps in a single file.
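The index file rules above can be checked before you submit anything. Here is a rough sketch, assuming the index file is already loaded as a string. The function name check_sitemap_index is my own, and it does not detect nested indexes, since that would require fetching every listed file and looking at its root tag.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS = 50_000  # limit on sitemaps listed in a single index file


def check_sitemap_index(xml_text: str) -> list[str]:
    """Return a list of human-readable problems found in a sitemap index file."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SITEMAP_NS}}}sitemapindex":
        return ["root element is not <sitemapindex> in the sitemap namespace"]

    problems = []
    locs = [(el.text or "").strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if len(locs) > MAX_SITEMAPS:
        problems.append(f"too many sitemaps: {len(locs)}")
    for loc in locs:
        parts = urlparse(loc)
        # A fully-qualified URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not a fully-qualified URL: {loc}")
    return problems


sample_index = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>/sitemap-1.xml</loc></sitemap>"
    "<sitemap><loc>https://example.com/sitemap-2.xml</loc></sitemap>"
    "</sitemapindex>"
)
print(check_sitemap_index(sample_index))
```

In this made-up sample, the first entry is reported because it lacks the protocol and domain.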
Sitemap size and compression errors
Size restrictions apply both to sitemap index files and individual sitemaps. A sitemap file shouldn’t exceed 50MB when uncompressed and shouldn’t list more than 50,000 URLs. If you exceed these limits, you’ll get a Sitemap file size error. You can learn more about splitting sitemaps into several files from our ultimate sitemap guide.
Now, while your sitemap shouldn’t be huge, it naturally shouldn’t be empty either. If you happen to submit a sitemap with no URLs, you’ll get an Empty sitemap error.
Also, I mentioned that a sitemap should be no larger than 50MB when uncompressed, but it is common practice to compress sitemaps to save bandwidth. A commonly used tool for this purpose is gzip, which adds a .gz extension to the sitemap file. If you get a Compression error in the GSC report, it means something went wrong during compression, and you should compress the file again.
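The size limits and the compression step can be sanity-checked with a few lines of code. The sketch below assumes the uncompressed sitemap is available as bytes. The helper names are hypothetical, and counting <url> tags is a rough stand-in for proper XML parsing.

```python
import gzip

MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file
MAX_URLS = 50_000


def check_limits(xml_bytes: bytes) -> list[str]:
    """Flag sitemaps that are too large, list too many URLs, or are empty."""
    problems = []
    url_count = xml_bytes.count(b"<url>")  # rough count of <url> entries
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50MB uncompressed")
    if url_count > MAX_URLS:
        problems.append(f"too many URLs: {url_count}")
    if url_count == 0:
        problems.append("sitemap is empty")
    return problems


def compress_and_verify(xml_bytes: bytes) -> bytes:
    """Gzip the sitemap and confirm it decompresses back to the same bytes."""
    compressed = gzip.compress(xml_bytes)
    if gzip.decompress(compressed) != xml_bytes:
        raise ValueError("gzip round trip failed, compress the file again")
    return compressed


sample_sitemap = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url></urlset>"
)
print(check_limits(sample_sitemap))  # []
gz_bytes = compress_and_verify(sample_sitemap)  # write these bytes to sitemap.xml.gz
```

Note that the limits are checked against the uncompressed bytes, since that is the size the 50MB restriction refers to.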
Google has issues crawling your sitemap URLs
For a number of reasons, Google may not be able to crawl some of the URLs you listed on your sitemap. Let’s take a look at all such errors.
The Sitemap contains URLs which are blocked by robots.txt error is a pretty clear one, since GSC will point you to the URLs that are blocked. Depending on whether you want to have these URLs indexed, you’ll have to either lift the block or remove them from your sitemap.
Other errors such as URLs not accessible, URLs not followed, URL not allowed are not that obvious. Let’s briefly go through each of them.
The URLs not accessible error means that Google found your sitemap at the designated location but couldn’t fetch some of the URLs on your list. In this case, you once again need to use the URL Inspection tool, just like when Google can’t fetch your sitemap at all.
The URLs not followed error occurs either because you used relative URLs in your sitemap instead of fully-qualified URLs or because of redirect issues. Redirect chains and loops, temporary redirects used instead of permanent ones, and HTML and JS redirects can all lead to this error.
Google Search Console does not specify what exactly is causing the problem, so you’ll have to use other tools to understand which issues need to be fixed. For example, SE Ranking’s Website Audit tool has a dedicated Redirects section, where you can check if your website has any redirect problems.
If the tool finds some issues, you’ll be able to get all the necessary information on every error by clicking on the number of pages—you’ll know which page features an error and how this page is linked to other pages of the website.
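To make the redirect problems behind URLs not followed more concrete, here is a small illustrative sketch. It does not crawl anything: it assumes you already have a mapping of URL to (status code, redirect target), for example exported from a crawler, and the function name trace_redirects and the sample data are invented for this example.

```python
def trace_redirects(url, redirects, max_hops=10):
    """Follow `url` through a {url: (status, target)} map.

    Returns (final_url, hops, issues), flagging temporary redirects,
    loops, and chains longer than a single hop.
    """
    seen = [url]
    issues = []
    while url in redirects and len(seen) <= max_hops:
        status, target = redirects[url]
        if status in (302, 307):
            issues.append(f"temporary redirect ({status}) at {url}")
        if target in seen:
            issues.append(f"redirect loop back to {target}")
            break
        seen.append(target)
        url = target
    hops = len(seen) - 1
    if hops > 1:
        issues.append(f"redirect chain of {hops} hops")
    return url, hops, issues


# Hypothetical crawl data.
sample_redirects = {
    "http://example.com/old": (301, "https://example.com/old"),
    "https://example.com/old": (302, "https://example.com/new"),
}
print(trace_redirects("http://example.com/old", sample_redirects))
```

Any URL that comes back with issues is a candidate for replacement in the sitemap with its final destination.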
The URL not allowed error indicates that your sitemap features URLs at a higher level than the sitemap file or on a different domain. For example, if your sitemap is located at yoursite.com/category1/sitemap.xml and you’ve added a page to it that is located at yoursite.com/page1, Google won’t accept that page because it sits above the /category1/ folder.
Speaking of different domains, beware that Google treats HTTP and HTTPS as well as www and non-www versions of your site as different entities. So, if you recently switched to HTTPS, make sure to generate a new sitemap with HTTPS URLs.
SE Ranking’s Website Audit tool will warn you about such instances as well.
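The location rules described above boil down to a simple comparison, sketched here under the assumption that a page is allowed only if it shares the sitemap’s protocol and host and sits in the sitemap’s folder or below it. The function name is_allowed is my own, and the sketch ignores special cases such as sitemaps submitted for several verified properties.

```python
from urllib.parse import urlparse


def is_allowed(sitemap_url: str, page_url: str) -> bool:
    """Check that `page_url` is on the same protocol and host as the sitemap
    and is located in the sitemap's folder or deeper."""
    sitemap, page = urlparse(sitemap_url), urlparse(page_url)
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False  # HTTP vs HTTPS, or www vs non-www, counts as a different site
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"  # e.g. "/category1/"
    return (page.path or "/").startswith(base_dir)


print(is_allowed("https://yoursite.com/category1/sitemap.xml",
                 "https://yoursite.com/page1"))            # False
print(is_allowed("https://yoursite.com/category1/sitemap.xml",
                 "https://yoursite.com/category1/page1"))  # True
print(is_allowed("https://example.com/sitemap.xml",
                 "http://example.com/page1"))              # False
```

Keeping the sitemap in the root folder avoids the folder-level restriction altogether.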
Finally, there’s one more thing that won’t let Google crawl a page: a non-200 status code. It is called an HTTP error in the GSC report, and the exact code is specified for each separate instance. You can also find all the necessary information in the HTTP section of SE Ranking’s Website Audit.
Google suspects you’ve listed the wrong URLs
Speaking of www, if you happen to add non-www URLs to your www-sitemap, you’ll get a Path mismatch error. The same goes for non-www-sitemap featuring www-URLs. Even if your website is available both with and without www, you shouldn’t mix things up in your sitemap. If your sitemap is located at https://example.com/sitemap.xml, none of the URLs it features should include www. If your sitemap is located at https://www.example.com/sitemap.xml, all of the URLs it lists should include www.
Syntax-based sitemap errors
Now, in most cases you don’t need to worry about syntax-based sitemap errors, because if you generate a sitemap with one of the dedicated tools, it shouldn’t mix things up with tags and attributes. However, if you have a custom sitemap that was created manually, you may encounter one of the following issues:
- Invalid tag value. A tag value is what you put between the opening and closing tags: the URL between the <loc> tags, or the date you indicate with the <lastmod> tag. The error occurs when you put an unacceptable value in your sitemap, e.g. when you set a priority outside the 0.0 to 1.0 range. Not that you need to set a priority value at all!
- Invalid attribute value. The attribute’s value is what you indicate after an equals sign (=) within quotation marks. The following string of code lists different language versions of a page in a sitemap:
<url><loc>https://example.com</loc><xhtml:link rel="alternate" hreflang="gb" href="https://example.com"/><xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr"/></url>
Here, “alternate”, “gb” and “fr” are attribute values and “gb” is the wrong one since you cannot indicate just the country code in hreflangs—it should be paired with a language code as in “en-gb.”
- Invalid URL. As you may have guessed, this error means you need to look for typos in your listed URLs. Let me remind you here that all the URLs in your sitemap should be fully-qualified.
- Invalid date. This one is pretty straightforward: it means you used the wrong date format for the <lastmod> tag. The accepted format is W3C Datetime, for example YYYY-MM-DD, optionally followed by a time and timezone as in YYYY-MM-DDThh:mm:ss+00:00.
- Missing XML attribute and Missing XML tag errors are rather clear as well. Leaving out mandatory tags and attributes (urlset, url, loc, xmlns) is not an option—you need to list them to let your sitemap function properly.
- Invalid XML: too many tags. This error would occur if you use one of the tags multiple times, e.g. you list two different URL locations or two different modification dates for a single URL. Thus, you’ll have to remove the duplicate tag.
<url> <loc>http://www.example.com/</loc> <lastmod>2021-01-01</lastmod> <lastmod>2021-02-01</lastmod> <changefreq>monthly</changefreq> <priority>0.8</priority> </url>
- Incorrect namespace. The namespace listed within your <urlset> tag should match one of the accepted protocols. Currently, the following namespaces are used:
For regular sitemaps: xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
For News sitemaps: xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
For Video sitemaps: xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
For Image sitemaps: xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
- If you used the wrong protocol for your sitemap, you’ll get the Unsupported format error. The error can also occur because of all sorts of other syntax errors such as using wrong quotation marks (only straight single or double quotes are accepted) or missing the encoding tag.
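Several of the syntax checks from this list are easy to automate for a hand-made sitemap. The following is a minimal sketch rather than a full validator: the function name lint_urlset is hypothetical, the date pattern only accepts the two most common W3C Datetime forms (a plain date, or a date with time and timezone), and only the root namespace, duplicate tags, <lastmod> and <priority> are checked.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Simplified W3C Datetime: YYYY-MM-DD, optionally with time and timezone.
W3C_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def lint_urlset(xml_text: str) -> list[str]:
    """Return syntax problems found in a regular (non-index) sitemap."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS}}}urlset":
        return [f"unexpected root tag or namespace: {root.tag}"]

    problems = []
    for url in root.findall(f"{{{NS}}}url"):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if not loc:
            problems.append("missing <loc>")
        # Count only tags from the sitemap namespace, so repeated
        # extension tags such as xhtml:link are not flagged.
        counts = Counter(
            child.tag.split("}")[-1]
            for child in url
            if child.tag.startswith(f"{{{NS}}}")
        )
        for tag, n in counts.items():
            if n > 1:
                problems.append(f"{loc}: <{tag}> appears {n} times")
        for lastmod in url.findall(f"{{{NS}}}lastmod"):
            value = (lastmod.text or "").strip()
            if not W3C_DATETIME.match(value):
                problems.append(f"{loc}: invalid date {value!r}")
        for priority in url.findall(f"{{{NS}}}priority"):
            try:
                in_range = 0.0 <= float(priority.text) <= 1.0
            except (TypeError, ValueError):
                in_range = False
            if not in_range:
                problems.append(f"{loc}: invalid priority {priority.text!r}")
    return problems


sample_urlset = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url>'
    "<loc>http://www.example.com/</loc>"
    "<lastmod>2021-01-01</lastmod><lastmod>2021-02-01</lastmod>"
    "<changefreq>monthly</changefreq><priority>0.8</priority>"
    "</url></urlset>"
)
print(lint_urlset(sample_urlset))
```

Run against the duplicate <lastmod> example from the list, the sketch reports that the tag appears twice for http://www.example.com/.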
There are also several video-sitemap-specific errors: Thumbnail too large/small, Video location and play page location are the same, Video location URL appears to be a play page URL. You can find more details on these errors here.
To avoid syntax errors, use one of the sitemap validators like this one before submitting a sitemap—the tools will highlight issues that need to be fixed.
Once you fix all the sitemap errors mentioned in your GSC report, resubmit your updated sitemap. It will encourage Google to recrawl your website and finally index pages that it couldn’t crawl because of the errors.
Evening the submitted URLs vs indexed URLs ratio
Your sitemap or sitemap index file status may be Success, but that doesn’t mean you’re done with your sitemap. Click the index coverage icon next to the number of discovered URLs to go to the respective report. Once you start investigating it, you may notice that not all the pages you submitted were indexed.
Now, it’s ok to have pages excluded from indexing: Google will not index every page of your website it is aware of. Moreover, almost every website has pages webmasters don’t want to have indexed, such as admin areas, utility pages, and duplicate and alternative pages. What is not ok is having Errors and Valid with warnings issues in your Index Coverage report. It’s also not ok when the number of pages excluded from indexing is many times higher than the number of valid pages.
So why might Google not index pages you submitted for indexing? In most cases, that happens when you add pages that shouldn’t be on your sitemap. Maybe Google can’t index a page because of a noindex directive. Or it may be confused about whether you actually want a page indexed, like when you add non-canonical pages to your sitemap. All such instances can be found in different tabs of the GSC Index Coverage report, but it’s more convenient to check them using SE Ranking’s Website Audit tool: if your website has any issues of the sort, you’ll find them in the Crawling section of the Issue report.
Remove noindex and non-canonical pages from your sitemap or, if the pages were marked as noindex or non-canonical by mistake, fix the incorrect tags.
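If you prefer to spot-check a few pages yourself, the two conflicting signals can be pulled out of a page’s HTML with the standard library. This is a simplified sketch: the names IndexSignals and sitemap_conflicts are made up, the HTML is assumed to be already downloaded, and a noindex sent via the X-Robots-Tag HTTP header is not covered.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collect the meta robots noindex flag and the canonical URL of a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() in ("robots", "googlebot"):
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical = attributes.get("href")


def sitemap_conflicts(page_url: str, html: str) -> list[str]:
    """Return reasons why `page_url` probably should not be listed in a sitemap."""
    signals = IndexSignals()
    signals.feed(html)
    issues = []
    if signals.noindex:
        issues.append("noindex")
    if signals.canonical and signals.canonical.rstrip("/") != page_url.rstrip("/"):
        issues.append(f"canonical points to {signals.canonical}")
    return issues


sample_html = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/shoes"></head></html>'
)
print(sitemap_conflicts("https://example.com/shoes?color=red", sample_html))
```

A page that returns an empty list sends no conflicting signals, at least as far as these two tags are concerned.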
Once you’re sure your sitemap is not sending confusing signals to Google, go through the Index Coverage report to find cases where you and Google disagree on the value of a page.
- In the Valid with warnings tab, pay attention to pages that were indexed in spite of the noindex directive. Chances are Google was right, and you need to remove the noindex directive from these pages, whether it is set in a meta robots tag or in the X-Robots-Tag HTTP header.
- Finally, go to the Excluded tab. Most pages here should be excluded from indexing according to your own directives, e.g. old 404 pages, pages blocked by robots.txt, noindex and canonicalized pages. Pay attention to canonical pages Google decided not to index because the search engine thinks there are better alternatives on your website. Study every single case carefully and decide if a page really is more valuable than its duplicates, then fix your canonical tags if Google was right. If you still think the page should be indexed, you’ll have to work on its content, backlink profile and internal linking to convince Google that it is more worthy than others.
The Excluded tab features two more interesting page categories: Crawled – currently not indexed and Discovered – currently not indexed. Both types normally label low-quality pages with thin content that Google doesn’t want to show to users. In the first case, the page was at least crawled and then considered low-quality and in the second case, the search engine didn’t even bother to spend the crawl budget on the page. Take a closer look at all such pages and see what you can do to increase their value—work on content, user experience, internal linking, etc.
Thanks to the variety of sitemap generating tools, creating a sitemap is easy. However, if you just use one of the random tools and ignore site-mapping best practices, you may end up with a Sitemap report full of errors or submit loads of low-quality pages to Google via your sitemap.
I hope that this guide helped you to fix every single error on your GSC sitemap report, and you will also manage to only keep juicy high-quality pages on your sitemap and drop all the pages that make a bad impression on search engines. If you have any questions left, don’t hesitate to leave them in the comments section below.