When you are not sure about directions, your best bet is to look at a certain place on the map and figure out how to get there using geo navigation. The same applies to search robots – when they are scanning your site, they start with a sitemap. So if you want your content to be explored, crawled and indexed, it’s better to have a sitemap on your website.
To keep it simple – a sitemap is a file with a list of all the web pages accessible to crawlers or users. It may look like a book’s table of contents, except the sections are the links. A sitemap can be either a document in any form used as a planning tool for web design, or a web page that lists the pages of a website, typically organized in a hierarchical order.
There are 2 main types of sitemaps: HTML sitemaps and XML sitemaps. An HTML sitemap is a web page that lists links. Usually, these are links to the most important sections and pages of the website. The HTML sitemap is designed mainly for people rather than robots, and helps them quickly navigate across the main sections of the site.
An XML sitemap is an XML file (e.g. sitemap.xml) that usually sits in the website root folder. A single file can list up to 50,000 URLs and, for each URL, an optional crawl priority and expected change frequency.
Let’s review all the benefits of having a sitemap, find out how to satisfy your needs and comply with search engine rules.
What are the benefits of having an XML sitemap
As you’ve already found out, a sitemap is a file with information about the pages that need to be indexed. So it’s the right place to list the web pages of your site, tell search engines about your content and help them find exactly what they need. Moreover, it can provide valuable page metadata: last page update, update frequency, and page importance level in your content hierarchy.
As Google says, you can benefit from adding a sitemap to your website and never get penalized for having one. That’s already a good enough reason to create one, right? But that’s not all – here are more benefits of having an XML sitemap on your website:
- XML sitemaps help search engines understand what you would like to index on your website and prioritize the crawling process by indicating which pages are more and less important in the crawling order (this is especially important for bulky websites).
- A sitemap can help your website recover if its web pages were hit by the Google Panda update (especially useful for large websites).
- If your site has a deep directory structure, a sitemap will act like a guide for search engines so they don’t miss valuable content.
- If it’s a brand new site, adding a sitemap will be a good way to let search robots (and thus the whole world) know about it and index it accordingly.
- Sitemaps help you control indexing of certain pages in Google Search Console.
- An XML sitemap is your legal helper in confirming your content rights as it mentions the page publication and update time.
How many sitemaps do you need and how to create the right one
This practical part of our sitemapping crash course is extremely important – read carefully!
Before creating a sitemap, you may wonder how many sitemaps you need. Usually, one is enough. But if your file would be larger than 50 MB uncompressed, would contain more than 50,000 URLs, or you want to track groups of web pages in Search Console separately, it’s better to break it into several sitemaps. Also, if you have subdomains, you will generally need a separate sitemap for each one, because by default a sitemap can only list URLs from the host it is located on.
To submit all of your sitemaps to Google at once, you can create an index file with all the sitemaps.
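As a rough illustration of the split described above, the sketch below groups URLs by host (so each subdomain gets its own set) and then cuts each group into chunks of at most 50,000 URLs. The function name `plan_sitemaps` and the example.com URLs are made up for this example, and it only checks the URL count, not the file size.

```python
from collections import defaultdict
from urllib.parse import urlsplit

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit mentioned above


def plan_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Group URLs by host, then split each group into chunks of `limit` URLs."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).netloc].append(url)

    # One list of chunks per host; every chunk would become one sitemap file.
    return {
        host: [host_urls[i:i + limit] for i in range(0, len(host_urls), limit)]
        for host, host_urls in by_host.items()
    }


# Hypothetical URLs: 120,000 pages on the main host, 10 on a subdomain.
urls = [f"https://example.com/page-{n}" for n in range(120_000)]
urls += [f"https://blog.example.com/post-{n}" for n in range(10)]

plan = plan_sitemaps(urls)
for host, chunks in plan.items():
    print(host, [len(chunk) for chunk in chunks])
```

With these made-up numbers, the main host ends up with three sitemap files and the subdomain with one.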
What is a sitemap index file?
A sitemap index file is an XML file that lists your sitemap files and helps in handling them. It serves as a directory that points search engines to every sitemap of your website in one place. Note: a sitemap index file can’t list other sitemap index files – it lists only sitemap files.
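To make this more concrete, here is a minimal sketch that builds a sitemap index with Python’s standard `xml.etree.ElementTree` module. The helper name `build_sitemap_index`, the file names and the date are assumptions made for illustration; the namespace is the one defined by the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML (as a string) listing the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
        if lastmod:  # optional: when the listed sitemap file last changed
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="unicode")


# Hypothetical sitemap files on a hypothetical site.
index_xml = build_sitemap_index(
    ["https://example.com/sitemap-pages.xml", "https://example.com/sitemap-blog.xml"],
    lastmod="2024-01-15",
)
print(index_xml)
```

Each `<sitemap>` entry points to one regular sitemap file, so you only need to submit the index itself.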
There are 3 ways to create a sitemap for a website:
- Manually (requires some time and skills).
- Using special WordPress plugins for creating sitemaps like Google XML Sitemap.
- Using a free sitemap generator (usually all the free options have a limited number of pages); a paid sitemap generator; or something that I call a freemium sitemap generator with an unlimited number of pages that’s available as part of the SE Ranking Website Audit tool (free with the 2-week trial and then available via a subscription plan).
Sitemap XML tags and their settings
The sitemap protocol format consists of XML tags. If you’ve decided to create a sitemap manually or fill in the settings yourself, you should know how to set these tags correctly.
Here are some of the most common XML tags:
- <sitemapindex> – a parent tag at the beginning and end of the file;
- <sitemap> – a parent tag for each sitemap file. At the same time, this tag is a child tag relative to the sitemap index tag;
- <url> – a block that contains the URL and other elements;
- <loc> – the page URL itself;
- <changefreq> – how often this page can change;
- <priority> – the priority of structural elements (this helps to determine which pages have higher crawling priority);
- <lastmod> – the last time the page content was updated.
Tip: Make sure that you write every URL in the same consistent format (for example, always with the same protocol and the same host name). Also, sitemap files should be UTF-8 encoded.
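If you go the manual route, the tags above can be put together roughly like this. This is only a sketch using Python’s standard `xml.etree.ElementTree` (the `xml_declaration` argument needs Python 3.8 or later); the `build_sitemap` helper, the page data and the dates are invented for the example. Note that in a regular sitemap file the `<url>` blocks sit inside a `<urlset>` root tag.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a UTF-8 encoded sitemap (bytes) for a list of page dicts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional tags are written only when a value is supplied.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages; replace with your own URLs and values.
pages = [
    {"loc": "https://example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "https://example.com/blog/", "changefreq": "weekly", "priority": 0.8},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml.decode("utf-8"))
```

Writing `sitemap_xml` to a file named sitemap.xml in binary mode keeps the UTF-8 encoding intact.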
How to set sitemap priority and frequency
To prioritize page crawling, you should set the priority and frequency for search engine crawlers. This way, they will understand which content you consider more important than the rest. You know those XML tags already – the <priority> and <changefreq> tags.
To set the priority, use the <priority> tag, which specifies the priority of one URL in relation to the other URLs on your site (valid values range from 0.0 to 1.0).
To set the frequency, use the <changefreq> tag, which indicates how frequently the page is likely to be updated: always, hourly, daily, weekly, monthly, yearly, never.
For the main page, it’s recommended to use the “1.0” priority (that is 100%) and the “daily” frequency.
When it comes to the main sections or promoted pages, use the “0.8-0.6” priority and the “weekly” frequency. For lower priority pages (for example, forum pages), use the “0.6-0.4” priority and the “weekly” or “monthly” frequency.
Tip: Don’t set equal priority for all pages, since identical values give crawlers nothing to tell the pages apart. Keep in mind that the priority is relative: it shows the page importance in relation to the other pages of your site.
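One possible way to keep these values consistent is to assign them by page type. The sketch below is an assumption about how you might organize it: the page types and the `crawl_hints` helper are hypothetical, while the allowed `changefreq` values and the 0.0 to 1.0 priority range are the ones listed above.

```python
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Hypothetical page types mapped to (priority, changefreq) from the ranges above.
HINTS_BY_PAGE_TYPE = {
    "home": (1.0, "daily"),
    "section": (0.8, "weekly"),
    "forum": (0.5, "monthly"),
}


def crawl_hints(page_type):
    """Return <priority> and <changefreq> values for a page type."""
    priority, changefreq = HINTS_BY_PAGE_TYPE[page_type]
    if not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority out of range: {priority}")
    if changefreq not in VALID_CHANGEFREQ:
        raise ValueError(f"unknown changefreq: {changefreq}")
    # The values are formatted as text, ready to be placed inside the tags.
    return {"priority": f"{priority:.1f}", "changefreq": changefreq}


for page_type in HINTS_BY_PAGE_TYPE:
    print(page_type, crawl_hints(page_type))
```

Keeping the mapping in one place makes it easy to see at a glance that not every page type shares the same priority.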
How to validate a sitemap and place it on your website
This is the final part of our short sitemapping course. After you’ve generated your sitemap, check that it’s valid. To do this, first upload the sitemap file to your website root folder using an FTP client, for example, Total Commander or FileZilla, and then submit it to Google Search Console (GSC). Make sure that your sitemap does not contain pages that redirect or non-canonical pages (pages whose canonical tag points to a different URL). In addition, no URL should be listed more than once.
If the sitemap report contains errors, please correct them first and then submit an adjusted sitemap to GSC.
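Some of these problems can be caught before you upload anything. The following sketch is a simple local pre-check, not a replacement for the GSC report: it parses a sitemap with Python’s standard library and flags duplicated URLs, URLs without a protocol, and files over the 50,000 URL limit. The `check_sitemap` helper and the sample data are hypothetical, and it does not detect redirects or canonical issues, since that would require fetching the pages.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_sitemap(xml_text, max_urls=50_000):
    """Return a list of human-readable problems found in a sitemap."""
    root = ET.fromstring(xml_text)
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]

    problems = []
    if len(locs) > max_urls:
        problems.append(f"too many URLs: {len(locs)} (limit {max_urls})")
    for url, count in Counter(locs).items():
        if count > 1:
            problems.append(f"duplicate URL listed {count} times: {url}")
        if not url.startswith(("http://", "https://")):
            problems.append(f"URL without protocol: {url}")
    return problems


# Hypothetical sitemap with one URL listed twice.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""

problems = check_sitemap(SAMPLE)
print(problems)
```

An empty list means none of these three checks found anything; the sitemap report in GSC remains the final word.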
Then, add a reference to your sitemap to your robots.txt file (a Sitemap: line followed by the full sitemap URL), so that crawlers can find the sitemap of your website.
Tip: If you disallow some of the pages from being indexed via the robots.txt or use the “noindex” meta tag, don’t include these pages in the sitemap file.
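The two points above can be checked together with Python’s standard `urllib.robotparser` module. In this sketch the robots.txt content and the candidate URLs are invented for the example; it keeps only the URLs that robots.txt allows and reads back the sitemap reference (`site_maps()` requires Python 3.8 or later). It does not look at “noindex” meta tags, which live in the pages themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, including the sitemap reference.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical pages you are considering for the sitemap.
candidates = [
    "https://example.com/",
    "https://example.com/admin/login",
    "https://example.com/blog/",
]

# Keep only the URLs that crawlers are allowed to fetch.
allowed = [url for url in candidates if parser.can_fetch("*", url)]
print(allowed)

# The sitemap URLs declared in robots.txt.
print(parser.site_maps())
```

Here the /admin/ page is dropped from the list, so it would not end up in the sitemap file.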
Congrats! You’ve finished our crash course on SEO sitemapping. On a final note, here is my strongest piece of advice: if you want the journey through your site to be valuable both to search crawlers and users, provide them with a correct sitemap. After this mini-course, you should know how to do it. There is no magic – just follow the recommendations above. Of course, it doesn’t guarantee indexing by the search engines, but it does greatly increase your chances!
Have I missed something important about sitemaps? Please let me know in the comments, and I’ll add it in!