Kelly Breland
Jan 12, 2021 | 17 min read

The robots meta tag and the x-robots tag are used to instruct crawlers how to index pages of a website. The former is indicated in the HTML code of a web page, while the latter is included in the HTTP header of a URL.

The process of indexing goes through several steps: the content is loaded, analyzed by search engine robots, and added to the database. Information that has made it to the index is what is being shown in the SERPs. 

In our post about the robots.txt file, we discussed how to allow bots to crawl a website and how to prevent them from crawling certain content. In this article, we’ll explore how to gain control over how web pages are indexed, what content to close from indexing and how to do it correctly. 

SEO benefits of using robots and X-Robots-Tag

Let’s examine how the robots meta tag and the X-Robots-Tag help in search engine optimization and when you should use them.

1. Choosing what pages to index

Not all website pages can attract organic visitors. If indexed, some of them might actually harm the site’s search visibility. These are the types of pages that are usually blocked from indexing with the help of noindex:

  • duplicated pages
  • sorting options and filters
  • search and pagination pages
  • technical pages
  • service notifications (about a sign up process, completed order, etc.)
  • landing pages designed for testing ideas
  • pages that are still under development
  • information that isn’t up-to-date yet (future deals, announcements, etc.)
  • outdated pages that don’t bring any traffic
  • pages you need to block from certain search crawlers

2. Managing how certain file types are indexed

You can prevent robots from indexing not only HTML pages but also other types of content, such as image URLs or .pdf files.

3. Keeping the link juice

By blocking links with the help of nofollow, you can preserve the page’s link juice, because it won’t be passed to other sources through external or internal links.

4. Optimizing the crawl budget

The bigger a site is, the more important it is to direct crawlers to the most valuable pages. If search engines crawl a website inside and out, the crawl budget will simply run out before bots reach the content helpful for users and SEO. As a result, important pages won’t get indexed, or will make it to the index later than desired.

The directives of robots and X-Robots-Tag

The robots meta tag and the X-Robots-Tag differ in their basic syntax and use. The robots meta tag is inserted into the HTML code of a web page and has two important attributes: name (indicating the search robot’s name) and content (commands for the search robot). The X-Robots-Tag is added to the server configuration file and doesn’t use attributes.

Telling Google not to index your content with the help of robots looks like this:

<meta name="googlebot" content="noindex" />

If you choose to prevent Google from indexing your content using the x-robots tag, it will look like this:

X-Robots-Tag: googlebot: noindex, nofollow

The robots meta tag and the X-Robots-Tag support the same set of directives for giving search bots different instructions. Let’s review them in detail.

Robots and X-Robots-Tag directives: functions and search engine support

| Directive | Function | GOOGLE | YANDEX | BING | YAHOO! |
|---|---|:---:|:---:|:---:|:---:|
| index/noindex | Tells bots to index / not index a page. Used for pages that are not supposed to be shown in the SERPs. | + | + | + | + |
| follow/nofollow | Tells bots to follow / not follow the links on a page. | + | + | + | + |
| archive/noarchive | Tells bots to show / not show a cached version of a web page in search. | + | + | + | + |
| all/none | all is the equivalent of index, follow, used for indexing text and links. none is the equivalent of noindex, nofollow, used for blocking the indexing of text and links. | + | + | + | – |
| nosnippet | Tells bots not to show a snippet or video in the SERPs. | + | – | + | – |
| max-snippet | Limits the maximum snippet size. Indicated as max-snippet:[number], where number is the number of characters in a snippet. | + | – | + | – |
| max-image-preview | Limits the maximum size of images shown in search. Indicated as max-image-preview:[setting], where setting can be none, standard, or large. | + | – | + | – |
| max-video-preview | Limits the maximum length of videos shown in search (in seconds). Also allows setting a static image (0) or lifting any restrictions (-1). Indicated as max-video-preview:[value]. | + | – | + | – |
| notranslate | Prevents search engines from translating a page in the search results. | + | – | – | – |
| noimageindex | Prevents images on a page from being indexed. | + | – | – | – |
| unavailable_after | Tells bots not to show a page in search after a specified date. Indicated as unavailable_after: [date/time]. | + | – | – | – |

All of the directives above can be used with both the robots meta tag and the X-Robots-Tag; Google’s bots understand them in either form.

Note that content not hidden from search engines is indexed by default, so you don’t have to add the index and follow directives explicitly.

Conflicting directives

Some directives conflict when combined, for example, one permitting indexing and another preventing the same content from being indexed. Google will choose the restrictive instruction over the permissive one.

| Directive combination | Google’s action |
|---|---|
| <meta name="robots" content="noindex, index"/> | The robot will choose noindex: the page text won’t be indexed. |
| <meta name="robots" content="all"/> <meta name="robots" content="noindex, follow"/> | The robot will choose noindex: the page text won’t be indexed, but the robot will follow and crawl the links. |
| <meta name="robots" content="all"/> <meta name="robots" content="noarchive"/> | All instructions will be applied: text and links will be indexed, but no cached copy of the page will be shown in search. |
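As a rough illustration, the most-restrictive-wins rule can be modeled in a few lines of Python (a simplified sketch, not Google’s actual implementation; the helper name is invented):

```python
# Simplified model of how Google resolves conflicting robots directives:
# for each permissive/restrictive pair, the restrictive value wins.
RESTRICTIVE = {"index": "noindex", "follow": "nofollow", "archive": "noarchive"}

def effective_directives(directives):
    """Given directives collected from all robots meta tags on a page,
    return the set that applies after conflicts are resolved."""
    found = {d.strip().lower() for d in directives}
    # "all" and "none" are shorthands; expand them first.
    if "all" in found:
        found |= {"index", "follow"}
        found.discard("all")
    if "none" in found:
        found |= {"noindex", "nofollow"}
        found.discard("none")
    # Restrictive directives override their permissive counterparts.
    for permissive, restrictive in RESTRICTIVE.items():
        if restrictive in found:
            found.discard(permissive)
    return found

# content="all" combined with content="noindex, follow"
print(sorted(effective_directives(["all", "noindex", "follow"])))  # ['follow', 'noindex']
```

The result matches the second row of the table: noindex wins over all, while follow is kept.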

The robots meta tag: syntax and utilization

As we’ve said, the robots meta tag is inserted into the page HTML code and contains information for search bots. It’s placed in the <head> section of the HTML document and has two obligatory attributes: name and content. Simplified, it looks like this:

<meta name="robots" content="noindex" />

The name attribute

This attribute defines the meta tag type according to the information it gives to search engines. For instance, meta name="description" sets a short description of a page to be displayed in the SERPs, meta name="viewport" is used for optimizing a site for mobile devices, and meta http-equiv="Content-Type" defines the document type and its encoding.

In meta name="robots", the name attribute specifies the name of the bot the instructions are designed for. It works similarly to the User-agent directive in robots.txt that identifies the search engine crawler.

The "robots" value addresses all search engines, while if you need to set instructions specifically for Google, you’ll write meta name="googlebot". For several specific crawlers, you’ll need to create separate tags.

The content attribute

This attribute contains instructions for indexing the page content and its display in the search results. The directives explained in the table above are used in the content attribute.

Note that:

  • Neither attribute is case-sensitive.
  • If attribute values are missing or misspelled, the search bot will ignore the blocking instruction.
  • When addressing several crawlers, use a separate robots meta tag for each. In the content attribute, you can list several directives in a single meta tag, comma-separated.
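To illustrate the one-tag-per-crawler rule, here is a small Python sketch (the helper function is hypothetical) that builds one meta tag per crawler, with that crawler’s directives combined, comma-separated, in a single content attribute:

```python
def robots_meta_tags(rules):
    """Build one robots meta tag per crawler; directives for the same
    crawler are combined, comma-separated, in a single content attribute."""
    return "\n".join(
        f'<meta name="{bot}" content="{", ".join(directives)}" />'
        for bot, directives in rules.items()
    )

tags = robots_meta_tags({
    "googlebot": ["noindex", "nofollow"],  # rules for Google
    "robots": ["noarchive"],               # rules for all other crawlers
})
print(tags)
```

This prints two separate tags, one addressing googlebot and one addressing all remaining crawlers.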

The robots.txt file and the robots meta tag

Since search robots look at the robots.txt file first for crawling recommendations, they won’t be able to crawl a page, or see the instructions included in its code, if the page is blocked in robots.txt.

If a page has the noindex directive but is blocked in the robots.txt file, it can still be indexed and shown in the search results—for example, if the crawler finds it by following a backlink from another source. Since robots.txt is generally accessible, you can’t be sure that crawlers won’t find your “hidden” pages.

With that said, if you close a page with the help of the robots meta tag, make sure there’s nothing in the robots.txt file preventing it from being crawled. When it comes to blocking images from indexing, sometimes it does make sense to use robots.txt.
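You can script this cross-check locally with Python’s built-in urllib.robotparser; the robots.txt rules and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com.
robots_txt = """
User-agent: *
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A page you noindexed via the meta tag must remain crawlable,
# otherwise bots never see the directive.
print(parser.can_fetch("*", "https://example.com/old-landing-page"))  # True: crawlable
print(parser.can_fetch("*", "https://example.com/checkout/thanks"))   # False: blocked
```

If can_fetch returns False for a page carrying a robots meta tag, crawlers will never read that tag.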

Using the robots meta tag

  • Method 1: in an HTML editor

Managing pages is similar to editing a text file. You have to open the HTML document in an editor, add the robots meta tag to the <head> section, and save.

Pages are stored in the site’s root directory, which you can access via your hosting provider’s personal account or FTP. Save a copy of the source document before making changes to it.

  • Method 2: using a CMS

It’s easier to block a page from indexing using a CMS. There are a number of plugins, for example, Yoast SEO for WordPress, that allow you to block indexing or crawling links when editing a page.

Robots meta tag in Yoast SEO plugin for WordPress
Source: Yoast

Verifying the robots meta tag

It takes time for search engines to index or deindex a page. To make sure your page isn’t indexed, use services for webmasters or browser plugins that check meta tags (for example, SEO META in 1 CLICK for Chrome).
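As an alternative to browser plugins, a check can be scripted with Python’s standard html.parser module (a minimal sketch; the sample markup and class name are made up):

```python
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Collect the content of robots/googlebot meta tags from a page."""
    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            self.directives[name] = attrs.get("content", "")

html = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
checker = RobotsMetaChecker()
checker.feed(html)
print(checker.directives)  # {'robots': 'noindex, nofollow'}
```

Feeding the checker a page fetched from your site would reveal whether the expected directives are actually present in its <head>.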

You can also check if the page is indexed using Google Search Console:

Checking page indexing in Google Search Console

If a page check shows that the robots meta tag isn’t working, verify that the URL isn’t blocked in the robots.txt file by checking the file in the address bar or with Google’s robots.txt tester.

SE Ranking also allows you to check what website pages are in the index. To do so, go to the Index Status Checker tool.

Index status check in SE Ranking

X-Robots-Tag: syntax and utilization

The X-Robots-Tag is a part of the HTTP response for a given URL, added in the server configuration file. It acts similarly to the robots meta tag and affects how pages are indexed, but in some cases you should use the X-Robots-Tag specifically for indexing instructions.

A simple example of the X-Robots-Tag is as follows:

X-Robots-Tag: noindex, nofollow

When you need to set rules for a page or file type, the X-Robots-Tag looks like this:

<FilesMatch "filename">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

The <FilesMatch> directive matches files by name using regular expressions. If you use Nginx instead of Apache, this directive is replaced with location:

location = /filename {
  add_header X-Robots-Tag "noindex, nofollow";
}

If the bot name is not specified, directives are automatically used for all crawlers. If a particular robot is identified, the tag looks like this:

Header set X-Robots-Tag "googlebot: noindex, nofollow"

When you should use X-Robots-Tag

  • Deindexing non-HTML files

Since not all pages have the HTML format and <head> section, some website content can’t be blocked from indexing with the help of the robots meta tag. This is when x-robots comes in handy. 

For example, when you need to block .pdf documents:

<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>
  • Saving the crawl budget

With the robots meta tag, the crawler loads a page and then reads the directives, while x-robots gives indexing instructions before the search bot gets to a page. In the latter situation, search engines don’t spend time crawling the pages and keep the crawl budget to use it for more important content. It’s especially helpful to use the X-Robots-Tag for large-scale websites.

  • Setting crawling directives for the whole website

Using the X-Robots-Tag in HTTP responses allows you to set the directives and manage how your content is indexed on the level of your website and not separate pages.

  • Addressing local search engines

The biggest search engines understand the majority of restrictive directives, while small local search engines may not know how to read indexing instructions in the HTTP header. If your website targets a specific region, learn about local search engines and their characteristics.

The major function of the robots meta tag is to hide pages or some content elements from the SERPs. The X-Robots-Tag allows you to set more general instructions for the whole website and inform search bots before they crawl web pages, saving the crawl budget.

How to apply X-Robots-Tag

To add the X-Robots-Tag header, you should use configuration files in the website’s root directory. The settings will differ depending on the web server.

Apache

Edit the server configuration files .htaccess or httpd.conf. To prevent all .png and .gif files from being indexed on an Apache web server, add the following:

<Files ~ "\.(png|gif)$">
 Header set X-Robots-Tag "noindex"
</Files>

Nginx

Edit the site’s .conf configuration file. To prevent all .png and .gif files from being indexed on an Nginx web server, add the following:

location ~* \.(png|gif)$ {
 add_header X-Robots-Tag "noindex";
}

Important: before editing the configuration file, back up the original so you can restore the website if the changes cause errors.

How to check X-Robots-Tag 

There are several ways to learn what response the HTTP page header gives and whether it contains the X-Robots-Tag: online URL checking services, browser extensions, and webmaster tools.

For instance, the HTTP header that blocks indexing looks like this:

HTTP/1.1 200 OK
Date: Tue, 10 Nov 2020 09:30:22 GMT
X-Robots-Tag: noindex 
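A response like the one above can also be checked programmatically; the sketch below (hypothetical helper) scans a raw header block for X-Robots-Tag values, matching the header name case-insensitively as HTTP requires:

```python
def x_robots_values(raw_headers):
    """Extract X-Robots-Tag values from a raw HTTP response header block."""
    values = []
    for line in raw_headers.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "x-robots-tag":
            values.append(value.strip())
    return values

response_headers = """HTTP/1.1 200 OK
Date: Tue, 10 Nov 2020 09:30:22 GMT
X-Robots-Tag: noindex"""

print(x_robots_values(response_headers))  # ['noindex']
```

In practice you would feed it the header text returned by a URL-checking tool or an HTTP client.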

Checking x-robots in Google

To check the tag using Google Search Console, go to URL Inspection, and click on Test live URL and View crawled page. You’ll see the information about the HTTP response in the More info section.

Checking X-Robots-Tag in Google Search Console

Examples of the robots meta tag and the X-Robots-Tag

noindex

Telling all crawlers not to index text on a page and not to follow the links:

<meta name="robots" content=" noindex, nofollow" />
X-Robots-Tag: noindex, nofollow

nofollow

Telling Google not to follow the links on a page:

<meta name="googlebot" content="nofollow" />
X-Robots-Tag: googlebot: nofollow

noarchive

Telling search engines not to cache a page:

<meta name="robots" content="noarchive"/>
X-Robots-Tag: noarchive

none

Telling Google not to index an HTML document or follow its links:

<meta name="googlebot" content="none" />
X-Robots-Tag: googlebot: none

nosnippet

Telling search engines not to display snippets for a page:

<meta name="robots" content="nosnippet">
X-Robots-Tag: nosnippet

max-snippet

Limiting the snippet to a maximum of 35 characters:

<meta name="robots" content="max-snippet:35">
X-Robots-Tag: max-snippet:35

max-image-preview

Telling search engines to show large image versions in the search results:

<meta name="robots" content="max-image-preview:large">
X-Robots-Tag: max-image-preview:large

max-video-preview

Telling search engines to show videos without length limitations:

<meta name="robots" content="max-video-preview:-1">
X-Robots-Tag: max-video-preview:-1

notranslate

Telling search engines not to translate a page:

<meta name="robots" content="notranslate" />
X-Robots-Tag: notranslate

noimageindex

Telling not to index the images on a page:

<meta name="robots" content="noimageindex" />
X-Robots-Tag: noimageindex

unavailable_after

Telling crawlers not to index a page after January 1, 2021:

<meta name="robots" content="unavailable_after: 2021-01-01">
X-Robots-Tag: unavailable_after: 2021-01-01

Common mistakes with robots and X-Robots-Tag usage

Conflict with robots.txt

Official X-Robots-Tag and robots meta tag guidelines state that a search bot must be able to crawl content that is meant to be hidden from the index. If you disallow a page in the robots.txt file, the indexing directives will be inaccessible to crawlers.

Blocking indexing with robots.txt is another common mistake. This file serves for limiting page crawling and not for preventing pages from being indexed. To manage how your pages are displayed in the search, use the robots meta tag and x-robots.

Removing noindex

If you use the noindex directive to hide the content from the index for a certain period, it’s important to open the access for crawlers on time. For instance, you have a page with a future promo deal: if you don’t remove noindex at the time it’s ready, it won’t be shown in the search results and won’t generate traffic.

Backlinks to a nofollow page 

The nofollow instruction can fail to achieve its purpose if other sources link to the page: crawlers can still discover and follow the page through those external links.

Removing a URL from the sitemap before it gets deindexed

If a page has the noindex directive, it’s not reasonable to remove it from the sitemap file. Your sitemap allows crawlers to quickly find all pages including those that are intended to be removed from the index. 

What you can do is create a separate sitemap.xml with a list of pages containing noindex and remove URLs from the file as they get deindexed. If you upload this file into Google Search Console, robots are likely to crawl it quicker.
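Such a temporary sitemap can be generated with a short Python script using the standard library’s xml.etree.ElementTree; the URLs are hypothetical:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a minimal sitemap.xml document listing the given URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        # Each URL gets its own <url><loc>…</loc></url> entry.
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

# Pages recently marked noindex; remove each URL once it is deindexed.
noindexed = [
    "https://example.com/old-promo",
    "https://example.com/outdated-guide",
]
print(build_sitemap(noindexed))
```

The resulting file can be uploaded to Google Search Console as a separate sitemap and trimmed as pages drop out of the index.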

Not checking index statuses after making changes

It may happen that valuable content will be blocked from indexing by mistake. To avoid that, check your pages’ indexing statuses after making any changes to them.

How can you keep important pages from being deindexed?

You can monitor changes in your site’s code using SE Ranking’s Page Changes Monitor:

Indexing check in SE Ranking's Page Changes Monitor

What should you do when a page disappears from the search?

When a page you need to be shown in the SERPs isn’t there, check if there are directives blocking indexing or a disallow directive in the robots.txt file. Also, verify if the URL is included in the sitemap file. Using Google Search Console, you can tell search engines you need to have your page indexed, as well as inform them about an updated sitemap. 

Summary

The robots meta tag and the x-robots tag serve for managing how pages are indexed and shown in the search results. They differ in utilization: the robots meta tag is included in the page code, while the X-Robots-Tag is specified in the configuration file. Remember some of their other important characteristics:

  • The robots.txt file helps search bots crawl pages correctly, while the robots meta tag and X-Robots-Tag influence how content gets to the index. All three are vital for technical optimization.
  • Both the robots meta tag and x-robots tag are used for blocking page indexing but the latter gives robots instructions before they crawl pages, saving the crawl budget. 
  • If robots.txt prevents bots from crawling a page, the robots meta tag or x-robots directives won’t work.
  • Mistakes made while setting the robots meta tag and the X-Robots-Tag can lead to indexing issues and website performance problems. Set the directives carefully or entrust the task to an experienced webmaster.
