Robots.txt Tester
Why is a robots.txt file necessary?
Robots.txt files provide search engines with important information about crawling files and web pages. This file is used primarily to manage crawler traffic to your website in order to avoid overloading your site with requests.
You can solve two problems with its help:
- First, reduce the likelihood that certain pages are crawled and, as a result, indexed and shown in search results.
- Second, save crawl budget by closing off pages that shouldn’t be indexed.
However, if you want to prevent a page or another digital asset from appearing in Google Search, a more reliable option is to add the noindex directive to the robots meta tag. Keep in mind that a crawler can only see this directive if robots.txt does not block the page.
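As a rough illustration of the noindex option, the sketch below checks whether a page's HTML carries a robots meta tag with the noindex directive. It uses only Python's standard library; the class name RobotsMetaParser, the function has_noindex, and the sample HTML are made up for this example, and a real check would also need to look at HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only for illustration.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```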
How to make sure robots.txt is working fine?
A quick and easy way to make sure your robots.txt file is working properly is to use special tools.
For example, you can validate your robots.txt by using our tool: enter up to 100 URLs and it will show you whether the file blocks crawlers from accessing specific URLs on your site.
To quickly detect errors in the robots.txt file, you can also use Google Search Console.
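For readers who want to see what such a check involves, here is a minimal sketch using Python's standard urllib.robotparser module. It is not how our tool is implemented; the rules, the example.com URLs, and the helper name check_urls are assumptions made for illustration. Note that this standard-library parser uses simple prefix matching and may treat some rules differently from Google's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would fetch the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""


def check_urls(robots_txt, urls, user_agent="*"):
    """Map each URL to True (crawling allowed) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_urls(ROBOTS_TXT, [
    "https://example.com/",
    "https://example.com/admin/settings",
    "https://example.com/search?q=shoes",
])
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

To test a live site instead, the same parser offers set_url() and read(), which download the file before can_fetch() is called.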
Common robots.txt issues
- The file is not in the .txt format. Crawlers look for a plain-text file named exactly robots.txt. If the file has a different name or format, bots will not be able to find and read it.
- Robots.txt is not located in the root directory. The file must be placed in the top-most directory of the website. If it is placed in a subfolder, your robots.txt file is probably not going to be visible to search bots. To fix this issue, move your robots.txt file to your root directory.
In the Disallow directive, you must specify the particular files or pages that should not be crawled. It can be combined with the User-agent directive to block a particular crawler from the website.
- Disallow without value. An empty Disallow: directive tells bots that they can visit any website pages.
- Blank lines in the robots.txt file. Do not leave blank lines between the directives of one group. Some parsers treat a blank line as the end of a group, so the rules after it may be ignored. An empty line in the robots.txt file should be placed only before a new User-agent.
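The Disallow and blank-line issues above can be reproduced with Python's standard urllib.robotparser module. This is only a sketch: the rule sets and the helper name is_allowed are invented for the example, and other parsers, including those of search engines, may handle blank lines more leniently than this one does.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, user_agent="*"):
    """Return True if user_agent may crawl path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# An empty Disallow value blocks nothing.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"
# A single slash blocks the whole site.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"
# The blank line ends the group, so this parser drops the second rule.
BLANK_LINE = "User-agent: *\nDisallow: /private/\n\nDisallow: /drafts/\n"

print(is_allowed(EMPTY_DISALLOW, "/any/page"))  # True
print(is_allowed(DISALLOW_ALL, "/any/page"))    # False
print(is_allowed(BLANK_LINE, "/private/a"))     # False
print(is_allowed(BLANK_LINE, "/drafts/a"))      # True: rule was ignored
```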
Robots.txt best practices
- Use the proper case in robots.txt. Bots treat folder and section names as case-sensitive, so /Folder/ and /folder/ are different paths to a crawler. If a folder name starts with a capital letter, writing it in lowercase means the rule will not match, and vice versa.
- Each directive must begin on a new line. There can only be one parameter per line.
- Do not start a line with a space, and do not use quotation marks or semicolons in directives.
- There is no need to list every file you want to block from crawlers. You just need to specify a folder or directory in the Disallow directive, and all of the files from these folders or directories will also be blocked from crawling.
- You can use wildcard characters to create robots.txt with more flexible instructions. Full regular expressions are not supported; major crawlers such as Googlebot recognize only the two characters below.
- The asterisk (*) matches any sequence of characters.
- The dollar sign ($) marks the end of the URL path. For example, Disallow: /*.pdf$ blocks URLs that end in .pdf.
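- Use server-side authentication to block access to private content. Robots.txt is publicly readable and does not restrict access, so it cannot protect important data on its own.
- Use one robots.txt file per domain. If you need to set crawl guidelines for different sites, create a separate robots.txt for each one. A file applies only to the host it is served from, so each subdomain needs its own file as well.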
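To make the matching rules in the list above concrete, here is a simplified sketch of how a path pattern with * and $ could be compared against a URL path. It translates the pattern into a Python regular expression internally; the function names pattern_to_regex and matches are made up for this example. It leaves out parts of real crawler behavior, such as choosing between conflicting Allow and Disallow rules and normalizing percent-encoded characters.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters, and a trailing '$' anchors
    the pattern to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if the rule pattern applies to the given URL path."""
    # re.match anchors at the start, which mirrors prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/private/", "/private/notes.html"))         # folder rule covers its files
print(matches("/Private/", "/private/notes.html"))         # case mismatch, no match
print(matches("/*.pdf$", "/files/report.pdf"))             # ends in .pdf
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # does not end in .pdf
```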
Other ways to test your robots.txt file
You can analyze your robots.txt file using the Google Search Console tool.
This robots.txt tester shows you whether your robots.txt file is blocking Google crawlers from accessing specific URLs on your website. The tool is unavailable in the new version of GSC, but you can access it by clicking this link.
Choose your domain and the tool will show you the robots.txt file, its errors, and warnings.
Go to the bottom of the page, where you can type the URL of a page in the text box. As a result, the robots.txt tester will verify that your URL has been blocked properly.
What should be in a robots.txt file?
Robots.txt files contain information that instructs crawlers on how to interact with a particular site. Each group of rules starts with a User-agent directive that specifies the search bot to which the rules apply. Then you should specify Allow and Disallow directives that open or block certain files and pages for that crawler. At the end of a robots.txt file, you can optionally add a link to your sitemap.
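The sketch below shows one possible file with these parts and reads it with Python's standard urllib.robotparser module. The paths, the bot name BadBot, and the example.com sitemap URL are invented for illustration. This parser applies the first matching rule in a group, so the Allow line is placed before the broader Disallow line; the site_maps() method requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: two User-agent groups and a sitemap link.
ROBOTS_TXT = """\
User-agent: *
Allow: /blog/public/
Disallow: /blog/

User-agent: BadBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules from the "*" group apply to any bot without its own group.
print(parser.can_fetch("Googlebot", "/blog/public/post"))
print(parser.can_fetch("Googlebot", "/blog/draft"))
# BadBot has its own group that blocks the whole site.
print(parser.can_fetch("BadBot", "/about"))
# Sitemap lines are collected separately from the groups.
print(parser.site_maps())
```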
How to open a robots.txt file?
In order to access the content of any website’s robots.txt file, you have to type https://yourwebsite/robots.txt into the browser.
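Because the file always sits at the root of the host, its address can be derived from any page URL on that site. A small sketch, with the function name robots_txt_url and the example URLs chosen only for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=1#top"))
print(robots_txt_url("http://shop.example.com:8080/cart"))
```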
Can bots ignore robots.txt?
Well-behaved crawlers check for a robots.txt file before visiting a website. However, the file only provides rules for bots and cannot enforce them: it is a list of guidelines for crawlers, not an access control mechanism. Therefore, some bots may ignore these directives.
How to test if robots.txt is working properly?
You can check the robots.txt file with our tool. Just enter the necessary URLs. Here you’ll see if a given website URL is allowed or blocked from crawling.
How do I fix robots.txt?
A robots.txt file is a text document. You can change the current file in a text editor and then upload it again to the website root directory. What’s more, many CMSs, including WordPress, have plugins that let you edit the robots.txt file directly from the admin dashboard.
Can robots.txt be redirected?
Crawlers request the file only at http://yourwebsite/robots.txt, so the rules cannot live at a different path on the site. That URL itself can redirect, for example to the robots.txt file of another domain, but crawlers follow only a limited number of redirect hops before giving up.
Does Google respect robots.txt?
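When visiting a website, Google’s crawlers first refer to the robots.txt file containing all crawling guidelines. But in some cases, the search engine may ignore these directives. For example, Google does not support the crawl-delay directive, and a URL blocked in robots.txt can still be indexed without its content if other pages link to it.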