Robots.txt Generator
Check your robots.txt, sitemap.xml and other crawling issues
How to use our Robots.txt Generator?
We developed this free robots.txt generator to help webmasters, SEO experts, and marketers quickly and easily create robots.txt files.
You can generate a robots.txt file from scratch or use ready-made suggestions. If you build it from scratch, you customize the file by setting up directives (allow or disallow crawling), the paths (specific pages and files), and the bots that should follow those directives. Alternatively, you can choose a ready-made robots.txt template containing a set of the most common general and CMS directives. You can also add a sitemap to the file.
As a result, our robots.txt file generator will give you a ready-made robots.txt file, which you can edit and then copy or download.
Robots.txt syntax
The robots.txt syntax consists of directives, parameters, and special characters. For the file to work properly, follow these requirements when creating it:
1. Each directive must begin on a new line. There can only be one parameter per line.
Disallow: /folder1/
Disallow: /folder2/
2. Robots.txt is case-sensitive. For example, if a website folder name is capitalized but written in lowercase in the robots.txt file, crawlers will not match the rule to that folder.
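For instance, assuming a hypothetical folder named /Blog/ on your site, only the rule that matches its exact spelling will apply:
Disallow: /blog/ – does not match the /Blog/ folder
Disallow: /Blog/ – matches the /Blog/ folder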
3. Do not use quotation marks around paths, spaces at the beginning of a line, or semicolons at the end of a line.
Disallow: /“folder2”/ – incorrect
Disallow: /folder2/ – correct
How to use the Disallow directive properly?
Once you have filled in the User-agent directive, specify the behavior of certain (or all) bots by adding crawl instructions. Here are some essential tips:
1. Don’t leave the Disallow directive without a value unless you intend to: an empty Disallow tells the bot it may crawl all of the site’s content.
Disallow: – allows crawling of the entire website
2. Do not list every file that you want to block from crawling. Just disallow access to the folder, and all files inside it will be blocked from crawling.
Disallow: /folder/
3. Don’t block access to the website with this directive:
Disallow: / – blocks access to the whole website
Otherwise, the site can be completely removed from the search results.
Besides that, make sure that essential website pages are not blocked from crawling: the home page, landing pages, product pages, etc. With this directive, you should only specify the files and pages that should not appear in the SERPs, as shown in the example below.
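For illustration, here is a minimal sketch that keeps the rest of the site open to crawling while hiding a few service pages; the /cart/ and /search/ paths are hypothetical placeholders:
User-agent: *
Disallow: /cart/
Disallow: /search/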
Adding your Sitemap to the robots.txt file
If necessary, you can add your Sitemap to the robots.txt file. This makes it easier for bots to crawl website content. The Sitemap file is typically located at https://your-site.com/sitemap.xml. You need to add a directive with the URL of your Sitemap, as shown below:
User-agent: *
Disallow: /folder1/
Allow: /image1/
Sitemap: https://your-site.com/sitemap.xml
How to submit a robots.txt file to search engines?
You don’t need to submit a robots.txt file to search engines. Whenever crawlers come to a site, they look for a robots.txt file before crawling it. If they find one, they read it first and follow its rules while scanning your site.
At the same time, if you’ve made changes to the robots.txt file and want to notify Google, you can submit it to Google Search Console: use the Robots.txt Tester to paste the file’s contents and click Submit.
How to define the User-agent?
When creating robots.txt and configuring crawling rules, you should specify the name of the bot to which you’re giving crawl instructions. You can do this with the help of the User-agent directive.
If you want to block or allow access to some of your content for all crawlers at once, indicate * (asterisk) as the User-agent:
User-agent: *
Or you might want to give instructions to one specific search engine, for example, Google. In this case, use the Googlebot User-agent like this:
User-agent: Googlebot
Keep in mind that each search engine has its own bots, which may differ in name from the search engine itself (e.g., Yahoo’s Slurp). Moreover, some search engines use several crawlers for different crawl targets. For example, in addition to its main crawler, Googlebot, Google has other bots:
- Googlebot News—crawls news;
- Google Mobile—crawls mobile pages;
- Googlebot Video—crawls videos;
- Googlebot Images—crawls images;
- Google AdSense—crawls websites to determine content and provide relevant ads.
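For example, to give instructions to just one of these crawlers, address it by its User-agent token while keeping a separate group for everyone else. This is only a sketch: the /private-images/ folder is a hypothetical placeholder, and Googlebot-Image is the token used by Google’s image crawler:
User-agent: Googlebot-Image
Disallow: /private-images/

User-agent: *
Disallow: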
How to use the Allow directive properly?
The Allow directive is used to counteract the Disallow directive. Using the Allow and Disallow directives together, you can tell search engines that they can access a specific folder, file, or page within an otherwise disallowed directory.
Disallow: /album/ – search engines are not allowed to access the /album/ directory
Allow: /album/picture1.jpg – but they are allowed to access the file picture1.jpg in the /album/ directory
You should also use this directive to keep essential website files open to crawling: scripts, styles, and images. For example:
Allow: */uploads
Allow: /wp-*.js
Allow: /wp-*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Allow: /wp-*.svg
Allow: /wp-*.webp
Allow: /wp-*.pdf
How to add the generated robots.txt file to your website?
Search engines and other crawling bots look for a robots.txt file whenever they come to a website. But they’ll only look for that file in one specific place: the site’s root directory. So, after generating the robots.txt file, you should add it to the root folder of your website. Once it’s there, the file will be available at https://your-site.com/robots.txt.
The method of adding a robots.txt file depends on the server and CMS you are using. If you can’t access the root directory, contact your web hosting provider.
How important is a robots.txt file?
The robots.txt file tells search engines which pages they may crawl and which bots are allowed to access the website’s content. With robots.txt, you can solve two issues:
- Reduce the likelihood of certain pages being crawled, indexed, and shown in the search results.
- Save crawl budget.
Under what conditions will the generated robots.txt file work properly?
The generated robots.txt file will work properly under three conditions:
- The User-agent and directives are specified correctly: each group begins with a User-agent line, and each directive sits on its own line.
- The file is saved in plain text (.txt) format.
- The file is located in the root of the website host to which it applies.
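For reference, here is a minimal sketch of a correctly structured file that meets these conditions; the folder names and domain are placeholders:
User-agent: Googlebot
Disallow: /folder1/

User-agent: *
Disallow: /folder2/

Sitemap: https://your-site.com/sitemap.xml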