What is robots.txt and how to create it?

Mikołaj Sykuła
May 15, 2023

robots.txt is one of the most important files you can place on your website. Its main job is to tell search engine robots such as Googlebot which pages or sections of the site they may or may not crawl. In this article, we will discuss how to write a robots.txt file, what it can do, and the benefits of configuring it properly, and we will point to useful tools and documentation.

What is robots.txt?

robots.txt is a plain text file located in the site's root directory that tells search engine robots which parts of the site they can crawl and which they should avoid. It is typically the first file a crawler requests when it visits your site.

How does robots.txt work?

When a search engine robot visits a site, the first thing it does is request the robots.txt file. If the file exists, the crawler reads its instructions to find out which pages can be crawled and which should be skipped. If the file doesn't exist, the crawler will assume it can crawl the entire site.
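The decision described above can be sketched with Python's standard library parser, `urllib.robotparser`. This is only an illustration of a well-behaved crawler, not how any particular search engine is implemented. The `ExampleBot` name, the `example.com` URLs and the `/admin/` rule are assumptions made up for the example, and an empty rule list stands in for a site that has no robots.txt.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_lines):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


# Hypothetical rules: keep every crawler out of /admin/.
with_rules = build_parser(["User-agent: *", "Disallow: /admin/"])
# An empty rule set stands in for a site that has no robots.txt at all.
without_rules = build_parser([])

print(with_rules.can_fetch("ExampleBot", "https://example.com/admin/users"))     # False
print(with_rules.can_fetch("ExampleBot", "https://example.com/blog/"))           # True
print(without_rules.can_fetch("ExampleBot", "https://example.com/admin/users"))  # True
```

A polite crawler would make this check before every request and skip any URL for which `can_fetch` returns `False`.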

Where should robots.txt be?

The robots.txt file should be available at the URL path /robots.txt on the site's host, e.g. https://syki.dev/robots.txt
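Because the location is fixed, a crawler can derive it from any page URL on the same host. A minimal sketch, where the helper name `robots_url` and the blog post path are hypothetical:

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("https://syki.dev/blog/some-post?ref=home"))  # https://syki.dev/robots.txt
```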

How to write a robots.txt file?

Basic properties

The robots.txt file consists of directives that tell robots what to do. Here are the most important directives:
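As a rough illustration, assuming the directives meant here are the widely supported `User-agent`, `Disallow`, `Allow` and `Sitemap`, the sketch below parses a small rule set with Python's `urllib.robotparser`. The paths, the `ExampleBot` name and the sitemap URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for all crawlers plus a sitemap reference.
BASIC_RULES = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(BASIC_RULES.splitlines())

# Disallow blocks everything under /private/ ...
print(parser.can_fetch("ExampleBot", "https://example.com/private/notes.html"))          # False
# ... except the subfolder that Allow opens up again.
print(parser.can_fetch("ExampleBot", "https://example.com/private/press-kit/logo.png"))  # True
# Sitemap is independent of any User-agent group (site_maps() needs Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

The `Allow` line is placed before the broader `Disallow` on purpose: Python's parser applies the first matching rule, while Google documents that it applies the most specific one, and this ordering gives the same answer under both.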

Advanced properties

These properties are rarely used because they are often not considered by bots, but can be useful in some cases:
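Two directives of this kind are `Crawl-delay` and `Request-rate`; whether these are the ones intended here is an assumption. Python's `urllib.robotparser` happens to read both, which makes for a short sketch. The values and the `/search` path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules asking crawlers to slow down.
ADVANCED_RULES = """\
User-agent: *
Crawl-delay: 10
Request-rate: 1/5
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ADVANCED_RULES.splitlines())

delay = parser.crawl_delay("ExampleBot")   # seconds to wait between requests
rate = parser.request_rate("ExampleBot")   # named tuple: (requests, seconds)

print(delay)                        # 10
print(rate.requests, rate.seconds)  # 1 5
```

Honoring these values is up to each crawler. Google, for example, documents that Googlebot does not support `Crawl-delay`.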

Formatting rules

The robots.txt file must follow a specific format:

Here is an example of a robots.txt file:
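One possible file of this kind is sketched below, held in a Python string so that it can be checked with the standard library parser. It is an illustrative assumption rather than the author's own example: the paths, the `ExampleBot` and `OtherBot` names and the sitemap URL are invented. It shows comments starting with `#`, one directive per line, and groups separated by blank lines.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS_TXT = """\
# Rules for every crawler
User-agent: *
Disallow: /admin/
Disallow: /cart/

# A stricter group for one hypothetical crawler
User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# ExampleBot has its own group, so it is blocked everywhere.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/"))  # False
# Any other crawler falls back to the "*" group.
print(parser.can_fetch("OtherBot", "https://example.com/blog/"))    # True
print(parser.can_fetch("OtherBot", "https://example.com/admin/"))   # False
```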


Robots.txt usage examples

The robots.txt file can be configured in many different ways, depending on the needs of your website. Here are some examples:
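Three common configurations are sketched below as Python strings, each checked with `urllib.robotparser`. They are illustrative assumptions rather than the article's original list; the helper `allowed`, the `BadBot` name and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing, so the whole site may be crawled.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Disallowing "/" blocks the whole site for every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Block one named crawler and leave the site open to everyone else.
BLOCK_ONE_BOT = "User-agent: BadBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"

page = "https://example.com/blog/"
print(allowed(ALLOW_ALL, "Googlebot", page))      # True
print(allowed(BLOCK_ALL, "Googlebot", page))      # False
print(allowed(BLOCK_ONE_BOT, "BadBot", page))     # False
print(allowed(BLOCK_ONE_BOT, "Googlebot", page))  # True
```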

Benefits of using robots.txt correctly

A properly configured robots.txt file can bring many benefits:

Robots.txt development and testing tools

Creating and testing the robots.txt file can be facilitated by various tools:
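Alongside dedicated tools, a few lines of Python can act as a regression test for a draft file before it is deployed. The sketch below is one possible approach built on `urllib.robotparser`; the function name `find_rule_regressions`, the draft rules and the expectation table are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def find_rule_regressions(robots_txt, expectations):
    """Return the (agent, url) pairs whose crawl permission differs from the expected one."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, url)
        for agent, url, expected in expectations
        if parser.can_fetch(agent, url) != expected
    ]


# Hypothetical draft that is meant to hide /admin/ and nothing else.
DRAFT = "User-agent: *\nDisallow: /admin/\n"

EXPECTATIONS = [
    ("Googlebot", "https://example.com/", True),
    ("Googlebot", "https://example.com/blog/post", True),
    ("Googlebot", "https://example.com/admin/login", False),
]

print(find_rule_regressions(DRAFT, EXPECTATIONS))  # []
```

An empty list means every URL behaves as intended. To check a file that is already live, `RobotFileParser.set_url()` followed by `read()` downloads it instead of parsing a string. Keep in mind that this parser only does simple prefix matching, so its verdicts may differ from those of crawlers that support wildcard patterns.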

In addition, the original robots.txt specification is available at The Web Robots Pages, and the protocol has since been standardized as RFC 9309.

Summary

The robots.txt file is an essential element of any website that allows you to control the access of search engine robots to various parts of the website. A properly configured robots.txt file helps to focus the attention of robots on the most important pages, saves server resources and improves SEO.

Note that while the robots.txt file is a powerful tool, not all robots respect it: some malicious bots may deliberately ignore it and crawl pages that should be kept private. Therefore, robots.txt should not be the only means of protecting private pages. Always use appropriate security and access permissions to protect your data.

In this article, I covered the basics of creating and using a robots.txt file, but it is a much more complex topic with many nuances and possibilities. It's always a good idea to consult the official documentation and other trusted sources to learn more.
