Analytica House
Sep 4, 2022
What is Robots.txt and How to Create and Use It?
When search engine bots visit a website, they check the robots.txt file to learn which parts of the site they may crawl. Also known as the Robots Exclusion Standard, robots.txt tells crawlers which files, folders, or URLs on your web server they may or may not access.
You may have heard many misconceptions about how robots.txt should be used. In reality, it simply tells visiting bots which URLs on your site they may crawl. It is used primarily to reduce request load and make the most of your crawl budget. It is not a way to keep pages out of search results; that requires a noindex tag or an authentication barrier.
What Is Robots.txt?
robots.txt is a plain-text file placed in your site's root directory that gives crawlers directives about which URLs they may or may not crawl. For crawlers to read it, the file itself should return an HTTP 200 status.

Bots generally obey these directives. Pages disallowed in robots.txt won’t be crawled, though if those URLs are linked from elsewhere, Google may still index them without crawling their content.
SEO Tip: If bots get a 5xx server error when requesting your robots.txt, they assume something is wrong with the site and stop crawling it. If that happens on a CDN that serves your images, for example, those images can disappear from Google’s view.
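A quick way to keep an eye on this is a small script that requests robots.txt and reports the HTTP status code. This is a minimal sketch in Python, assuming a hypothetical example.com domain:

import urllib.request
import urllib.error

def robots_status(domain):
    # Request robots.txt and return the HTTP status code.
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is on the exception.
        return err.code

status = robots_status("example.com")
if status >= 500:
    print(f"robots.txt returned {status}: crawlers may stop crawling the site")
else:
    print(f"robots.txt returned {status}")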
Why Is Robots.txt Important for SEO?
Before crawling your sitemap URLs, bots first fetch your robots.txt, so any incorrect directive can lead to important pages being skipped. A temporary misconfiguration is rarely irreversible, but fix it quickly to avoid lasting harm.

For instance, if you accidentally disallow a key category page, it won’t be crawled again until you remove the directive. Bots cache your robots.txt for up to 24 hours, so changes can take up to a day to take effect.
Where to Find Robots.txt
Place your robots.txt in your site’s root directory (e.g. example.com/robots.txt). Crawlers only look for it there, so never move it elsewhere.
Creating Robots.txt
You can hand-edit robots.txt with any text editor or generate it via an online tool. Then upload it to your site’s root.
Manual Creation
Open a plain‐text editor and enter directives such as:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Save as robots.txt and upload to your root directory.
Recommended Directives
Key robots.txt commands:
- User-agent: Selects which crawler a rule applies to.
- Allow: Grants crawling permission.
- Disallow: Blocks crawling of specified paths.
- Sitemap: Points crawlers to your sitemap URL.
User-agent
Specifies which bot the rules that follow apply to. Common bots include:
- Googlebot
- Bingbot
- YandexBot
- DuckDuckBot
- Baiduspider
- …and many more.
Example: Block only Googlebot from a thank-you page:
User-agent: Googlebot
Disallow: /thank-you
Allow & Disallow
Allow: permits crawling. Without any directives, the default is “allow all.”
Disallow: forbids crawling of the specified path.
Examples:
- Allow all:
User-agent: *
Allow: /
- Block all:
User-agent: *
Disallow: /
- Block a folder but allow one subpage:
User-agent: *
Disallow: /private/
Allow: /private/public-info
Testing with Google’s Robots.txt Tester
In Google Search Console, under Index > Coverage, you’ll see any robots.txt-related errors. You can also use the Robots.txt Tester to simulate how Googlebot handles specific URLs.
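If you want to run the same check outside Search Console, Python’s standard urllib.robotparser module can simulate the allow/disallow decision locally. The sketch below assumes a hypothetical example.com site and the /thank-you path used earlier; note that Python’s parser follows the original exclusion standard, so edge cases may be evaluated slightly differently than Googlebot evaluates them:

from urllib import robotparser

# Load and parse the live robots.txt file.
parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Ask whether a given crawler may fetch a given URL.
for agent in ("Googlebot", "Bingbot", "*"):
    allowed = parser.can_fetch(agent, "https://example.com/thank-you")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")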

Common GSC Warnings
- Blocked by robots.txt: URL is disallowed.
- Indexed, though blocked by robots.txt: Page is in the index despite being disallowed; use a noindex tag or remove the links pointing to it.
Best Practices & Reminders
- Bots fetch robots.txt before crawling any page.
- Use Disallow: to keep low-value pages from being crawled and wasting crawl budget.
- Include your sitemap with the Sitemap: directive.
- Keep robots.txt under 500 KiB; Google only reads up to that size (a quick size check is sketched after this list).
- Test for server errors; 5xx responses cause bots to stop crawling.
- Respect case sensitivity in URL paths.
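For the size limit, the short script below downloads robots.txt and compares it against Google’s 500 KiB ceiling. Again a minimal sketch, assuming a hypothetical example.com domain:

import urllib.request

LIMIT_BYTES = 500 * 1024  # Google reads at most 500 KiB of robots.txt

# Download the file and measure its size in bytes.
with urllib.request.urlopen("https://example.com/robots.txt", timeout=10) as response:
    body = response.read()

size = len(body)
print(f"robots.txt is {size} bytes")
if size > LIMIT_BYTES:
    print("Warning: rules beyond 500 KiB will be ignored by Google")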
Conclusion
robots.txt is a simple yet critical file for guiding crawlers and optimizing your crawl budget. Ensure it’s correct, keep it at your root, and test any changes promptly.