
Analytica House
Sep 4, 2022
What is Robots.txt and How to Create and Use It?

When search engine bots visit a website, they read the robots.txt file to decide which parts of the site they may crawl. Also known as the Robots Exclusion Standard, robots.txt tells crawlers which files, folders, or URLs on your web server they may or may not access.
You may have heard many misconceptions about how to use robots.txt. In reality, it simply tells visiting bots which URLs on your site they should crawl, and it is used primarily to reduce request load and optimize crawl budget. It is not a way to prevent pages from appearing in search results; that requires a <meta name="robots" content="noindex"> tag or an authentication barrier.
What Is Robots.txt?
robots.txt is a plain-text file placed in your site's root directory that gives crawlers directives about which URLs they may or may not crawl (the file itself must return an HTTP 200 status to be read).
Bots generally obey these directives. Pages disallowed in robots.txt won't be crawled, though if those URLs are linked elsewhere, Google may still index them without crawling them.
SEO Tip: If bots encounter a 5xx server error when fetching your robots.txt, they'll assume something is wrong and stop crawling the site. If that happens on a CDN that serves your images, for example, those images can disappear from Google's view.
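As a quick check, you can fetch your own robots.txt and confirm it returns HTTP 200 rather than a 5xx error. A minimal sketch using Python's standard library (example.com is a placeholder; swap in your own domain):

import urllib.request
import urllib.error

# Placeholder domain; replace with your own site.
ROBOTS_URL = "https://example.com/robots.txt"

try:
    with urllib.request.urlopen(ROBOTS_URL, timeout=10) as response:
        print("Status:", response.status)  # 200 means crawlers can read the file
except urllib.error.HTTPError as err:
    # A 5xx code here is exactly the situation described above:
    # crawlers may pause crawling the whole host until it is fixed.
    print("HTTP error:", err.code)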
Why Is Robots.txt Important for SEO?
Before crawling your sitemap URLs, bots first fetch your robots.txt, so any incorrect directive can lead to important pages being skipped. A temporary misconfiguration isn't irreversible, but fix it quickly to avoid lasting harm.
For instance, if you accidentally disallow a key category page, it won't be crawled until you remove the directive. Bots cache your robots.txt for up to 24 hours, so changes can take up to a day to take effect.
Where to Find Robots.txt
Place your robots.txt in your site's root directory (e.g. example.com/robots.txt). Crawlers universally look for it there, so never move it.
Creating Robots.txt
You can hand-edit robots.txt with any text editor or generate it via an online tool. Then upload it to your site's root.
Manual Creation
Open a plain-text editor and enter directives such as:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Save the file as robots.txt and upload it to your root directory.
Recommended Directives
Key robots.txt directives:
- User-agent: Selects which crawler a rule applies to.
- Allow: Grants crawling permission.
- Disallow: Blocks crawling of specified paths.
- Sitemap: Points crawlers to your sitemap URL.
User-agent
Specifies which bot the rules that follow apply to. Common bots include:
- Googlebot
- Bingbot
- YandexBot
- DuckDuckBot
- Baiduspider
- …and many more.
Example: Block only Googlebot from a thank-you page:
User-agent: Googlebot
Disallow: /thank-you
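If you want to check how this user-agent targeting behaves before deploying it, Python's built-in urllib.robotparser can evaluate the rules locally. This is only a rough approximation (it is not Google's own parser), and example.com is a placeholder:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /thank-you
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Only Googlebot is blocked; other crawlers fall through to the default (allowed).
print(parser.can_fetch("Googlebot", "https://example.com/thank-you"))  # False
print(parser.can_fetch("Bingbot", "https://example.com/thank-you"))    # True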
Allow & Disallow
Allow: permits crawling. Without any directives, the default is "allow all."
Disallow: forbids crawling of the specified path.
Examples:
- Allow all:
User-agent: *
Allow: /
- Block all:
User-agent: *
Disallow: /
- Block a folder but allow one subpage:
User-agent: *
Disallow: /private/
Allow: /private/public-info
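The folder-plus-exception pattern is also easy to test locally with urllib.robotparser. One caveat for this sketch: Python's parser applies rules in file order (first match wins), while Google picks the most specific matching rule, so listing the Allow line first keeps both readings in agreement; the domain is a placeholder:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /private/public-info
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The specific subpage stays crawlable while the rest of the folder is blocked.
print(parser.can_fetch("AnyBot", "https://example.com/private/public-info"))  # True
print(parser.can_fetch("AnyBot", "https://example.com/private/other-page"))   # False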
Testing with Google’s Robots.txt Tester
In Google Search Console, under Index > Coverage, you'll see any robots.txt-related errors. You can also use the Robots.txt Tester to simulate how Googlebot handles specific URLs.
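If you prefer to test outside Search Console, the same kind of check can be scripted: urllib.robotparser can download a live robots.txt and report whether a given URL is crawlable for a given user agent. A small sketch, with the URL and path as placeholders:

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder URL
parser.read()  # downloads and parses the live file

# Ask the parser whether Googlebot may crawl a specific URL.
print(parser.can_fetch("Googlebot", "https://example.com/some-page"))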
Common GSC Warnings
- Blocked by robots.txt: URL is disallowed.
- Indexed, though blocked by robots.txt: Page is in the index despite being disallowed. To remove it, use noindex (and allow crawling so Google can see the tag) or remove links to it.
Best Practices & Reminders
- Bots fetch robots.txt before crawling any page.
- Use Disallow: to prevent low-value pages from being crawled and wasting budget.
- Include your sitemap with Sitemap:.
- Keep robots.txt under 500 KiB; Google only reads up to that size (see the sketch after this list).
- Test for server errors; 5xx responses cause bots to stop crawling.
- Respect case sensitivity in URL paths.
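Two of these reminders, the 500 KiB ceiling and the sitemap reference, are easy to verify automatically. A minimal sketch over a local copy of the file (the path is an assumption; point it at wherever your robots.txt lives):

import os

ROBOTS_PATH = "robots.txt"  # assumed local path to your robots.txt
SIZE_LIMIT = 500 * 1024     # Google reads at most 500 KiB

size = os.path.getsize(ROBOTS_PATH)
print(f"{size} bytes ({'within' if size <= SIZE_LIMIT else 'over'} the 500 KiB limit)")

with open(ROBOTS_PATH, encoding="utf-8") as handle:
    has_sitemap = any(line.lower().startswith("sitemap:") for line in handle)
print("Sitemap directive present:", has_sitemap)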
Conclusion
robots.txt is a simple yet critical file for guiding crawlers and optimizing your crawl budget. Ensure it's correct, keep it at your root, and test any changes promptly.