Robots.txt Explained: Syntax, Real Examples, and Mistakes That Tank Your SEO
Your robots.txt file is one of the first things Googlebot reads when it visits your site. A good one keeps search engines focused on your important pages. A bad one can hide your entire site from Google — and you might not notice for weeks.
Here's how robots.txt actually works, what the syntax means, and how to write one without accidentally nuking your organic traffic.
What robots.txt Does
Robots.txt is a plain text file at the root of your website (yoursite.com/robots.txt) that tells search engine crawlers which URLs they're allowed to request. It follows the Robots Exclusion Protocol, a standard that's been around since 1994.
Key thing to understand: robots.txt controls crawling, not indexing. If Google finds a link to a page you've blocked in robots.txt, it can still index the URL — it just won't know what's on the page. You'll see entries in search results with "No information is available for this page" instead of a description.
If you want to prevent indexing entirely, use a noindex meta tag instead. Robots.txt is for managing crawl behavior.
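For reference, a noindex directive goes in the page's HTML head (or, for non-HTML files like PDFs, in an X-Robots-Tag response header):

```html
<!-- In the page's <head>: tells crawlers not to include this page in results -->
<meta name="robots" content="noindex">
```

Remember: for the tag to work, the page must be crawlable. If robots.txt blocks it, Google never sees the noindex.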
The Syntax
Robots.txt uses four directives:
User-agent
Targets a specific crawler. * means all crawlers.
User-agent: *
User-agent: Googlebot
User-agent: Bingbot
Disallow
Blocks a URL path from being crawled.
Disallow: /admin/
Disallow: /cart
Disallow: /search
Allow
Overrides a Disallow for a specific path. Google and Bing support it, but not every crawler does. When rules conflict, Google applies the most specific (longest) matching rule.
Disallow: /images/
Allow: /images/public/
Sitemap
Points crawlers to your XML sitemap. Place this outside any User-agent block.
Sitemap: https://yoursite.com/sitemap.xml
Pattern Matching
Googlebot supports two wildcards:
- The * wildcard matches any sequence of characters. Disallow: /*.pdf blocks all PDF files.
- The $ anchor marks the end of a URL. Disallow: /*.php$ blocks URLs ending in .php, but not /page.php?id=5.
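To make the wildcard semantics concrete, here's a minimal Python sketch that translates a robots.txt pattern into a regular expression the way a Google-style matcher would. This is an illustration of the matching rules, not Google's actual implementation:

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (Google-style semantics)."""
    # Escape regex metacharacters, then restore the two robots.txt wildcards:
    # '*' matches any sequence of characters, '$' anchors the end of the URL.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile(regex)

def is_blocked(path: str, rule: str) -> bool:
    # Rules match from the start of the URL path.
    return pattern_to_regex(rule).match(path) is not None

print(is_blocked("/files/report.pdf", "/*.pdf"))  # True
print(is_blocked("/page.php?id=5", "/*.php$"))    # False: URL doesn't end in .php
print(is_blocked("/index.php", "/*.php$"))        # True
```

Note how the $ rule fails on /page.php?id=5: the query string comes after .php, so the URL doesn't end where the anchor requires.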
Real Examples by Platform
WordPress
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /search/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /?s=
Disallow: /tag/
Disallow: /author/
Sitemap: https://yoursite.com/sitemap.xml
The Allow: /wp-admin/admin-ajax.php line is important — many WordPress themes and plugins need that file for front-end functionality, and blocking it can break how Google renders your pages.
Shopify
Shopify generates a robots.txt automatically, but here's what a solid one looks like:
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /collections/*+*
Disallow: /search
Disallow: /*.json
Sitemap: https://yourstore.com/sitemap.xml
The *.json block prevents crawlers from accessing raw JSON endpoints — those are API responses, not content you want indexed.
Next.js / Static Sites
User-agent: *
Disallow: /api/
Disallow: /_next/
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml
Simple. Block your API routes and build artifacts, leave everything else open.
When to Use Disallow vs Noindex
This trips people up constantly.
| Scenario | Use |
|---|---|
| Page has no value for search (admin, cart, internal tools) | Disallow in robots.txt |
| Page exists but shouldn't appear in results (thank-you pages, thin content) | noindex meta tag |
| Staging or dev environment | Disallow: / AND noindex on every page |
| Duplicate content you can't canonical | noindex meta tag |
The rule of thumb: if you don't want it crawled, use robots.txt. If you don't want it indexed, use noindex. If you're really serious, use both.
Crawl Budget: Why This Matters
Google allocates a "crawl budget" to your site — the number of pages it'll crawl in a given period. For small sites (under 10,000 pages), this rarely matters. For larger sites, wasting crawl budget on admin pages, search result pages, or faceted navigation means Google spends less time on the pages you actually want ranked.
Blocking low-value paths in robots.txt keeps Google focused. E-commerce sites with hundreds of filtered category pages (/shoes?color=red&size=10) benefit the most from this.
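For example, a store could combine wildcards to keep filter parameters out of the crawl. The parameter names here are illustrative; match them to your own URL scheme:

```
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*sort_by=
```

Each rule matches any path, then a ?, then the parameter anywhere in the query string, so /shoes?color=red and /shoes?size=10&color=red are both blocked.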
5 Mistakes That Actually Hurt
1. Blocking CSS and JavaScript
# Don't do this
Disallow: /css/
Disallow: /js/
Google needs your CSS and JS to render pages properly. Blocking these files means Google sees a broken, unstyled version of your site. This was common advice in the early 2000s. It's terrible advice now.
2. Blocking Your Entire Site
# This kills your SEO
User-agent: *
Disallow: /
One forward slash after Disallow blocks everything. This is the most common robots.txt disaster — usually happens when a dev copies the staging robots.txt to production. Sites have lost months of rankings from this single line.
3. Using Robots.txt to Hide Sensitive Pages
Robots.txt is public. Anyone can visit yoursite.com/robots.txt and see every path you've blocked. If you're blocking /secret-admin-panel/ or /internal-docs/, you're advertising those URLs to everyone. Use authentication and proper access controls instead.
4. Forgetting the Trailing Slash
Disallow: /admin # blocks any path starting with /admin: /admin, /admin/, /admin-panel, /administration
Disallow: /admin/ # blocks only /admin/ and everything under it
That trailing slash matters. Without it you can block more than intended; with it, /admin itself stays crawlable.
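Since plain rules are prefix matches against the URL path, the difference is easy to demonstrate with a toy check (ignoring wildcards):

```python
def blocked(path: str, rule: str) -> bool:
    # A plain robots.txt rule is a prefix match against the URL path.
    return path.startswith(rule)

print(blocked("/admin-panel", "/admin"))   # True: any path starting with /admin matches
print(blocked("/admin-panel", "/admin/"))  # False: the trailing slash no longer matches
print(blocked("/admin/users", "/admin/"))  # True
```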
5. Blocking Pages That Have Backlinks
If other sites link to a page you've blocked in robots.txt, Google can't crawl it to pass link equity through to the rest of your site. Those backlinks effectively go to waste. If a page has inbound links, let Google crawl it — even if you noindex it.
How to Test Your Robots.txt
Before pushing changes live, validate your file:
- Google Search Console — the robots.txt report (Settings > Crawling) shows how Googlebot fetched and parsed your file. To check whether a specific URL is blocked, run it through the URL Inspection tool.
- Manual check — visit yoursite.com/robots.txt in your browser. If you get a 404, you don't have one (which is fine: it means everything is crawlable).
- Syntax check — make sure every Disallow has a User-agent line above it. Orphaned directives get ignored.
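You can also sanity-check a file programmatically with Python's standard-library parser. Note that urllib.robotparser implements the basic protocol only, not Google's wildcard or longest-match extensions:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "/admin/settings"))  # False: blocked
print(parser.can_fetch("*", "/cart"))            # False: blocked
print(parser.can_fetch("*", "/blog/post"))       # True: crawlable
```

Run this against your proposed file with the URLs you care about before deploying, and any accidental Disallow: / shows up immediately.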
Generate Yours in 30 Seconds
Writing robots.txt by hand is straightforward once you know the syntax, but it's easy to miss an edge case or typo a path. Our Robots.txt Generator lets you build one visually — toggle which paths to block, add your sitemap URL, and copy the output. No syntax to memorize.
You can also pair it with the Meta Tag Generator for your noindex tags, or use the Schema Markup Generator to add structured data that makes your crawled pages stand out in results.
Ready to try it?
Build a robots.txt file for your website. Control which pages search engines can crawl with an easy visual editor.
🤖 Robots.txt Generator — Free Online Tool