Noindex vs Disallow: When to Use Each (And the Mistake Everyone Makes)

May 18, 2026·6 min read

If you've ever tried to remove a page from Google and watched it stubbornly stay in the index, you've probably hit the noindex/disallow trap. The two sound interchangeable. They are not. And combining them, which is what a lot of tutorials tell you to do, is the fastest way to keep a page stuck in the index until you undo the block.

Here's the short version: disallow controls crawling. Noindex controls indexing. They are different problems with different fixes.

The Actual Difference

A robots.txt Disallow rule tells Googlebot "don't crawl this URL." Google never fetches the page. It never sees the HTML, the title, the meta tags — none of it.

A noindex directive tells Google "you can crawl this page, but don't show it in search results." It has to be delivered in one of two places:

A meta robots tag in the HTML: <meta name="robots" content="noindex">
An HTTP response header: X-Robots-Tag: noindex

That distinction is the whole game. Disallow blocks crawling. Noindex blocks indexing. Google needs to crawl a page to see the noindex tag.
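
To make the two delivery channels concrete, here is a minimal Python sketch that checks a response for noindex in either place. The names RobotsMetaParser and has_noindex are made up for illustration, and the sketch assumes you already have the page's HTML and response headers in hand. It ignores user-agent-specific forms such as "googlebot: noindex" in the header, so treat it as a rough check rather than a full parser.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers):
    """True if noindex appears in the meta robots tag or the X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)

    # Header names are case-insensitive, so compare in lowercase.
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    return "noindex" in parser.directives or "noindex" in header_directives


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(page, {}))                                # True
print(has_noindex("<html></html>", {"X-Robots-Tag": "noindex"}))  # True
```

Either channel only works if a crawler is allowed to fetch the response in the first place.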

The Mistake That Keeps Pages Indexed Forever

Here's what people do when they want a page gone:

# robots.txt
User-agent: *
Disallow: /private-page

# Plus the HTML
<meta name="robots" content="noindex">

This is wrong. Google can't read the noindex tag because robots.txt is blocking the crawl. The page can sit in the index with no description (because Google never saw the content), and the noindex does nothing for as long as the block stays in place.

You'll see this in Search Console as "Indexed, though blocked by robots.txt." That warning isn't a bug. It's Google telling you exactly what you did.

The fix: pick one. If you want a page deindexed, allow Google to crawl it, then add the noindex tag. Once Google recrawls and sees the noindex, the page drops from the index. After it's gone, you can add the disallow rule if you also want to stop wasting crawl budget on it.
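
If you want to catch this combination before Google does, a small script can flag it. The sketch below uses Python's standard urllib.robotparser; the helper names crawl_blocked and diagnose are hypothetical, and it assumes you already know whether the page carries a noindex. One caveat: urllib.robotparser does plain prefix matching and does not implement Google's * and $ wildcards, so it suits simple rules like the one above.

```python
from urllib.robotparser import RobotFileParser


def crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text disallows this URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def diagnose(robots_txt, url, page_has_noindex):
    """Describe how a disallow rule and a noindex directive interact for one URL."""
    blocked = crawl_blocked(robots_txt, url)
    if blocked and page_has_noindex:
        return "trap: noindex is present but cannot be read while the URL is disallowed"
    if blocked:
        return "crawl blocked: the URL can still be indexed from links alone"
    if page_has_noindex:
        return "ok: crawlable and noindex, so the page can drop out after a recrawl"
    return "crawlable and indexable"


robots = "User-agent: *\nDisallow: /private-page\n"
print(diagnose(robots, "https://example.com/private-page", page_has_noindex=True))
```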

When to Use Disallow

Use a robots.txt disallow when you want to prevent crawling entirely. The page can still get indexed (without a snippet) if other sites link to it, but Googlebot won't waste resources fetching it.

Good use cases:

Internal search result pages — Disallow: /?s= keeps thin, infinite-URL search pages from eating crawl budget.
Faceted navigation — Disallow: /shop/*?color=* prevents Google from crawling thousands of filter combinations.
Admin panels — Disallow: /wp-admin/ (already in WordPress's default).
API endpoints and JSON feeds — Anything that returns data, not pages users would find via search.
Staging environments — Disallow: / on a staging subdomain prevents crawl, though noindex is safer here.

The pattern: disallow is for stuff Google shouldn't waste time on, not stuff you're trying to hide. If it's already in the index, disallow alone won't remove it.
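
Wildcard rules are easy to get subtly wrong, so it helps to test them against real URLs. Here is a rough Python sketch of Google-style pattern matching, where * matches any run of characters and a trailing $ anchors the end. The function names are illustrative, and the sketch checks one Disallow pattern at a time; it does not model Allow rules or longest-match precedence.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern (with * and optional trailing $) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(pattern, path_and_query):
    """True if the path (including any query string) matches the Disallow pattern."""
    return pattern_to_regex(pattern).match(path_and_query) is not None


print(is_disallowed("/?s=", "/?s=shoes"))                              # True
print(is_disallowed("/shop/*?color=*", "/shop/shirts?color=red"))      # True
print(is_disallowed("/shop/*?color=*", "/shop/shirts?size=m&color=red"))  # False
```

The last line is worth noticing: the faceted-navigation pattern only matches when color is the first query parameter. If your filters can appear in any order, a pattern without the literal ? (for example /shop/*color=) would cast a wider net, at the cost of also matching any other URL under /shop/ that contains that string.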

You can build a clean robots.txt file with the Robots.txt Generator and check exactly what you're blocking before you ship it.

When to Use Noindex

Use noindex when you want a page out of search results. The page exists, users can reach it directly, but it won't show up when someone searches Google.

Good use cases:

Thank-you pages — /thank-you after a form submission. Useless in search results.
Internal user dashboards — Logged-in pages that shouldn't be indexed.
Duplicate content you can't canonicalize — Print versions, sort variants, PDFs of HTML pages.
Tag and category pages with no unique content — Common on blogs.
Old promotions and expired landing pages — Keep them live for direct visitors, hide from search.
Author archives with only one post — WordPress default that adds zero value.

To add the meta tag, generate it through the Meta Tag Generator and paste it into your <head>. For non-HTML files (PDFs, images), use the X-Robots-Tag HTTP header instead — that's covered in the HTTP Headers guide.
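
In practice the header is usually set in your web server or CDN configuration. To show the idea in code, here is a minimal sketch as Python WSGI middleware that appends X-Robots-Tag: noindex to responses for certain file extensions. The function name add_noindex_header and the extension list are assumptions for illustration, not a recommendation for which files to hide.

```python
# Hypothetical list of extensions to keep out of search results.
NOINDEX_EXTENSIONS = (".pdf", ".png", ".jpg")


def add_noindex_header(app):
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def patched_start_response(status, headers, exc_info=None):
            # str.endswith accepts a tuple, so one call covers every extension.
            if path.endswith(NOINDEX_EXTENSIONS):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

Wrapping an existing app looks like application = add_noindex_header(application). HTML pages pass through untouched, so they can keep using the meta tag.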

What "Noindex, Follow" Actually Means

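The meta robots tag takes a comma-separated list of directives. The two pairs that matter here are index/noindex and follow/nofollow, and index and follow are the defaults when you leave them out.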

<meta name="robots" content="noindex, follow">

noindex — Don't show this page in search results.
follow — Still pass link equity through outbound links on this page.

noindex, follow is the right default for most pages you're trying to deindex. The links on the page still help other pages on your site rank. noindex, nofollow cuts the page off from your link graph entirely, which usually isn't what you want.

Google has said it eventually treats long-term noindex pages like noindex, nofollow regardless of what you specify. That's fine — by the time that kicks in, the page has stopped being part of your active site anyway.
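
If you audit a lot of pages, it can help to resolve the content string into plain booleans. A small Python sketch, assuming the usual defaults (index and follow when nothing is specified) and treating "none" as shorthand for "noindex, nofollow"; the function name parse_robots_content is made up for this example.

```python
def parse_robots_content(content):
    """Resolve a meta robots content string into index/follow booleans."""
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    if "none" in tokens:  # "none" is shorthand for "noindex, nofollow"
        tokens |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }


print(parse_robots_content("noindex, follow"))  # {'index': False, 'follow': True}
print(parse_robots_content("noindex"))          # {'index': False, 'follow': True}
```

Both calls give the same result, which is why writing follow explicitly is optional.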

The Removal Workflow That Actually Works

If you have pages stuck in the index that you want gone, here's the sequence:

Remove any robots.txt block that's preventing crawling. Yes, that feels backwards. Trust it.
Add <meta name="robots" content="noindex"> to the page's HTML head, or set X-Robots-Tag: noindex in the HTTP response.
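Wait for Google to recrawl. This can take days or weeks. Use Search Console's URL Inspection tool to request indexing, which asks Google to prioritize a recrawl but doesn't guarantee when it happens.
Verify the page drops out of search results. Search site:example.com/your-url for a quick check, or inspect the URL in Search Console for a more reliable answer.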
Once it's gone, you can optionally add the disallow rule to save crawl budget — but only after the noindex has done its job.
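
The ordering is the part people get wrong, so here it is as a tiny decision function. This is only a sketch of the sequence above: the function name and the three boolean inputs are assumptions, and you would fill them in from your own checks (robots.txt, the page's response, and Search Console).

```python
def next_removal_step(blocked_by_robots, has_noindex, still_indexed):
    """Suggest the next step for a page you want out of the index."""
    if not still_indexed:
        # The noindex has done its job; blocking crawling is now optional.
        if blocked_by_robots:
            return "done"
        return "optional: add a Disallow rule to save crawl budget"
    if blocked_by_robots:
        return "remove the robots.txt block so the page can be crawled"
    if not has_noindex:
        return "add noindex via the meta tag or the X-Robots-Tag header"
    return "wait for a recrawl, or request one in URL Inspection"


# A page stuck in the classic trap: blocked, tagged noindex, still indexed.
print(next_removal_step(blocked_by_robots=True, has_noindex=True, still_indexed=True))
```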

For urgent removals (legal issues, accidentally exposed data), use the Removals tool in Search Console to hide pages from search results within hours. The removal is temporary (about six months), so treat it as a way to buy time while you apply the proper noindex.

Other Mistakes Worth Avoiding

Disallowing your CSS and JS. Google needs these to render your pages the way visitors see them, including the mobile layout. Blocking /wp-includes/ or /assets/ can leave Google with a broken render of the page, and that can hurt rankings. Don't do it.

Trusting that staging environments are safe because they're "private." If a staging URL leaks onto any public page (a forum post, a public issue tracker, a shared document), Google can find it. Always set X-Robots-Tag: noindex on staging servers, or password-protect them entirely.

Using nofollow on internal links to deindex pages. This doesn't deindex anything. It only asks Google not to pass ranking signals through that link, and the page can still be discovered and indexed through other links. Use noindex on the destination page instead.

Combining noindex with a canonical tag that points elsewhere. If a page has a canonical pointing to another URL AND a noindex tag, Google gets mixed signals: one says "treat that other page as the real one," the other says "drop this from the index." Pick one strategy per page.
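
The canonical-plus-noindex combination is easy to spot in code. Here is a minimal Python sketch; HeadSignals and has_mixed_signals are hypothetical names, and the URL comparison is a plain string match, so it assumes absolute canonical URLs written the same way as the page URL (no trailing-slash or relative-URL handling).

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Records whether a page declares noindex and where its canonical points."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            tokens = [t.strip().lower() for t in attr.get("content", "").split(",")]
            self.noindex = self.noindex or "noindex" in tokens
        elif tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.canonical = attr.get("href") or None


def has_mixed_signals(html, page_url):
    """True if the page is noindex AND its canonical points at a different URL."""
    signals = HeadSignals()
    signals.feed(html)
    points_elsewhere = signals.canonical is not None and signals.canonical != page_url
    return signals.noindex and points_elsewhere


head = (
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex"></head>'
)
print(has_mixed_signals(head, "https://example.com/b"))  # True
print(has_mixed_signals(head, "https://example.com/a"))  # False
```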

The Mental Model to Hold Onto

Robots.txt is a sign at the front of your store: "Don't come in." Useful for keeping crawlers out of areas that would slow them down or confuse them. But if you put a sign on a building that's also listed in the phone book, the listing stays.

Noindex is asking the phone book to drop your listing. Once that's processed, the building stops showing up in searches, even though anyone who knows the address can still walk in.

For most "I want this page out of Google" situations, the answer is noindex, not disallow. Reach for robots.txt only when you're trying to save Googlebot's time, not hide existing content.
