// definition

robots.txt

A plain-text file at the site root that tells crawlers what they may fetch. It controls crawling, not indexing; keeping a page out of the index takes a noindex tag. Here is the difference and why it matters.


What it is

robots.txt is a plain-text file at the root of your site (yourdomain.com/robots.txt) that tells crawlers which paths they are allowed to fetch. You write rules per user agent — User-agent: Googlebot, then Disallow: or Allow: lines for specific paths.
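As a rough illustration of per-agent rules, the sketch below feeds a made-up robots.txt to Python's standard urllib.robotparser and asks which paths each crawler may fetch. The rules, paths, and the "OtherBot" name are assumptions for the example, and the standard-library matcher is simpler than Google's own, so treat its answers as approximate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# Googlebot has its own group, so only that group's rules apply to it.
googlebot_drafts = allowed("Googlebot", "/drafts/post")  # False
googlebot_admin = allowed("Googlebot", "/admin/")        # True
# Any other crawler falls back to the "*" group.
other_admin = allowed("OtherBot", "/admin/")             # False
```

Note that a crawler with its own group ignores the * group entirely, which is why Googlebot may fetch /admin/ here.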

Crucially, it controls crawling, not indexing. A blocked page can still appear in search results if other sites link to it, because Google never fetched the page and so never saw any instructions on it. To keep a page out of the index, you need a noindex meta tag (or an X-Robots-Tag response header), which Google can only read if it is allowed to crawl the page. So do not Disallow a page you also want to noindex.
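The Disallow-plus-noindex conflict can be checked mechanically. The sketch below is one possible approach, assuming you already have the robots.txt text and the page's HTML as strings. The names RobotsMetaFinder and noindex_is_unreadable are hypothetical, and it only looks at the meta tag, not response headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {key: (value or "") for key, value in attrs}
        is_robots = values.get("name", "").lower() == "robots"
        if is_robots and "noindex" in values.get("content", "").lower():
            self.noindex = True


def noindex_is_unreadable(robots_txt, path, html, agent="Googlebot"):
    """True when the page asks for noindex but robots.txt blocks the crawl."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex and not rules.can_fetch(agent, path)
```

A True result means the crawler is told not to fetch the page, so it would never see the noindex instruction inside it.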

Why it matters

Get this file wrong and you can quietly hide your whole site, or fail to hide the bits you meant to. A stray Disallow: / is the classic launch-day disaster.
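A pre-launch check for that stray rule could look like the sketch below. It is a deliberately crude scan, not a full parser: the function name blanket_disallows is hypothetical, and it ignores any Allow lines that might soften a blanket block.

```python
def blanket_disallows(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, agents) for each 'Disallow: /' rule found."""
    hits = []
    agents = []
    in_rules = False  # True once the current group has seen a rule line
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                hits.append((number, ", ".join(agents)))
    return hits
```

An empty list means no group blocks the whole site; any hit names the line and the user agents it applies to, which is worth a second look before launch.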

It also matters that Google can reach your CSS and JavaScript: block those by accident and pages render incorrectly for Googlebot, which crawls mainly with its smartphone user agent under mobile-first indexing. Note too that compliance is voluntary and user-agent strings are easily spoofed, so robots.txt is a polite request, not a security wall; password-protect anything truly private.

Check your file with our robots.txt checker, and see where it fits in our technical SEO checklist for a new site.
