// definition
robots.txt
A plain-text file at the site root that tells crawlers what they may fetch. It controls crawling, not indexing; keeping a page out of the index takes a noindex directive. Here is the difference and why it matters.
What it is
robots.txt is a plain-text file at the root of your site (yourdomain.com/robots.txt) that tells crawlers which paths they are allowed to fetch. Rules are grouped by user agent: a User-agent: Googlebot line, followed by Disallow: or Allow: lines for specific paths.
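To make the rule format concrete, here is a minimal sketch that parses a sample file with Python's standard urllib.robotparser module. The paths and the OtherBot name are made up for illustration. One caveat: this parser applies the first matching rule in a group, whereas Google applies the most specific one, so the Allow line is placed first to get the same answer from both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /private/press-kit/
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
site = "https://yourdomain.com"

print(parser.can_fetch("Googlebot", site + "/private/report.html"))        # False
print(parser.can_fetch("Googlebot", site + "/private/press-kit/logo.png"))  # True
# Googlebot has its own group, so the catch-all /tmp/ rule does not apply to it.
print(parser.can_fetch("Googlebot", site + "/tmp/cache.html"))             # True
print(parser.can_fetch("OtherBot", site + "/tmp/cache.html"))              # False
```

Note that a crawler with its own group ignores the catch-all group entirely, which is a common source of surprises.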
Crucially, it controls crawling, not indexing. A blocked page can still appear in search results if other sites link to it: Google can index the URL from those links without ever fetching the page, so it never sees any instructions on the page itself. To keep a page out of the index, use a noindex meta tag (or an X-Robots-Tag: noindex response header), which Google can only read if it is allowed to crawl the page. So do not Disallow a page you also want to noindex.
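The interaction between Disallow and noindex can be sketched as a small check. This is an illustration only, assuming you already have the page HTML and the robots.txt text as strings; the function name noindex_status and the status labels are hypothetical, and it looks only at meta tags, not response headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page carries a robots meta tag with noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def noindex_status(robots_txt, url, html, agent="Googlebot"):
    """Report whether a noindex tag exists and whether the crawler may read it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    if not finder.noindex:
        return "no noindex tag"
    if rules.can_fetch(agent, url):
        return "noindex readable"
    return "noindex hidden by Disallow"


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
URL = "https://yourdomain.com/drafts/a.html"

print(noindex_status("User-agent: *\nDisallow: /drafts/\n", URL, PAGE))
# noindex hidden by Disallow
print(noindex_status("User-agent: *\nDisallow:\n", URL, PAGE))
# noindex readable
```

The first result is the trap described above: the tag is present, but the crawler is told not to fetch the page that contains it.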
Why it matters
Get this file wrong and you can quietly hide your whole site, or fail to hide the bits you meant to. A stray Disallow: / is the classic launch-day disaster.
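A pre-launch check for that mistake could look roughly like the following. It is a simplified sketch, not a full robots.txt parser: the function name blanket_disallow_agents is hypothetical, and it only flags groups containing a literal Disallow: / line without weighing any Allow rules that might soften it.

```python
def blanket_disallow_agents(robots_txt):
    """Return the user agents whose group contains a bare 'Disallow: /'."""
    blocked = []
    agents = []       # user agents named by the group currently being read
    in_rules = False  # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new group.
                agents = []
                in_rules = False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked


print(blanket_disallow_agents("User-agent: *\nDisallow: /\n"))      # ['*']
print(blanket_disallow_agents("User-agent: *\nDisallow: /tmp/\n"))  # []
```

Running something like this against the production file before launch, and failing the deploy if '*' or Googlebot appears in the result, is one way to catch a staging rule that was never removed.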
It also matters that Google can reach your CSS and JavaScript. Block those by accident and Googlebot, which crawls mostly as a smartphone under mobile-first indexing, cannot render your pages the way visitors see them. Note too that compliance is voluntary and user agents are easily spoofed, so robots.txt is a polite request, not a security wall; password-protect anything truly private.
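One way to spot accidentally blocked assets is to run the stylesheet and script URLs a page depends on through the same rules. The sketch below assumes you have already collected those URLs; the function name blocked_assets and the example paths are hypothetical, and Python's parser uses first-match rule ordering, so treat the result as an approximation of what Googlebot would decide.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, agent="Googlebot"):
    """Return the asset URLs that the given crawler is not allowed to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not rules.can_fetch(agent, url)]


# Hypothetical rules that hide a whole assets directory from every crawler.
RULES = """\
User-agent: *
Disallow: /assets/
"""

ASSETS = [
    "https://yourdomain.com/assets/site.css",
    "https://yourdomain.com/assets/app.js",
    "https://yourdomain.com/img/logo.png",
]

for url in blocked_assets(RULES, ASSETS):
    print("blocked:", url)
# blocked: https://yourdomain.com/assets/site.css
# blocked: https://yourdomain.com/assets/app.js
```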
Check your file with our robots.txt checker, and see where it fits in our technical SEO checklist for a new site.