How Google crawls, indexes, and ranks
The crawl, index, and serve pipeline explained in plain words: why each stage fails, and how knowing the difference helps you debug your own site far faster.
Three stages, not one
"Why isn't my page on Google?" is one question, but it has at least three different answers, and they point to three different fixes. That is the whole reason it helps to understand what Google is actually doing under the hood.
Google's own getting-started guide to how Search works breaks it into three stages: crawling, indexing, and serving results. They happen in order, and a page can fall out at any one of them. When something is wrong, your first job is to work out which stage is failing. Otherwise you spend a week rewriting copy when the real problem was a blocked file.
Stage 1: Crawling
Crawling is discovery. A bot called Googlebot follows links from pages it already knows about, fetches the HTML, and finds more links to follow. That is mostly how the web gets mapped: one link to the next.
A few things worth knowing about the crawler:
- There are two of them, but think of one. Googlebot Smartphone is the primary crawler, because Google indexes mostly from the mobile version of your pages. There is a desktop crawler too, but the mobile one is the one that matters.
- It reads the first 2MB of an HTML page. If your important content is buried below two megabytes of markup, it may never be seen. Most pages are nowhere near that, but bloated templates can get close.
- User agents get spoofed constantly. If you see "Googlebot" in your logs, that is not proof. Real Googlebot is verifiable by reverse DNS or against Google's published IP ranges. Plenty of scrapers lie.
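The reverse DNS check mentioned in the last point can be sketched in a few lines of Python. This is a rough illustration, not an official tool: the function name and the injectable lookup parameters are made up for this example, and the hostname suffixes are an assumption that should be checked against Google's current crawler documentation. The idea is to resolve the IP to a hostname, confirm the hostname belongs to Google, then resolve that hostname forward and confirm it maps back to the same IP.

```python
import socket

# Assumption: hostname suffixes used by Google's crawlers. Verify against
# Google's current documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes a reverse-then-forward DNS check (IPv4 only)."""
    # The lookups are injectable so the logic can be tested without a network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Step 1: the reverse lookup must land on a Google-owned hostname.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the hostname must resolve forward to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

Called as `is_real_googlebot(ip_from_your_logs)`, it performs live DNS lookups. The forward step matters because anyone who controls reverse DNS for their own IP range can make it claim to be a Google hostname.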
The two ways crawling fails are simple. Either Googlebot cannot reach the page (no links point to it, or robots.txt blocks it), or it reaches the page but cannot render it properly. That second one is the sneaky one, covered below. If a page is genuinely unreachable, no amount of great content will save it. This is what crawlability means in practice: can the bot actually get to the thing.
Two clarifications that save a lot of confusion:
- Sitemaps are optional. They help Google find pages, especially on a large or poorly linked site, but Google does not require one. A small site with clean internal links is often fine without one. A sitemap is a hint, not a gate.
- robots.txt controls crawling, not indexing. Blocking a URL in robots.txt stops the fetch. It does not reliably keep the page out of the index, because Google can still index a blocked URL it learned about from links elsewhere. If you want a page kept out, use a noindex meta tag (which Google has to crawl the page to see) or a password.
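To make the crawl-versus-index distinction concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and both function names are hypothetical. Python's parser is also not guaranteed to match Googlebot's own rule matching in every case (wildcard patterns, for instance), so treat it as a rough local check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if these robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def noindex_can_work(robots_txt, url):
    """A noindex tag is only seen if the page itself may be fetched."""
    return can_crawl(robots_txt, url)
```

With these rules, `can_crawl(ROBOTS_TXT, "https://example.com/private/report.html")` is False, which means a noindex tag on that page would never be read. Blocking the fetch and asking for noindex on the same URL work against each other.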
Stage 2: Indexing
Once a page is fetched, Google tries to understand it and store it. That is indexing. It works out what the page is about, processes the text, the title, and the key images, and decides whether to file the page in the index at all. Not every crawled page gets indexed.
This is the stage where rendering bites people. Modern sites build a lot of their content with JavaScript, and Google does render JavaScript, but it can only render what it can load. If your CSS and JS are blocked in robots.txt, or sit behind something Googlebot cannot fetch, Google sees a different, emptier page than your visitors do. Google is explicit about this in the SEO starter guide: make sure Google can access the same CSS and JavaScript files that a user's browser loads. A blocked stylesheet can quietly wreck how your page is understood.
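One rough way to spot this problem is to list the stylesheets and scripts a page references and test each against your robots.txt rules. The sketch below uses only the Python standard library. The class and function names are invented for this example, and it assumes every asset lives on the same host as the page, since robots.txt rules apply per host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class AssetCollector(HTMLParser):
    """Collect absolute URLs of external scripts and stylesheets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "script" and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.assets.append(urljoin(self.base_url, attrs["href"]))


def blocked_assets(html, page_url, robots_txt, user_agent="Googlebot"):
    """Return the CSS/JS URLs on the page that robots.txt disallows."""
    # Assumption: all assets share the page's host, so one rule set applies.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    collector = AssetCollector(page_url)
    collector.feed(html)
    return [url for url in collector.assets if not rules.can_fetch(user_agent, url)]
```

An empty list is a good sign, not proof: it only covers assets named in the initial HTML, not files that scripts load later.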
A few myths worth retiring here, because they cause real wasted effort:
- There is no magic word count. A short page can be indexed and rank fine; a long one is not automatically better.
- Heading order does not affect ranking. Going from an h2 to an h4 is an accessibility concern, not a ranking one. Use a sensible order for your readers, but do not lose sleep over the hierarchy for Google's sake.
- Google does not use the keywords meta tag. It has not for years. Stuffing it does nothing.
- There is no duplicate-content penalty. Duplicates just create inefficiency: Google picks one version to show. You guide that choice with a rel=canonical tag rather than fearing a punishment that does not exist.
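If you are unsure which version a page nominates, you can read the signal straight from its HTML. The sketch below, with hypothetical class and function names, pulls out the rel=canonical URL and also flags a noindex robots meta tag, the other head-level signal that decides whether a page is filed. It does not look at HTTP headers, so a noindex sent via a response header would be missed.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Record the canonical URL and any noindex meta directive in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            # Directives are comma-separated, e.g. "noindex, follow".
            content = (attrs.get("content") or "").lower()
            if "noindex" in [part.strip() for part in content.split(",")]:
                self.noindex = True


def read_index_signals(html):
    """Return (canonical_url_or_None, has_noindex) for an HTML document."""
    parser = IndexSignals()
    parser.feed(html)
    return parser.canonical, parser.noindex
```

A canonical that points somewhere unexpected, such as a staging domain or the homepage, is a common reason the "wrong" version of a page is the one Google shows.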
Stage 3: Serving (and where ranking lives)
When someone searches, Google sorts through the index and serves what it judges to be the most relevant, useful results for that query. Ranking happens here. This is also where the meta description earns its keep: it does not affect ranking, but a clear, unique one influences the snippet a user sees and whether they click. Titles should be unique, clear, and concise for the same reason. The snippet itself is usually pulled from your page content, so the content has to actually say something.
It is worth saying plainly: this stage is also the entry point for Google's AI features. AI Overviews and AI Mode draw from the same index using retrieval over your indexed content. There is no separate door. If a page is crawled, indexed, and helpful, it is eligible. If it failed at stage one, no clever AI formatting changes that. We unpack that split in more detail in our piece on what AEO actually changes versus SEO.
Why the three-stage model is a debugging tool
The payoff of all this is diagnostic. When a page is missing, you run these checks in order:
- Is it indexed at all? Search site:yourdomain.com/the-page-url in Google. If it shows up, the page is in the index, and your problem is ranking (stage 3), not crawling or indexing. If it does not show up, the problem is earlier.
- Can it be crawled? Check robots.txt, check that real links point to the page, confirm there is no stray noindex.
- Does it render? Load the page with CSS and JS disabled, or use Google's URL inspection tools, and see whether your actual content is still there.
That ordering stops you from guessing. A ranking problem and a crawling problem look identical from the outside ("not on Google"), but they need opposite fixes. The site: operator is the fastest way to tell them apart, and it costs you ten seconds.
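The checklist can be written down as a small decision function. This is only a sketch of the reasoning above: the function name, the parameters, and the returned labels are invented for illustration, and each argument is an answer you gather yourself from the checks.

```python
def diagnose(in_site_search, robots_allows, has_inbound_links,
             has_noindex, content_renders):
    """Map the answers to the checks above onto the stage most likely failing."""
    if in_site_search:
        return "serving"    # already indexed, so the question is ranking
    if not robots_allows or not has_inbound_links:
        return "crawling"   # Googlebot cannot reach the page
    if has_noindex or not content_renders:
        return "indexing"   # fetched, but excluded or seen as an empty page
    return "unclear"        # nothing obvious; inspect the URL with Google's tools
```

The order of the `if` statements is the point. The site: result is checked first because a positive answer rules out the two earlier stages at once.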
If you want the wider context for non-specialists, our SEO guide for founders puts this pipeline alongside the handful of other things that actually move the needle, without the jargon.