How Google crawls, indexes, and ranks

The crawl, index, and serve pipeline explained in plain words: why each stage fails, and how knowing the difference helps you debug your own site far faster.


Three stages, not one

"Why isn't my page on Google?" is one question, but it has at least three different answers, and they point to three different fixes. That is the whole reason it helps to understand what Google is actually doing under the hood.

Google's own getting-started guide to how Search works breaks it into three stages: crawling, indexing, and serving results. They happen in order, and a page can fall out at any one of them. When something is wrong, your first job is to work out which stage is failing. Otherwise you spend a week rewriting copy when the real problem was a blocked file.

Stage 1: Crawling

Crawling is discovery. A bot called Googlebot follows links from pages it already knows about, fetches the HTML, and finds more links to follow. That is mostly how the web gets mapped: one link to the next.

A few things worth knowing about the crawler:

  • There are two of them, but think of one. Googlebot Smartphone is the primary crawler, because Google indexes mostly from the mobile version of your pages. There is a desktop crawler too, but the mobile one is the one that matters.
  • It reads the first 2MB of an HTML page. If your important content is buried below two megabytes of markup, it may never be seen. Most pages are nowhere near that, but bloated templates can get close.
  • User agents get spoofed constantly. If you see "Googlebot" in your logs, that is not proof. Real Googlebot is verifiable by a reverse DNS lookup on the requesting IP (confirmed with a forward lookup back to the same IP) or against Google's published IP ranges. Plenty of scrapers lie.
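The reverse DNS check from the last bullet can be scripted. What follows is a minimal sketch, not an official tool: the function name and the hostname suffixes are assumptions for illustration (confirm the suffixes against Google's current crawler documentation), and the default forward lookup only handles IPv4.

```python
import socket

# Assumption: hostname suffixes used by Google's crawlers. Verify these
# against Google's current documentation before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`.

    The two lookup callables are injectable so the logic can be tested offline.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    # gethostbyname_ex is IPv4-only; swap in getaddrinfo if you need IPv6.
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: the claimed hostname must map back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

Call is_verified_googlebot() with an IP taken from your access logs. A False result means either the hostname is not Google's or the forward lookup did not match, and both cases should be treated as a spoofed user agent.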

The two ways crawling fails are simple. Either Googlebot cannot reach the page (no links point to it, or robots.txt blocks it), or it reaches the page but cannot render it properly. That second one is the sneaky one, covered below. If a page is genuinely unreachable, no amount of great content will save it. This is what crawlability means in practice: can the bot actually get to the thing.

Two clarifications that save a lot of confusion:

  • Sitemaps are optional. They help Google find pages, especially on a large or poorly-linked site, but Google does not require one. A small site with clean internal links is often fine without it. A sitemap is a hint, not a gate.
  • robots.txt controls crawling, not indexing. Blocking a URL in robots.txt stops the fetch. It does not reliably keep the page out of the index, because Google can still index a blocked URL it learned about from links elsewhere. If you want a page kept out, use a noindex meta tag or a password. Google has to crawl the page to see the noindex, so do not also block that URL in robots.txt.
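To see quickly whether a given URL is blocked from crawling, Python's standard library ships a robots.txt parser. This is a rough sketch: the helper name, the sample rules, and the example.com URLs are made up for illustration. Python's parser also does not implement every matching rule Google applies (wildcards and longest-match precedence, for example), so treat the result as a first pass, not a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(SAMPLE_ROBOTS, "https://example.com/private/report.html"))  # False
print(is_crawl_allowed(SAMPLE_ROBOTS, "https://example.com/blog/post"))            # True
```

A False here only tells you the fetch is blocked. It says nothing about whether the URL is in the index, which is exactly the distinction above.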

Stage 2: Indexing

Once a page is fetched, Google tries to understand it and store it. That is indexing. It works out what the page is about, processes the text, the title, the key images, and decides whether to file it in the index at all. Not every crawled page gets indexed.

This is the stage where rendering bites people. Modern sites build a lot of their content with JavaScript, and Google does render JavaScript, but it can only render what it can load. If your CSS and JS are blocked in robots.txt, or sit behind something Googlebot cannot fetch, Google sees a different, emptier page than your visitors do. Google is explicit about this in the SEO starter guide: make sure Google can access the same CSS and JavaScript files that a user's browser loads. A blocked stylesheet can quietly wreck how your page is understood.
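One way to catch blocked render resources is to list the scripts and stylesheets a page references and run each through the robots.txt rules. The sketch below makes two assumptions: all resources live on the same host as the robots.txt you pass in (files on other hosts are governed by their own robots.txt), and Python's parser is close enough to Google's matching for a first pass. The class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of external scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel_tokens = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def blocked_resources(html, base_url, robots_txt, user_agent="Googlebot"):
    """Return the script/stylesheet URLs that robots.txt disallows for `user_agent`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return [url for url in collector.resources if not rules.can_fetch(user_agent, url)]
```

An empty list is what you want. Any URL that comes back is a file your visitors load but Googlebot is told not to fetch.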

A few myths worth retiring here, because they cause real wasted effort:

  • There is no magic word count. A short page can be indexed and rank fine; a long one is not automatically better.
  • Heading order does not affect ranking. Going from an h2 to an h4 is an accessibility concern, not a ranking one. Use a sensible order for your readers, but do not lose sleep over the hierarchy for Google's sake.
  • Google does not use the keywords meta tag. It has not for years. Stuffing it does nothing.
  • There is no duplicate-content penalty. Duplicates just create inefficiency: Google picks one version to show. You guide that choice with a rel=canonical tag rather than fearing a punishment that does not exist.
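For the canonical point in the last bullet, it helps to check what a page actually declares. This is a small sketch under stated assumptions: the function name and the four status labels are invented here, and the URL comparison is an exact string match with no normalization of trailing slashes or letter case.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def canonical_status(html, page_url):
    """Classify the page's canonical declaration relative to its own URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.hrefs:
        return "missing"
    if len(set(parser.hrefs)) > 1:
        return "conflicting"
    target = urljoin(page_url, parser.hrefs[0])
    return "self" if target == page_url else "points elsewhere"
```

A filtered URL such as a hypothetical https://example.com/shoes?color=red returning "points elsewhere" is usually the intended outcome: the variant tells Google which version to show.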

Stage 3: Serving (and where ranking lives)

When someone searches, Google sorts through the index and serves what it judges to be the most relevant, useful results for that query. Ranking happens here. This is also where the meta description earns its keep: it does not affect ranking, but a clear, unique one influences the snippet a user sees and whether they click. Titles should be unique, clear, and concise for the same reason. The snippet itself is usually pulled from your page content, so the content has to actually say something.

It is worth saying plainly: this stage is also the entry point for Google's AI features. AI Overviews and AI Mode draw from the same index using retrieval over your indexed content. There is no separate door. If a page is crawled, indexed, and helpful, it is eligible. If it failed at stage one, no clever AI formatting changes that. We unpack that split in more detail in our piece on what AEO actually changes versus SEO.

Why the three-stage model is a debugging tool

The payoff of all this is diagnostic. When a page is missing, you check the stages in order:

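  1. Is it indexed at all? Search site:yourdomain.com/the-page-url in Google. If it shows up, the page is in the index, and your problem is ranking (stage 3), not crawling or indexing. If it does not show up, the problem is probably earlier. The site: operator is not guaranteed to list every indexed URL, so confirm with the URL Inspection tool in Search Console before digging in.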
  2. Can it be crawled? Check robots.txt, check that real links point to the page, confirm there is no stray noindex.
  3. Does it render? Load the page with CSS and JS disabled, or use Google's URL inspection tools, and see whether your actual content is still there.

That ordering stops you from guessing. A ranking problem and a crawling problem look identical from the outside ("not on Google"), but they need different fixes. The site: operator is the fastest way to tell them apart, and it costs you ten seconds.
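The "stray noindex" check in step 2 is easy to automate. A minimal sketch, assuming you already have the page's HTML and response headers in hand: the names are illustrative, and it deliberately ignores user-agent-scoped header values (for example an X-Robots-Tag that starts with "googlebot:").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html, headers=None):
    """Return True if the page opts out of indexing via meta tag or X-Robots-Tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_value = value
    header_directives = {
        part.strip().lower() for part in header_value.split(",") if part.strip()
    }
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & (parser.directives | header_directives))
```

Run it against the raw server response as well as the rendered HTML if you can, since a noindex is sometimes injected by a template or by a header set at the server or CDN level.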

If you want the wider context for non-specialists, our SEO guide for founders puts this pipeline alongside the handful of other things that actually move the needle, without the jargon.
