SEO Fundamentals

Google Indexing: Crawl, Budget, and Unblocking


Google processes your pages in three steps: discovery, crawling, and indexing. A poorly managed crawl budget can leave your key pages out of the index. Start by checking the Page Indexing report in Search Console, then eliminate the low-value URLs that waste your crawl budget.

An unindexed page is an invisible page. Yet many sites suffer from silent indexing problems that their owners never detect. Here is how to diagnose and resolve them.

How Google discovers and indexes your pages

Googlebot starts from already-known pages and follows links to discover new ones. It then reads the content, renders JavaScript if necessary, and passes the page to the index — a process that can take from a few hours to several weeks.
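The discovery step can be pictured as a breadth-first walk over your internal links. The sketch below is only an illustration, not how Googlebot is actually implemented: the `link_graph` dictionary and the function name `discover` are hypothetical, and real crawling also involves scheduling, rendering, and sitemaps.

```python
from collections import deque


def discover(link_graph, seeds):
    """Breadth-first discovery: map each reachable URL to its click depth."""
    depth = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depth:  # first time this URL is seen
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: each key links to the URLs in its list.
link_graph = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/products/a", "/blog/post-1"],
    "/landing-old": ["/"],  # nothing links to it, so it is never discovered
}

found = discover(link_graph, ["/"])
all_pages = set(link_graph) | {t for links in link_graph.values() for t in links}
orphans = sorted(all_pages - set(found))

print(found)
print(orphans)
```

Pages that never show up in `found` cannot be reached by following links from the starting page, which is the situation the rest of this article tries to help you avoid.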

Indexing is not guaranteed: Google independently decides which pages merit indexing based on their quality, uniqueness, and the site's authority.

Crawl budget: what it is and when it becomes critical

The crawl budget is the number of URLs Googlebot is able and willing to crawl on your site within a given time interval. It is capped partly to avoid overloading your server and partly by how much demand Google sees for your content.

For most sites, and certainly those with fewer than 1,000 pages, crawl budget is not a problem. It becomes critical for large e-commerce sites, sites with faceted navigation, or platforms generating thousands of dynamic URLs.

Low-value pages — filter results, session URLs, duplicates — waste this budget and delay the indexing of your priority pages.
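One rough way to estimate how much crawl activity goes to such variants is to normalize the crawled URLs (taken from your server logs, for example) and count how many collapse onto the same page. This is a minimal sketch under stated assumptions: the parameter list `LOW_VALUE_PARAMS` and the sample URLs are hypothetical and must be adapted to your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that only create variants; adapt to your site.
LOW_VALUE_PARAMS = {"sessionid", "sort", "color", "size", "utm_source", "utm_medium"}


def canonical_form(url):
    """Drop low-value parameters and the fragment, then sort what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in LOW_VALUE_PARAMS
    ]
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def wasted_share(crawled_urls):
    """Fraction of crawled URLs that are variants of an already counted URL."""
    unique = {canonical_form(url) for url in crawled_urls}
    return (len(crawled_urls) - len(unique)) / len(crawled_urls)


crawled = [
    "https://shop.example/shoes",
    "https://shop.example/shoes?color=red",
    "https://shop.example/shoes?sort=price&sessionid=abc",
    "https://shop.example/shoes?page=2",
    "https://shop.example/bags",
]

print(wasted_share(crawled))
```

In this sample, two of the five crawled URLs are only variants of `/shoes`. Pagination (`page=2`) is deliberately kept as a distinct URL here, which is a design choice you may want to revisit.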

On large e-commerce sites, between 20 and 60% of crawled URLs can be low-value variants that unnecessarily consume the crawl budget.

Sector studies 2025-2026 on e-commerce SEO architectures

Diagnosing indexing problems

The 'Coverage' report (now 'Page Indexing') in Search Console is your first tool. It distinguishes indexed, excluded, and error pages, with the precise reason for each category.

Use the URL inspection tool to test a specific page: Google tells you whether it is indexed, the date of the last crawl, and any detected issues.

404 or 5xx error: the page is inaccessible at the time of crawl.
Blocked by robots.txt: Googlebot is denied access.
noindex tag present: you have explicitly requested exclusion.
Duplicate page: Google chose a different canonical URL.
Undiscoverable: no internal link points to the page.
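Two of the causes above, a robots.txt block and a noindex tag, can be checked locally before you open Search Console. The sketch below uses only the Python standard library; the helper names are hypothetical. One caveat: `urllib.robotparser` does not implement Google's wildcard matching or its most-specific-rule precedence, so treat the result as a first approximation rather than Googlebot's exact verdict.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows the URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class RobotsMetaFinder(HTMLParser):
    """Look for <meta name="robots"> or <meta name="googlebot"> with noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


# Hypothetical inputs for illustration.
robots_txt = "User-agent: *\nDisallow: /cart\n"
page_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(blocked_by_robots(robots_txt, "https://shop.example/cart/checkout"))
print(blocked_by_robots(robots_txt, "https://shop.example/shoes"))
print(has_noindex(page_html))
```

This only inspects the HTML meta tag. A noindex sent through the `X-Robots-Tag` HTTP header would need a separate check on the response headers.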

Speeding up indexing of your new pages

Submit your new URLs via the URL inspection tool in Search Console or via the Indexing API. Officially, the Indexing API is reserved for pages containing job postings or livestream videos; it is often used for other content, but Google does not support that use.

The most reliable method remains building internal links from your already well-indexed pages toward your new URLs: Googlebot will discover them naturally during the next crawl.
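To see which new URLs still lack such links, you can list, for each one, the already indexed pages that point to it. This is a minimal sketch: the `link_graph` dictionary, the `indexed` set, and the function name are hypothetical, and in practice the data would come from your own crawl and from Search Console exports.

```python
def inbound_from_indexed(link_graph, indexed, new_urls):
    """Map each new URL to the indexed pages that link to it."""
    sources = {url: [] for url in new_urls}
    for page, links in link_graph.items():
        if page not in indexed:
            continue  # links from non-indexed pages are ignored in this sketch
        for target in links:
            if target in sources:
                sources[target].append(page)
    return sources


# Hypothetical data for illustration.
link_graph = {
    "/": ["/blog", "/guide-new"],
    "/blog": ["/guide-new"],
    "/draft": ["/faq-new"],
}
indexed = {"/", "/blog"}

report = inbound_from_indexed(link_graph, indexed, ["/guide-new", "/faq-new"])
needs_links = [url for url, pages in report.items() if not pages]

print(report)
print(needs_links)
```

Any URL listed in `needs_links` is a candidate for a new internal link from one of your indexed pages.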

FAQ

Why doesn't my page appear in Google despite a sitemap submission?

The sitemap tells Google the page exists, but does not force indexing. Google evaluates quality, uniqueness, and relevance before indexing. Check that no noindex is present and that the page offers substantial content.

Can a page blocked in robots.txt still appear in results?

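Yes, if other sites link to it. Google can display the URL without having been able to crawl it, meaning no excerpt is shown. To fully exclude a page, use a noindex tag and do not block the page in robots.txt: Googlebot must be able to crawl the page to see the noindex directive, so combining the two defeats the purpose.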

What is the normal delay between publication and indexing?

For a well-established site, a few hours to 48 hours for pages linked from the home page. For a recent site or an orphan page, it can take several weeks.