Googlebot’s 2MB Limit: Fix Crawl & Indexing Issues

AI Business Tools Singapore • By 3L3C

Googlebot only fetches 2MB of HTML per page. Learn what gets truncated, how rendering works, and fixes SMEs can apply to protect SEO and leads.

Technical SEO · Googlebot · Indexing · JavaScript SEO · SME Marketing · Website Performance · AI Business Tools Singapore


A surprising number of SEO “mysteries” come down to one boring constraint: bytes. Not rankings. Not backlinks. Not “Google hates my site.” Just the simple reality that Googlebot only fetches up to 2 MB per URL (for HTML pages)—and anything after that point never gets fetched, rendered, or indexed.

Google’s Gary Illyes published fresh details on how crawling actually works: Googlebot isn’t a lone robot roaming the web. It’s one client on a shared Google crawling platform used by other products (like Google Shopping and AdSense), each with its own configuration and limits. For Singapore SMEs—especially those running content-heavy sites, ecommerce catalogues, or JavaScript-heavy landing pages—this matters because you can’t rank content Google never receives.

This post is part of our AI Business Tools Singapore series, where we look at how modern tools (including AI) help businesses grow. Technical SEO doesn’t sound glamorous, but it’s the plumbing that makes every other channel work—content marketing, lead gen, even AI search visibility.

What Google just clarified about Googlebot (and why SMEs should care)

Googlebot is a client of a centralized crawling platform, and HTML fetching has a 2 MB cap per URL. That means your server logs and your SEO tools may show multiple “Google” crawlers behaving differently because they’re different clients on the same underlying system.

Here’s the practical impact for an SME: if your site is built on templates that bloat HTML (giant menus, huge inline scripts, base64 images, endless tracking snippets), you can unknowingly push key on-page SEO elements—like structured data or even core content—past the cutoff.

Three important numbers from Google’s clarification:

  • HTML pages: Googlebot fetches up to 2 MB per URL.
  • PDFs: Googlebot fetches up to 64 MB.
  • Other crawlers: if they don’t specify, the platform defaults to 15 MB.

That “15 MB default” is why some teams get confused when comparing crawl behavior across different Google agents.
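As a rough sanity check, those caps can be encoded in a few lines. This is a hypothetical sketch (the function and constant names are ours, and it assumes binary megabytes, i.e. 2 MB = 2,097,152 bytes; Google hasn’t published the exact byte count):

```python
# Hypothetical sketch of the fetch caps described above. The names are
# ours, and we assume binary megabytes (2 MB = 2,097,152 bytes).
FETCH_CAPS = {
    "html": 2 * 1024 * 1024,      # Googlebot, HTML pages
    "pdf": 64 * 1024 * 1024,      # Googlebot, PDFs
    "default": 15 * 1024 * 1024,  # platform default for other crawlers
}

def truncated_bytes(payload: bytes, content_type: str = "html") -> int:
    """How many bytes fall past the cap (0 means fully fetched)."""
    cap = FETCH_CAPS.get(content_type, FETCH_CAPS["default"])
    return max(0, len(payload) - cap)

page = b"<html>" + b"x" * (2 * 1024 * 1024) + b"</html>"
print(truncated_bytes(page))  # 13 bytes Googlebot would never see
```

Anything `truncated_bytes` reports is content that, per the next section, simply never reaches Google’s indexing systems.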

The byte limit isn’t theoretical—it’s a hard cutoff

Googlebot doesn’t “reject” a page above 2 MB. It stops fetching at the cutoff and sends the truncated content onward to Google’s indexing systems and the Web Rendering Service (WRS). Those systems treat what they receive as complete.

If your important content sits after the 2 MB point, Google won’t see it. Not later. Not eventually. It’s simply not there.

For lead generation sites, this is a silent killer. I’ve seen service pages where the first screen is a massive hero section, followed by whole libraries’ worth of inline CSS/JS, with the actual service copy and FAQs much further down. Humans scroll; Google may never get that far.

How Google’s crawling architecture actually works (plain English)

Google’s crawling is centralized. Google Search (Googlebot) is one client. Other Google products can crawl too, with different names and settings.

This matters in two ways:

  1. Server logs can mislead you. You might see “Google” crawling and assume it’s all Search, but Shopping or AdSense crawlers may show up differently.
  2. Different Google crawlers have different limits and rules. So “Google crawled it” doesn’t always mean “Google Search indexed it.”

For Singapore SMEs, the fix isn’t to obsess over which crawler hit you last. The fix is simpler: make your pages efficient to fetch and clear to interpret.

The underrated detail: HTTP headers count toward 2 MB

Google clarified that HTTP response headers count toward the 2 MB limit. Most SMEs won’t hit this, but if your stack adds lots of Set-Cookie headers, long security headers, or otherwise bloated responses, it eats into the budget.

If your pages are already close to the limit, headers can be the straw that breaks indexing.
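To estimate how much of the budget a response actually consumes, you have to count the serialized headers as well as the body. A rough sketch (our own function; the header overhead is approximated assuming the standard `Name: value\r\n` wire format):

```python
def fetch_budget_used(headers: dict[str, str], body: bytes) -> int:
    """Approximate bytes counted against the 2 MB budget: serialized
    response headers plus the HTML body. Assumes the standard
    'Name: value\\r\\n' wire format per header."""
    header_bytes = sum(len(f"{name}: {value}\r\n".encode())
                       for name, value in headers.items())
    return header_bytes + len(body)

headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Set-Cookie": "session=abc123; Path=/; HttpOnly",
}
print(fetch_budget_used(headers, b"<html>...</html>"))
```

For most sites the header share is tiny, which is the point: if headers alone move the needle, the page was already far too close to the cap.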

Rendering: what Google’s Web Rendering Service (WRS) will and won’t do

After fetching HTML, Google may render the page using WRS, which executes JavaScript to better understand content and structure.

Google’s clarifications are useful because they remove common assumptions:

WRS fetches some resources with separate byte counters

Each external resource referenced in the HTML (such as CSS and JavaScript files) is fetched separately, with its own byte counter. Those external files do not count toward the HTML page’s 2 MB cap.

This is why Google recommends moving heavy inline CSS/JS into external files.

WRS doesn’t fetch everything

WRS will pull in:

  • JavaScript
  • CSS
  • XHR requests (common in apps)

But it generally doesn’t request images or videos, and Google notes that media files, fonts, and some “exotic” files are not fetched by WRS.

For SMEs, the implication is straightforward:

  • If your “real content” is inside images (price lists, brochures, posters), Google may not interpret it as well as you think.
  • If your key copy only appears after a client-side interaction (tabs, accordions powered by JS with delayed fetch), you’re adding risk.

WRS is stateless

Google also highlighted that WRS works statelessly, clearing local storage and session data between requests.

If your site relies on:

  • session-based content
  • client-side personalization
  • “must accept cookies to see content” flows

…Google may render an incomplete experience.

Practical fixes: how to keep critical SEO content inside the first 2 MB

Your goal is not “make pages small.” Your goal is “make the first 2 MB count.” In practice, most SME pages will never approach 2 MB unless something is wrong in the template or build.

1) Put SEO-critical elements early in the HTML

Google explicitly recommends placing these high in the HTML on large pages:

  • <title>
  • meta description
  • canonical tag
  • robots meta
  • structured data (JSON-LD)
  • internal linking modules that matter

Why? Because anything “too low” risks landing after the cutoff.

A simple rule I use: If it affects how Google understands the page, it belongs near the top.
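One way to verify that rule is to find the byte offset where each critical element first appears and compare it to the cap. A hypothetical check (our own code; the 2 MB figure again assumes binary megabytes):

```python
CAP = 2 * 1024 * 1024  # assuming binary megabytes

def survives_cutoff(html: bytes, marker: bytes, cap: int = CAP) -> bool:
    """True if the marker's first occurrence starts before the cap."""
    pos = html.find(marker)
    return pos != -1 and pos < cap

# A page whose JSON-LD sits after 2 MB of padding:
sample = (b"<head><title>Aircon Servicing</title></head>"
          + b"<div>" + b"x" * CAP + b"</div>"
          + b'<script type="application/ld+json">{}</script>')
print(survives_cutoff(sample, b"<title>"))              # True
print(survives_cutoff(sample, b"application/ld+json"))  # False
```

Run the same check for your canonical tag, robots meta, and key internal-link modules; anything that returns False is invisible to Google.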

2) Stop stuffing base64 images into HTML

Inline base64 images can balloon HTML size fast.

Better approach:

  • Serve images as normal files (WebP/AVIF where supported)
  • Lazy-load below-the-fold images
  • Compress aggressively for mobile

External image files aren’t fetched by WRS and don’t touch the HTML budget, but base64 data embedded directly in the HTML counts toward the 2 MB HTML cap.
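To see how much base64 weight a template carries, scan the HTML for `data:` URIs. A hypothetical sketch (the regex is a simplification that only catches base64-encoded URIs; a real audit would be more thorough):

```python
import re

# Simplified pattern: matches base64 data URIs such as
# data:image/png;base64,iVBORw0KGgo...
DATA_URI = re.compile(rb"data:[\w/+.-]+;base64,([A-Za-z0-9+/=]+)")

def inline_image_bytes(html: bytes) -> int:
    """Total bytes of base64 payloads embedded in the HTML."""
    return sum(len(m.group(1)) for m in DATA_URI.finditer(html))

sample = b'<img src="data:image/png;base64,' + b"A" * 1000 + b'">'
print(inline_image_bytes(sample))  # 1000
```

If the total runs into hundreds of kilobytes, converting those images to normal files is one of the fastest size wins available.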

3) Move inline CSS/JS out of templates

Inline CSS/JS is a common culprit on WordPress, Shopify custom themes, and “builder” sites.

Do this instead:

  • Externalize large CSS/JS into files
  • Minify
  • Remove unused CSS (especially from page builders)
  • Defer non-critical JavaScript

This isn’t about PageSpeed scores. It’s about crawl efficiency and indexing completeness.
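A quick way to quantify the problem is to total the bytes inside inline `<script>` and `<style>` tags. A rough sketch (our own regex-based approximation; a production audit would use an HTML parser):

```python
import re

# Bodies of <script>/<style> blocks; scripts loaded via src= have
# empty bodies and contribute nothing.
INLINE = re.compile(rb"<(script|style)\b[^>]*>(.*?)</\1>",
                    re.DOTALL | re.IGNORECASE)

def inline_code_bytes(html: bytes) -> int:
    """Bytes spent on inline <script>/<style> content."""
    return sum(len(m.group(2)) for m in INLINE.finditer(html))

html = (b"<style>body{margin:0}</style>"
        b'<script src="app.js"></script>'
        b"<script>console.log(1)</script>")
print(inline_code_bytes(html))  # 28 (14 of CSS + 14 of JS)
```

Run it against a few of your heaviest templates; builder-generated pages often carry hundreds of kilobytes of inline code that externalizing would remove from the 2 MB budget entirely.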

4) Audit “mega menus” and header bloat

Oversized navigation can push every page closer to the limit.

If your header contains:

  • hundreds of links
  • large inline SVGs
  • embedded scripts

…you’re repeating that bloat across the whole site.

For SMEs, this is an easy win: simplify navigation, and move deep links into category pages or HTML sitemaps.

5) Reduce tracking and tag clutter

Singapore SMEs often accumulate marketing tags over time (Meta Pixel, Google Ads, TikTok, Hotjar, multiple analytics containers). Each snippet isn’t huge—but together they add up.

Keep what you use. Remove what you don’t. If you need a quick governance rule: every script must have an owner and a business purpose.

A quick SME example: why this shows up during campaigns

Here’s a pattern I see during high-intent campaigns (common around seasonal pushes like mid-year promos, year-end, or major product launches):

  • The marketing team spins up new landing pages fast.
  • A page builder is used.
  • Sections are duplicated.
  • Inline styling grows out of control.
  • Multiple tracking scripts are added “just for this campaign.”

The page converts fine from ads, but organic traffic never arrives. Search Console shows crawling, but rankings don’t move.

When you check the fetched HTML size, it’s suddenly massive—and the FAQ block with structured data is appended near the bottom, sometimes past 2 MB.

Result: the page is technically “accessible,” but Google indexes a chopped version.

This is where AI business tools can help in a very non-hype way: use AI to automatically review templates, flag bloated sections, and generate a cleanup checklist before pages go live.

FAQ: common questions SMEs ask about the 2 MB Googlebot limit

“Is 2 MB too small for modern websites?”

No. Most well-built pages are far below it. When a page hits 2 MB of HTML, it’s almost always because of inline bloat, not because the business “has too much content.”

“Do my CSS and JS count toward the 2 MB?”

External CSS/JS files are fetched separately with their own byte counters. But inline CSS/JS inside the HTML counts toward the 2 MB HTML limit.

“If Google truncates my page, will it still rank?”

It can, but it’ll rank based on what Google received. If your main content, internal links, or structured data is cut off, you’re ranking with one hand tied behind your back.

“Does this affect Local SEO in Singapore?”

Indirectly, yes. Local SEO pages (service + location pages) often use heavy templates and repeated blocks. If the location-specific content is low on the page, truncation can reduce relevance for queries like “aircon servicing bukit timah” or “accounting firm jurong.”

What to do this week (a practical checklist)

If you want a fast technical SEO win, measure and fix the biggest offenders first.

  1. Export your top 20 lead-gen pages and top 20 organic landing pages.
  2. Check HTML response size (and watch for builder bloat).
  3. Ensure titles, canonicals, structured data, and primary copy appear early.
  4. Remove base64 images and large inline scripts.
  5. Simplify navigation and template components sitewide.

If you only do one thing: make sure your primary content and structured data appear well within the first 2 MB of HTML. That’s the difference between “Google crawled it” and “Google understood it.”

Google also noted the 2 MB cap may change over time. Don’t build strategies around a single number. Build pages that are efficient, readable, and robust—because that’s what keeps working as search (and AI-driven search experiences) evolve.

If you’re investing in AI tools for marketing in Singapore, pair them with solid technical SEO. AI can help you publish faster, test more, and personalize better—but Google still has to fetch and process the page first. What part of your site would you bet is closest to that 2 MB cutoff?
