
How to scrape Google Search results with minimal infrastructure

#Google #Scraping #SEO

Google processes over 8.5 billion searches daily. Each results page carries rich structured data, including organic results, AI Overviews, People Also Ask sections, and related searches, that traditional APIs don't expose.

The challenge: Google wasn’t built for programmatic access. The platform uses sophisticated anti-bot systems, dynamic content rendering, and geolocation-based personalization that traditional scraping tools can’t handle.

After analyzing millions of Google Search results, we’ve reverse-engineered the complete process. This guide will show you exactly how to scrape Google Search and extract the structured data that makes it valuable for SEO professionals and researchers.


Why scrape Google Search results?

Google Search API responses are nothing like what users see in the UI.

What you miss with the API:

  • The actual search results users see
  • AI Overview with source citations
  • People Also Ask questions with related content
  • Related searches for query expansion
  • Location-based result personalization

Why it matters: Without access to the real results page, it's impossible to monitor rankings or understand what users actually experience.

The math: Scraping costs up to 12x less than direct API usage while providing the real user experience.

Use cases:

  • SEO monitoring: Track your search rankings across locations
  • Competitive analysis: Monitor competitor visibility and strategies
  • Market research: Analyze search trends and user intent
  • Brand monitoring: Track how your brand appears in search results

For more specific AI features, see our guide on scraping Google AI Overview.

Understanding Google Search’s architecture

Google Search uses a sophisticated multi-layered architecture that makes scraping challenging:

Request Flow

  1. Initial request: User searches via google.com
  2. Geolocation routing: Results personalized based on UULE parameters
  3. Dynamic rendering: JavaScript loads organic results, AI Overview, and related content
  4. Anti-bot checks: Multiple layers of bot detection and CAPTCHA challenges

Response Structure

Google Search returns complex HTML with multiple data sections:

  • Organic results: Traditional blue links with titles and snippets
  • AI Overview: AI-generated summaries with source citations
  • People Also Ask: Related questions with expandable answers
  • Related searches: Suggested query variations
  • Knowledge panels: Entity information and rich media

Technical Challenges

  • Location-based results: UULE parameters for precise geotargeting
  • Dynamic content: JavaScript rendering for interactive features
  • Anti-bot detection: Canvas fingerprinting and behavioral analysis
  • CAPTCHA challenges: reCAPTCHA integration for suspicious activity. Learn how to solve CAPTCHAs efficiently.

The structured data extraction challenge

Google Search presents unique parsing challenges due to its complex HTML structure and dynamic content loading. You may also run into IP bans, which typically means rotating proxies to unblock access.

Multi-format Data Sources

Organic Results: Traditional search results with position tracking

# Desktop selector example
".A6K0A .vt6azd.asEBEc"  # Main result container

# Mobile selector example
"[data-dsrp]"  # Mobile result container

AI Overview: AI-generated content with source attribution

# AI Overview container variations
"#m-x-content [data-container-id='main-col']"  # Version 1
"#m-x-content [data-rl]"  # Version 2

Related Features: People Also Ask and Related Searches

# People Also Ask questions
"[jsname='Cpkphb']"  # Question containers

# Related searches
".s75CSd"  # Related search suggestions

Dynamic Selectors

Google uses different HTML structures based on:

  • Device type: Desktop vs mobile layouts
  • Search type: Web, news, images, or shopping
  • Feature availability: AI Overview, knowledge panels
  • Location: Country-specific result formats

Content Rendering Modes

HTTP Fetch Mode: Direct HTTP requests for basic organic results

  • Faster and more resource-efficient
  • Limited to static HTML content
  • May trigger bot detection

Browser Rendering Mode: Full browser automation for dynamic content

  • Supports AI Overview and interactive features
  • Handles JavaScript-heavy content
  • Higher resource usage but more comprehensive
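To make the trade-off concrete, here is a minimal sketch of switching between the two modes. It assumes httpx and Playwright; the guide's actual infrastructure, shown in the next section, layers proxies, cookies, and CAPTCHA handling on top of this.

import httpx
from playwright.async_api import async_playwright

async def fetch_serp(url: str, need_dynamic_content: bool) -> str:
    """Sketch: choose a rendering mode based on the data you need."""
    if not need_dynamic_content:
        # HTTP fetch mode: fast and cheap, static HTML only
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.get(url, headers={"User-Agent": "Mozilla/5.0"})
            return response.text

    # Browser rendering mode: full JavaScript execution for AI Overview
    # and other interactive features
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        html = await page.content()
        await browser.close()
        return html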

Building the scraping infrastructure

Here’s the complete infrastructure needed for reliable Google Search scraping:

Core Components

import asyncio
from typing import Dict, List, Literal, Optional, Tuple, TypedDict

import uule_grabber
from playwright.async_api import Page, Browser
from services.cookie_stash import cookie_stash
from services.page_interceptor import PlaywrightInterceptor
from services.captchas.solve import solve_captcha
from bs4 import BeautifulSoup

GOOGLE_COM_URL = "https://www.google.com/search"
ACCEPT_COOKIES_LOCATOR = "#L2AGLb"
MIN_HEAD_CHARS = 500

Request Configuration

class GoogleRequest(TypedDict):
    prompt: str  # Search query
    city: Optional[str]  # Location targeting
    country: str  # Country code
    pages: int  # Number of result pages
    device: Literal["desktop", "mobile"]  # Device type
    include: Dict[str, bool]  # Content options
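
A filled-in request might look like this (values are illustrative and mirror the managed API example later in the guide):

request: GoogleRequest = {
    "prompt": "best coffee shops",
    "city": "New York",
    "country": "US",
    "pages": 3,
    "device": "desktop",
    "include": {"html": True, "aioverview": True},
}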

Session Management

# Cookie persistence for better success rates
existing_cookies = await cookie_stash.get_cookies(
    proxy.ip, GOOGLE_COM_URL, device_type=device_type
)

if existing_cookies:
    # Remove UULE cookie to avoid conflicts
    filtered_cookies = [
        c for c in existing_cookies if c.get("name") != "UULE"
    ]
    await page.context.add_cookies(filtered_cookies)

Geolocation Support

# UULE parameter for precise location targeting
uule = None
if city:
    uule = uule_grabber.uule(city)

# Build search URL with location parameters
search_url = build_url_with_params(
    GOOGLE_COM_URL,
    {
        "q": prompt,
        "hl": google_params["hl"],  # Language
        "gl": google_params["gl"] if not uule else None,  # Country
        "uule": (uule, False),  # Location
    },
)
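
The build_url_with_params helper is referenced throughout but not shown. Here is one plausible sketch; reading the (value, False) tuple as "append without URL-encoding" (UULE tokens are already URL-safe) is an assumption.

from typing import Any, Dict
from urllib.parse import quote

def build_url_with_params(base_url: str, params: Dict[str, Any]) -> str:
    """Hypothetical sketch: join query parameters, skipping None values."""
    parts = []
    for key, value in params.items():
        encode = True
        if isinstance(value, tuple):
            value, encode = value  # (value, False) -> append as-is
        if value is None:
            continue
        parts.append(f"{key}={quote(str(value)) if encode else value}")
    return f"{base_url}?{'&'.join(parts)}" if parts else base_url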

Parsing organic search results

Google organic results require careful parsing due to different layouts for desktop and mobile:

Desktop vs Mobile Selectors

SELECTORS = {
    "desktop": {
        "item": ".A6K0A .vt6azd.asEBEc",
        "title": "h3",
        "link": "a",
        "displayed_link": "cite",
        "snippet": "div.VwiC3b",
    },
    "mobile": {
        "item": "[data-dsrp]",
        "title": ".GkAmnd",
        "link": "a[role='presentation']",
        "displayed_link": ".nC62wb",
        "snippet": "div.VwiC3b",
    },
}
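
The snippets in this guide reference result types (OrganicResult, PeopleAlsoAskResult, RelatedSearchResult, LinkData) without defining them. Their exact definitions aren't shown, but a plausible set of shapes, consistent with the fields used below and with the API response later in the guide, looks like this:

from dataclasses import dataclass
from typing import Optional, TypedDict

class OrganicResult(TypedDict):
    position: int
    title: str
    link: str
    displayedLink: str
    snippet: str
    page: int

class PeopleAlsoAskResult(TypedDict):
    question: str
    type: str

class RelatedSearchResult(TypedDict):
    query: str
    link: Optional[str]

@dataclass
class LinkData:
    """AI Overview source link."""
    position: int
    label: str
    url: str
    description: str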

Organic Result Parsing

def parse_organic_results(
    html: str,
    current_page: int,
    device_type: Literal["desktop", "mobile"],
    init_position: int,
) -> List[OrganicResult]:
    """Parse organic search results from Google HTML."""
    organic_results = []
    position = init_position
    soup = BeautifulSoup(html, "html.parser")

    selectors = SELECTORS[device_type]
    elements = soup.select(selectors["item"])

    for elem in elements:
        # Extract title
        title_element = elem.select_one(selectors["title"])
        if not title_element:
            continue
        title = title_element.get_text(strip=True)

        # Extract link
        link_element = elem.select_one(selectors["link"])
        if not link_element:
            continue
        link = link_element.get("href")

        # Extract displayed link
        displayed_link_element = elem.select_one(selectors["displayed_link"])
        displayed_link = (
            displayed_link_element.get_text(strip=True)
            if displayed_link_element
            else ""
        )

        # Extract snippet
        snippet = ""
        snippet_element = elem.select_one(selectors["snippet"])
        if snippet_element:
            snippet = snippet_element.get_text(strip=True)

        result: OrganicResult = {
            "position": position,
            "title": title,
            "link": link,
            "displayedLink": displayed_link.replace(" › ", " > "),
            "snippet": snippet,
            "page": current_page,
        }
        organic_results.append(result)
        position += 1

    return organic_results
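
A quick way to sanity-check the parser is to run it over a saved results page (the file name is illustrative):

with open("serp_desktop.html", encoding="utf-8") as f:
    html = f.read()

results = parse_organic_results(
    html, current_page=1, device_type="desktop", init_position=1
)
for r in results[:3]:
    print(r["position"], r["title"], r["link"])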

Extracting AI Overview content

Google’s AI Overview presents unique challenges because of its dynamic loading and multiple layout versions.

AI Overview Detection and Parsing

from typing import List
from playwright.async_api import Locator, Page

# AI Overview container locators
SV6KPE_LOCATOR = "#m-x-content [data-container-id='main-col']"
NON_SV6KPE_LOCATOR = "#m-x-content [data-rl]"
MAIN_COL_LOCATOR = f"{SV6KPE_LOCATOR}, {NON_SV6KPE_LOCATOR}"

async def wait_for_ai_overview(page: Page) -> str:
    """Wait for AI Overview to load and return the selector found."""
    try:
        # Wait for either AI Overview version
        await page.wait_for_selector(
            MAIN_COL_LOCATOR, timeout=10_000
        )

        # Check which version loaded
        if await page.locator(SV6KPE_LOCATOR).count() > 0:
            return SV6KPE_LOCATOR
        else:
            return NON_SV6KPE_LOCATOR
    except Exception:
        return ""

AI Overview Source Extraction

async def extract_aioverview_sources(page: Page) -> List[LinkData]:
    """Extract sources from AI Overview."""
    sources = []
    seen_urls = set()
    position = 1

    # AI Overview sources selector
    AI_OVERVIEW_SOURCES_LOCATOR = "#m-x-content ul > li > a, #m-x-content ul > li > div > a"

    aioverview_sources = await page.locator(AI_OVERVIEW_SOURCES_LOCATOR).all()
    for source_elem in aioverview_sources:
        url = await source_elem.get_attribute("href")
        label = await source_elem.get_attribute("aria-label")

        if url and label and url not in seen_urls:
            description = await _extract_aioverview_source_description(source_elem)

            source = LinkData(
                position=position,
                label=label,
                url=url,
                description=description or "",
            )
            sources.append(source)
            seen_urls.add(url)
            position += 1

    return sources

async def _extract_aioverview_source_description(element: Locator) -> str | None:
    """Extract description for AI Overview source."""
    try:
        parent = element.locator("xpath=..")
        description_div = parent.locator(".gxZfx").first
        return await description_div.inner_text(timeout=1000)
    except Exception:
        pass

    return None

Parsing People Also Ask and related searches

def parse_people_also_ask(html: str) -> List[PeopleAlsoAskResult]:
    """Parse People Also Ask section from Google HTML."""
    people_also_ask = []
    soup = BeautifulSoup(html, "html.parser")

    # People Also Ask questions
    question_elements = soup.select("[jsname='Cpkphb']")

    for elem in question_elements:
        question = elem.get_text(strip=True)
        if question:
            result: PeopleAlsoAskResult = {
                "question": question,
                "type": "UNKNOWN",  # Could be expanded with link detection
            }
            people_also_ask.append(result)

    return people_also_ask

def parse_related_searches(
    html: str,
    search_url: str,
    device_type: Literal["desktop", "mobile"],
) -> List[RelatedSearchResult]:
    """Parse related searches from Google HTML."""
    related_searches = []
    soup = BeautifulSoup(html, "html.parser")

    # Related searches selector
    related_elements = soup.select(".s75CSd")

    for elem in related_elements:
        query = elem.get_text(strip=True)
        if query:
            result: RelatedSearchResult = {
                "query": query,
                "link": None,  # Could be constructed from query
            }
            related_searches.append(result)

    return related_searches
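
Putting the three parsers together, a single rendered page can be reduced to one structured record. This is a sketch; the page object comes from the Playwright session set up in the infrastructure section.

async def parse_serp_page(
    page: Page, device_type: Literal["desktop", "mobile"]
) -> dict:
    """Sketch: combine the parsers into one structured result for a page."""
    html = await page.content()
    return {
        "organicResults": parse_organic_results(
            html, current_page=1, device_type=device_type, init_position=1
        ),
        "peopleAlsoAsk": parse_people_also_ask(html),
        "relatedSearches": parse_related_searches(html, page.url, device_type),
    }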

Handling geolocation and pagination

Google Search results vary significantly based on location and require proper pagination handling.

Geolocation Targeting

from typing import Dict

import uule_grabber

def get_location_parameters(city: str, country: str) -> Dict[str, str]:
    """Get location-specific parameters for Google Search."""
    params = {
        "hl": "en",  # Default language
        "gl": country.upper(),  # Country code
    }

    # Add UULE for city-level targeting
    if city:
        uule = uule_grabber.uule(city)
        if uule:
            params["uule"] = uule
            # Remove gl when using UULE for precision
            del params["gl"]

    return params
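
Example usage; the exact UULE token is produced by uule_grabber from a canonical location name, so it is shown truncated here:

params = get_location_parameters("New York,New York,United States", "us")
# {"hl": "en", "uule": "w+CAIQ..."}  -- gl is dropped once a UULE token is set
params = get_location_parameters("", "de")
# {"hl": "en", "gl": "DE"}           -- country-level targeting only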

Multi-page Support

async def scrape_multiple_pages(
    page: Page,
    prompt: str,
    n_pages: int,
    google_params: Dict[str, str],
    uule: Optional[str],
    device_type: str,
) -> Tuple[List[OrganicResult], List[str]]:
    """Scrape multiple pages of Google Search results."""
    organic_results = []
    html_pages = []

    for current_page in range(1, n_pages + 1):
        if current_page == 1:
            # First page - use existing page content
            html = await page.content()
        else:
            # Build next page URL
            next_page_params = {
                **google_params,
                "q": prompt,
                "start": (current_page - 1) * 10,  # Google uses 0-indexed
            }

            if uule:
                next_page_params["uule"] = uule

            next_page_url = build_url_with_params(
                GOOGLE_COM_URL, next_page_params
            )

            # Fetch next page
            response = await page.context.request.fetch(next_page_url)
            html = await response.text()

        # Parse organic results for current page
        page_results = parse_organic_results(
            html,
            current_page=current_page,
            device_type=device_type,
            init_position=len(organic_results) + 1,
        )

        organic_results.extend(page_results)
        html_pages.append(html)

    return organic_results, html_pages
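
Google paginates with the start parameter in steps of 10, so page 3 of a query corresponds to start=20. A hedged example using the helper and constant from the infrastructure section:

page_3_url = build_url_with_params(
    GOOGLE_COM_URL,
    {"q": "best coffee shops", "hl": "en", "gl": "US", "start": 20},
)
# Roughly: https://www.google.com/search?q=best+coffee+shops&hl=en&gl=US&start=20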

Using cloro’s managed Google Search scraper

Building and maintaining a reliable Google Search scraper requires significant engineering resources:

Infrastructure Requirements

Anti-Bot Evasion:

  • Browser fingerprinting rotation
  • CAPTCHA solving services
  • Proxy pool management
  • Rate limiting and backoff strategies

Performance Optimization:

  • Geographic proxy distribution
  • Cookie persistence systems
  • Parallel request processing
  • Error handling and retry logic

Maintenance Overhead:

  • Continuous selector updates
  • Anti-bot measure adaptation
  • Performance monitoring
  • Compliance management

Managed Solution API

import requests

# Simple API call - no browser management needed
response = requests.post(
    "https://api.cloro.dev/v1/monitor/google-search",
    headers={
        "Authorization": "Bearer sk_live_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "query": "best coffee shops",
        "city": "New York",
        "country": "US",
        "pages": 3,
        "include": {
            "html": True,
            "aioverview": True
        }
    }
)

result = response.json()
print(f"Found {len(result['result']['organicResults'])} organic results")
print(f"AI Overview: {'Yes' if result['result'].get('aioverview') else 'No'}")

Response Structure

{
  "success": true,
  "result": {
    "organicResults": [
      {
        "position": 1,
        "title": "Best Coffee Shops in NYC 2024",
        "link": "https://example.com/coffee-shops",
        "displayedLink": "example.com",
        "snippet": "Guide to New York's best coffee shops...",
        "page": 1
      }
    ],
    "peopleAlsoAsk": [
      {
        "question": "What is the most famous coffee shop in NYC?",
        "type": "LINK",
        "title": "Iconic NYC Coffee Shops",
        "link": "https://example.com/iconic-coffee"
      }
    ],
    "relatedSearches": [
      {
        "query": "best coffee shops brooklyn",
        "link": "https://google.com/search?q=best+coffee+shops+brooklyn"
      }
    ],
    "aioverview": {
      "text": "New York City has a vibrant coffee culture...",
      "sources": [
        {
          "position": 1,
          "label": "NYC Coffee Guide 2024",
          "url": "https://example.com/nyc-coffee",
          "description": "Comprehensive guide to NYC coffee scene"
        }
      ]
    }
  }
}

Key Benefits

  • P50 latency < 8s vs. manual scraping that takes minutes per query
  • No infrastructure costs - we handle browsers, proxies, and maintenance
  • Structured data - automatic parsing of organic results, AI Overview, and related features
  • Compliance - ethical scraping practices and rate limiting
  • Scalability - handle thousands of requests without breaking Google’s terms

Start scraping Google Search today.

The insights from Google Search data are too valuable to ignore. Whether you’re an SEO professional tracking rankings, a business monitoring the competitive landscape, or a researcher analyzing search trends, access to structured Google Search data opens up significant opportunities.

For most developers and businesses, we recommend using cloro’s Google Search scraper. You get:

  • Immediate access to reliable scraping infrastructure
  • Automatic data parsing and structuring
  • Built-in anti-bot evasion and rate limiting
  • Comprehensive error handling and retries
  • Structured JSON output with all metadata
  • Geolocation targeting and multi-page support

The cost of building and maintaining this infrastructure yourself typically runs $5,000-10,000/month in development time, browser instances, proxy services, and maintenance overhead.

For advanced users needing custom solutions, the technical approach outlined above provides the foundation for building your own scraping system. Be prepared for ongoing maintenance as Google frequently updates its anti-bot measures and result page layouts.

The window of opportunity is closing. As more businesses discover the value of search intelligence, competition for top positions intensifies. Companies that start monitoring and optimizing their search presence now will build advantages that become increasingly difficult to overcome.

Ready to unlock Google Search data for your business? Get started with cloro’s API to access comprehensive search intelligence.

Don’t let your competitors dominate search results. Start scraping Google Search today.