How to scrape Google AI Mode responses with minimal effort
Google AI Mode represents the cutting edge of AI-powered search, combining a sophisticated citation system, multi-source synthesis, and response generation that goes well beyond what a standalone AI model returns.
The challenge: Google AI Mode wasn’t built for programmatic access. Responses load asynchronously over a dedicated endpoint, citation metadata is embedded in HTML comments, and the dynamic rendering defeats standard scraping tools.
After analyzing thousands of Google AI Mode interactions, we’ve reverse-engineered the complete process. This guide will show you exactly how to scrape Google AI Mode and extract the structured data that makes it valuable for AI researchers and businesses.
Table of contents
- Why scrape Google AI Mode responses?
- Understanding Google AI Mode’s architecture
- The citation pill parsing challenge
- Building the scraping infrastructure
- Parsing AI Mode responses and citations
- Extracting structured data from responses
- Handling network interception and async responses
- Using cloro’s managed Google AI Mode scraper
Why scrape Google AI Mode responses?
Google AI Mode provides unique AI-generated content that isn’t available through any other interface.
What makes Google AI Mode responses valuable:
- The actual AI-generated content with advanced formatting and structure
- Sophisticated citation pill system with embedded metadata and source links
- HTML comment-based source attribution that reveals content sourcing
- Dynamic response loading with real-time context and updates
- Multi-format output (text, markdown, HTML) with rich metadata
Why it matters: Google AI Mode responses represent a unique approach to AI-generated search results that can’t be accessed through traditional search APIs or other interfaces.
Use cases:
- AI Research: Study citation patterns and source attribution
- Content Analysis: Analyze AI-generated content structure
- SEO Intelligence: Understand how AI Mode sources information
- Compliance Monitoring: Track AI response quality and accuracy
Compare this with scraping Google Gemini to understand the differences in Google’s AI implementations.
Understanding Google AI Mode’s architecture
Google AI Mode uses a sophisticated multi-layered architecture that makes scraping challenging:
Request Flow
- Initial request: User searches with the udm=50 parameter for AI Mode
- Response routing: Specialized AI Mode processing pipeline
- Async response loading: /async/folwr endpoint for dynamic content
- Citation pill generation: Embedded HTML comments with source metadata
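For illustration, here is what a minimal AI Mode request URL looks like. The query string below is a hypothetical example; udm=50 is the parameter that routes the search into AI Mode:

from urllib.parse import urlencode

# Hypothetical query; udm=50 switches Google Search into AI Mode
params = {"q": "best electric cars 2024", "udm": 50}
print(f"https://www.google.com/search?{urlencode(params)}")
# https://www.google.com/search?q=best+electric+cars+2024&udm=50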
Response Structure
Google AI Mode returns complex content with multiple data types:
- AI Response Text: Main AI-generated content with citations
- Citation Pills: Interactive buttons with embedded source links
- HTML Comments: Structured citation metadata in comments
- Dynamic Sources: Different selectors for various page layouts
- Multi-format Output: Text, markdown, and HTML representations
Technical Challenges
- Async Response Loading: Responses loaded via the /async/folwr endpoint
- Citation Pill Metadata: Embedded in HTML comments with complex parsing
- Dynamic Selectors: Different layouts for web results vs normal pages
- Session Persistence: Cookie-based session management
- Anti-bot Detection: Advanced behavioral analysis
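The cookie_stash service imported later in this guide is internal to our stack, but the session-persistence idea can be sketched with Playwright's built-in cookie APIs. This is a minimal illustration (context and new_context are assumed to be existing BrowserContext objects), not the actual implementation:

# Persist cookies after a successful AI Mode session...
cookies = await context.cookies()

# ...and restore them into a fresh context for the next request,
# so Google sees a returning session rather than a cold start
await new_context.add_cookies(cookies)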
The citation pill parsing challenge
Google AI Mode’s citation system is uniquely sophisticated, requiring specialized parsing techniques:
Citation Pill Architecture
HTML Comment Embedding:
<!--Sv6Kpe[["uuid-12345",["label","description"],["https://example.com","source2"]]]-->
<button data-icl-uuid="uuid-12345" data-amic="true">[1]</button>
Multi-source Citations:
- Single citation pills can reference multiple URLs
- UUID-based linking between pills and metadata
- HTML comment parsing for source extraction
- Google URL filtering and cleanup
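To make the structure concrete: the payload inside an Sv6Kpe comment is close to JSON, so the example pill above can, in the simple case, be decoded directly. Real comments are not always strictly valid JSON, which is why the parser later in this guide falls back to regex extraction:

import json
import re

comment = '<!--Sv6Kpe[["uuid-12345",["label","description"],["https://example.com","source2"]]]-->'

# Strip the comment wrapper and Sv6Kpe prefix, then decode the JSON-like payload
payload = re.search(r"<!--Sv6Kpe(\[\[.*]])-->", comment).group(1)
uuid, (label, description), sources = json.loads(payload)[0]
# uuid -> "uuid-12345", sources -> ["https://example.com", "source2"]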
Complex Metadata Extraction
UUID-Based Mapping:
# Extract citation pills with UUID mapping
pill_locators = page.locator('button[data-icl-uuid][data-amic="true"]')

# Parse HTML comments for citation metadata
# (uuid comes from each pill's data-icl-uuid attribute; note the
# f-string and re.escape, so the UUID is interpolated and escaped)
pattern = rf'<!--Sv6Kpe\[\["{re.escape(uuid)}".*?]]-->'
comment_blocks = re.findall(pattern, page_html, re.DOTALL)
Source URL Processing:
- Filter out Google internal URLs
- Clean URL encoding and fragments
- Handle multiple sources per citation
- Extract descriptions and labels
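These rules can be collected into a small helper. This is a hypothetical consolidation of the filtering logic used in the extraction code later in this guide:

from typing import Optional

GOOGLE_INTERNAL = ("google.com", "gstatic.com", "encrypted-tbn")

def clean_source_url(url: str) -> Optional[str]:
    """Drop Google-internal URLs, strip text fragments, decode escapes."""
    if any(skip in url for skip in GOOGLE_INTERNAL):
        return None  # internal/tracking URL, not a real source
    url = url.split("#:~:text")[0]  # remove scroll-to-text fragment
    return url.replace("\\u003d", "=").replace("\\u0026", "&")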
Building the scraping infrastructure
Here’s the complete infrastructure needed for reliable Google AI Mode scraping:
Core Components
import asyncio
import html
import logging
import re
from typing import Dict, List, TypedDict

import html2text
from bs4 import BeautifulSoup, Tag
from playwright.async_api import Browser, Page

# Internal services: cookie persistence, interception, CAPTCHA solving
from services.cookie_stash import cookie_stash
from services.page_interceptor import PlaywrightInterceptor
from services.captchas.solve import solve_captcha

logger = logging.getLogger(__name__)
AIMODE_URL = "https://www.google.com/search"
Request Configuration
class AIModeRequest(TypedDict):
    prompt: str               # AI Mode query
    country: str              # Country code
    include: Dict[str, bool]  # Content options (markdown, html)
URL Construction with AI Mode Parameters
# AI Mode requires specific URL parameters
search_url = build_url_with_params(
    AIMODE_URL,
    {
        "udm": 50,                  # Enable AI Mode
        "aep": 11,                  # Additional AI Mode parameter
        "q": prompt,                # Search query
        "hl": google_params["hl"],  # Language
        "gl": google_params["gl"],  # Country
    },
)
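build_url_with_params is not a standard library function; one minimal way to implement it, assuming the base URL carries no existing query string:

from urllib.parse import urlencode

def build_url_with_params(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"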
Network Interception Setup
# AI Mode uses async response loading
page_interceptor = PlaywrightInterceptor(do_not_block_resources=True)
page_interceptor.add_capture_urls(["https://www.google.com/async/folwr"])

# Poll for the async response (120 x 500 ms = up to 60 seconds)
for _ in range(120):
    if len(page_interceptor.captured_responses):
        break
    await asyncio.sleep(0.5)
else:
    # for/else: runs only if the loop never hit break
    raise Exception("Never received AI Mode response after 60 seconds")
Parsing AI Mode responses and citations
Google AI Mode requires sophisticated parsing due to its unique citation system:
Text Extraction
async def extract_aimode_text(page: Page) -> str:
    """Extract text content from AI Mode response."""
    try:
        # Find the element carrying the session thread id
        thread_element = page.locator("[data-session-thread-id]")
        # Walk up to the parent div and read its text content
        parent_div = thread_element.locator("..")
        text = await parent_div.text_content() or ""
        return text.strip()
    except Exception as e:
        logger.warning(f"Could not extract text: {e}")
        return ""
Citation Pill Extraction
async def extract_aimode_citation_pills(page: Page) -> Dict[str, List[LinkData]]:
    """Extract citation pills with embedded metadata."""
    citation_pills: Dict[str, List[LinkData]] = {}

    # Find all citation buttons
    pill_locators = page.locator('button[data-icl-uuid][data-amic="true"]')
    pill_count = await pill_locators.count()

    # Get page HTML for comment parsing
    page_html = await page.content()
    page_html = html.unescape(page_html)

    for i in range(pill_count):
        pill_button = pill_locators.nth(i)
        if not await pill_button.is_visible():
            continue
        uuid = await pill_button.get_attribute("data-icl-uuid")
        if not uuid:
            continue

        # Extract citation metadata from HTML comments
        pattern = rf'<!--Sv6Kpe\[\["{re.escape(uuid)}".*?]]-->'
        comment_blocks = re.findall(pattern, page_html, re.DOTALL)

        current_pill: List[LinkData] = []
        for content in comment_blocks:
            # Extract description
            desc_match = re.search(
                rf'"{re.escape(uuid)}"\s*,\s*\[\s*"[^"]+"\s*,\s*"([^"]+)"',
                content,
            )
            description = desc_match.group(1) if desc_match else None

            # Extract all URLs, filter out Google internal
            all_urls = re.findall(r'"(https://[^"]+)"', content)
            url = None
            for potential_url in all_urls:
                if not any(skip in potential_url
                           for skip in ["google.com", "gstatic.com", "encrypted-tbn"]):
                    url = potential_url
                    break

            if url:
                # Clean up URL
                if "#:~:text" in url:
                    url = url.split("#:~:text")[0]
                url = url.replace("\\u003d", "=").replace("\\u0026", "&")
                current_pill.append(LinkData(
                    position=len(current_pill) + 1,
                    label=f"Source {len(current_pill) + 1}",
                    url=url,
                    description=description,
                ))

        if current_pill:
            citation_pills[uuid] = current_pill

    return citation_pills
Source Link Extraction
async def extract_aimode_sources(
    page: Page, is_web_results_page: bool = False
) -> List[LinkData]:
    """Extract source links with different selectors for page types."""
    # Different selectors for different page layouts
    sources_selector = (
        '[data-container-id="rhs-col"] [role="dialog"] a'
        if not is_web_results_page
        else "a.ZbQNgf"
    )

    sources: List[LinkData] = []
    try:
        await page.wait_for_selector(sources_selector, timeout=10_000, state="attached")
        sources_locator = page.locator(sources_selector)
        source_elements = await sources_locator.all()

        for position, element in enumerate(source_elements, start=1):
            url = await element.get_attribute("href")
            label = await element.get_attribute("aria-label")
            if url and label:
                sources.append(LinkData(
                    position=position,
                    label=label,
                    url=url,
                    description=None,
                ))
    except Exception as e:
        logger.warning(f"Could not extract sources: {e}")

    return sources
Extracting structured data from responses
Google AI Mode supports multiple output formats for different use cases:
HTML to Markdown Conversion
def convert_aimode_html_to_markdown(
    html_content: str, citation_pills: Dict[str, List[LinkData]]
) -> str:
    """Convert AI Mode HTML to markdown with proper citation links."""
    if not html_content:
        return ""

    soup = BeautifulSoup(html_content, "html.parser")

    # Find citation buttons
    buttons = soup.find_all("button", attrs={"data-icl-uuid": True, "data-amic": "true"})
    for button in buttons:
        if not isinstance(button, Tag):
            continue
        uuid = button.get("data-icl-uuid")
        if not isinstance(uuid, str):
            continue

        pill_links = citation_pills.get(uuid, [])

        # Replace citation buttons with actual links
        new_anchors: List[Tag] = []
        for link_data in pill_links:
            source_text = link_data.get("label")
            url = link_data.get("url")
            new_anchor = soup.new_tag("a", href=url)
            new_anchor.string = source_text
            new_anchors.append(new_anchor)

        # Insert links and remove button
        for anchor in reversed(new_anchors):
            button.insert_after(anchor)
        button.decompose()

    # Convert to markdown
    h = html2text.HTML2Text()
    h.ignore_links = False
    h.ignore_images = False
    h.body_width = 0
    h.unicode_snob = True
    markdown = h.handle(str(soup))
    return markdown.strip()
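Putting the two extraction steps together looks roughly like this. We assume extract_aimode_html (used in the pipeline below) returns the HTML fragment of the AI response:

# Sketch: convert a live AI Mode response to markdown with citation links
ai_mode_html = await extract_aimode_html(page)
pills = await extract_aimode_citation_pills(page)
markdown = convert_aimode_html_to_markdown(ai_mode_html, pills)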
Response Processing Pipeline
async def parse_aimode_response(
    page: Page, request_data: ScrapeRequest
) -> ScrapeAiModeResult:
    """Complete response processing pipeline."""
    include_markdown = request_data.get("include", {}).get("markdown", False)
    include_html = request_data.get("include", {}).get("html", False)

    # Detect page type
    is_web_results_page = bool(await page.locator(".RbCUdc").count())

    # Extract core content
    text = await extract_aimode_text(page)
    sources = await extract_aimode_sources(page, is_web_results_page=is_web_results_page)
    if not sources:
        raise Exception("No sources extracted from AI Mode response")

    # Extract citation metadata
    citations = await extract_aimode_citation_pills(page)

    result: ScrapeAiModeResult = {
        "text": text,
        "sources": sources,
    }

    # Optional markdown conversion
    if include_markdown:
        ai_mode_html = await extract_aimode_html(page)
        markdown = convert_aimode_html_to_markdown(ai_mode_html, citations)
        result["markdown"] = markdown

    # Optional HTML upload
    if include_html:
        result["html"] = await upload_html(
            request_data["requestId"], await page.content()
        )

    return result
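A hypothetical end-to-end call, assuming the page has already finished loading the async response:

result = await parse_aimode_response(page, request_data)

print(result["text"][:200])
for source in result["sources"]:
    print(source["position"], source["label"], source["url"])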
Handling network interception and async responses
Google AI Mode uses sophisticated async response loading that requires special handling:
Async Response Capture
# Set up network interception for async responses
page_interceptor = PlaywrightInterceptor(do_not_block_resources=True)
page_interceptor.add_capture_urls(["https://www.google.com/async/folwr"])
# Configure page interceptor
await page_interceptor.setup_page_interceptor(page)
Response Timeout Handling
# Wait for async response with timeout
async_response_received = False
for attempt in range(120):  # 120 x 500 ms = 60 seconds max wait
    if len(page_interceptor.captured_responses):
        async_response_received = True
        break
    await asyncio.sleep(0.5)  # 500 ms intervals
else:
    # for/else: reached only if the loop exhausted without a break
    raise Exception("Never received AI Mode response after 60 seconds")

if async_response_received:
    logger.info("Async AI Mode response captured successfully")
Error Detection and Recovery
# HTTP error handling with CAPTCHA detection
response = await page.goto(search_url, timeout=20_000)
if response is None:
    raise Exception("Navigation failed - no response received")
if not is_http_success(response.status):
    # Handle potential CAPTCHA
    solved_captcha = await solve_captcha(page, page_interceptor)
    metadata["solved_captcha"] = solved_captcha
    if not solved_captcha:
        raise Exception(f"HTTP error: {response.status} (probably captcha)")
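is_http_success is a project helper rather than a library call; a reasonable definition is simply:

def is_http_success(status: int) -> bool:
    """True for any 2xx HTTP status code."""
    return 200 <= status < 300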
Using cloro’s managed Google AI Mode scraper
Building and maintaining a reliable Google AI Mode scraper requires significant engineering resources:
Infrastructure Requirements
AI Mode-Specific Challenges:
- Async response interception and parsing
- HTML comment metadata extraction
- Citation pill UUID mapping
- Multi-format output generation
- Complex session management
Anti-Bot Evasion:
- Browser fingerprinting rotation
- CAPTCHA solving integration
- Proxy pool management
- Rate limiting and backoff strategies
- Behavioral pattern simulation
Performance Optimization:
- Async response handling
- Efficient HTML parsing
- Multi-format conversion pipelines
- Error handling and recovery
- Geographic distribution
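As one sketch of the rate limiting and backoff strategies listed above (the function name and retry counts are illustrative):

import asyncio
import random

async def with_backoff(scrape, retries: int = 3):
    """Retry an async scrape with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return await scrape()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries, surface the error
            # 1s, 2s, 4s... plus random jitter to avoid thundering herds
            await asyncio.sleep(2 ** attempt + random.random())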
Managed Solution API
import requests

# Simple API call - no browser management needed
response = requests.post(
    "https://api.cloro.dev/v1/monitor/ai-mode",
    headers={
        "Authorization": "Bearer sk_live_your_api_key",
        "Content-Type": "application/json",
    },
    json={
        "prompt": "What do you know about Tesla's latest updates?",
        "country": "US",
        "include": {"markdown": True},
    },
)

result = response.json()
print(f"AI Response: {result['result']['text'][:100]}...")
print(f"Sources: {len(result['result']['sources'])} citations found")
print(f"Markdown: {'Yes' if result['result'].get('markdown') else 'No'}")
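In production you would check for failures before reading result fields; a minimal guard might look like:

# Fail fast if the HTTP call or the scrape itself did not succeed
if not response.ok or not result.get("success"):
    raise RuntimeError(f"AI Mode scrape failed: {response.status_code}")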
Response Structure
{
  "success": true,
  "result": {
    "text": "Tesla's recent updates include significant improvements to their Full Self-Driving capability...",
    "sources": [
      {
        "position": 1,
        "label": "Tesla FSD Updates",
        "url": "https://tesla.com/updates/fsd",
        "description": "Latest Full Self-Driving improvements and capabilities"
      }
    ],
    "html": "https://storage.googleapis.com/ai-mode-response.html",
    "markdown": "**Tesla's recent updates** include significant improvements...",
    "searchQueries": ["Tesla updates 2024", "Full Self Driving improvements"]
  }
}
Key Benefits
- P50 latency < 8s vs. manual scraping that takes minutes
- No infrastructure costs - we handle browsers, proxies, and async interception
- Structured data - automatic citation pill parsing and metadata extraction
- Multi-format output - text, markdown, and HTML with proper citation links
- Compliance - ethical scraping practices and rate limiting
- Scalability - handle thousands of requests without breaking AI Mode’s terms
Start scraping Google AI Mode today
The insights from Google AI Mode data are too valuable to ignore. Whether you’re an AI researcher studying citation patterns, a content developer analyzing AI-generated content, or a business monitoring AI response quality, access to structured Google AI Mode data provides incredible opportunities.
For most developers and businesses, we recommend using cloro’s Google AI Mode scraper. You get:
- Immediate access to reliable scraping infrastructure
- Automatic citation pill parsing and metadata extraction
- Built-in async response handling and network interception
- Comprehensive error handling and CAPTCHA solving
- Structured JSON output with all citation metadata
- Multi-format support (text, markdown, HTML)
The cost of building and maintaining this infrastructure yourself typically runs $5,000-10,000/month in development time, browser instances, proxy services, and async response handling.
For advanced users needing custom solutions, the technical approach outlined above provides the foundation for building your own scraping system. Be prepared for ongoing maintenance as Google frequently updates its AI Mode response formats and citation systems.
The window of opportunity is closing. As more businesses discover the value of AI intelligence, competition for understanding AI behavior intensifies. Companies that start monitoring and analyzing AI Mode responses now will build advantages that become increasingly difficult to overcome.
Ready to unlock Google AI Mode data for your business? Get started with cloro’s API to start accessing advanced AI-generated search results.
Don’t let your competitors get ahead on AI intelligence. Start scraping Google AI Mode today.