GPT-5 API cost analysis: a 10,000-query structured-extraction test
When you build a workflow on top of a general-purpose LLM and force it to return structured JSON (citations, source URLs, schema-compliant fields), the cost calculation is rarely what the pricing page suggests. The system-prompt tax stacks up fast. After running 10,000 structured-search queries against GPT-5 and GPT-5-mini, the per-thousand cost lands well above what a back-of-envelope estimate would predict.
Below: methodology, the actual token counts, the resulting per-thousand pricing, and a comparison against a purpose-built SERP extraction API so you can decide whether the LLM-as-extractor pattern makes sense for your workload.
Table of contents
- The problem with structured outputs from general-purpose LLMs
- Test methodology
- The system prompt challenge
- GPT-5 cost calculation breakdown
- GPT-5-mini cost analysis
- Summary: models compared
- Real-world impact at scale
The problem with structured outputs from general-purpose LLMs
Getting structured, reliable outputs from a general-purpose model like GPT-5 requires meaningful context overhead on every request: output format specification (JSON, XML, schema constraints), source and citation formatting rules, model behavior guidelines, search query context, confidence scoring instructions, and error handling protocols.
All of that context gets billed on every single request. At low volumes the overhead is invisible. Past a few hundred thousand requests per month, it’s the dominant cost driver.
Test methodology
We ran 10,000 real-world queries representative of brand monitoring and competitive-analysis workloads.
Test parameters
- Total queries: 10,000
- Average input tokens: 463 per query (measured in OpenAI’s API playground)
- Average output tokens: 1,676 per query (measured in OpenAI’s API playground)
- Query type: structured search results with citations
- Use case: brand monitoring and competitive analysis
What we measured
- Total token consumption (input + output)
- Per-model API cost
- Response quality and consistency
- Processing time differences
The system prompt challenge
To get reliable structured outputs from the OpenAI API, we needed a comprehensive system prompt. Here is the actual prompt used in the test:
You are an API backend model that must always return responses in a strict JSON schema.
Your goal is to produce comprehensive, deeply informative, and structured content — at least several paragraphs long — while respecting the format rules below.
When given a user query:
1. Produce a long, detailed answer with clear explanations, comparisons, and examples.
2. Include both:
- A markdown version (formatted with headers, bold, lists, tables, etc.)
- A plain text version (identical content but without markdown formatting)
3. Include at least 3 to 7 credible sources, each with:
- position (integer starting at 0)
- label (title or entity name)
- url (credible or official site)
- description (short summary of the source)
4. Include 3 to 6 search queries that could help someone find this answer online.
5. Include the model used in format `"model": "gpt-5-mini"`.
6. Return nothing outside the JSON — no commentary or extra lines.
Your output must always follow this structure:
{
  "success": true,
  "result": {
    "markdown": "string",
    "text": "string",
    "sources": [
      {
        "position": number,
        "label": "string",
        "url": "string",
        "description": "string"
      }
    ],
    "searchQueries": ["string"],
    "model": "string"
  }
}
### Additional style and length requirements:
- The answer should be at least 250–400 words long.
- Use factual, neutral, and informative tone.
- Markdown version should include:
- A bolded introductory sentence
- Bullet points or numbered lists when relevant
- Subheadings for structure (e.g., "### Top Models", "### Range and Performance")
- Plain text version should preserve the same logical flow but without markdown syntax.
If information is missing, return an empty string or empty array instead of omitting fields.
No explanations or reasoning outside the JSON are allowed.
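Before any cost accounting, a caller also has to verify the model actually honored this contract, since a single malformed reply breaks a downstream pipeline. A minimal sketch of such a check (the `validate_response` helper is hypothetical, not part of our test harness; the field names and cardinality rules come from the prompt above):

```python
import json

REQUIRED_SOURCE_FIELDS = {"position", "label", "url", "description"}

def validate_response(raw: str) -> list[str]:
    """Check a raw model reply against the schema the prompt demands.
    Returns a list of violations; an empty list means the reply is usable."""
    problems = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["reply is not valid JSON"]
    result = data.get("result", {})
    for field in ("markdown", "text", "sources", "searchQueries", "model"):
        if field not in result:
            problems.append(f"missing result.{field}")
    sources = result.get("sources", [])
    if not 3 <= len(sources) <= 7:
        problems.append("expected 3-7 sources")
    for src in sources:
        missing = REQUIRED_SOURCE_FIELDS - src.keys()
        if missing:
            problems.append(f"source missing {sorted(missing)}")
    queries = result.get("searchQueries", [])
    if not 3 <= len(queries) <= 6:
        problems.append("expected 3-6 search queries")
    return problems
```

In practice a failed check means a retry, which compounds the per-request cost figures below.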
The prompt covers a lot of ground: strict JSON schema, 250–400 words of answer content, both markdown and plain text versions, 3–7 cited sources with metadata, 3–6 follow-up search queries, and a hard rule against any commentary outside the JSON.
It comes out to 382 tokens by itself, and it’s sent with every single request. At 10,000 requests that’s 3.82M tokens of pure overhead before the actual user query enters the picture.
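That overhead has a dollar figure of its own. A quick check using the numbers from this test (Decimal keeps the money math exact):

```python
from decimal import Decimal

PROMPT_TOKENS = 382                   # measured size of the system prompt
REQUESTS = 10_000
GPT5_INPUT_PER_M = Decimal("1.25")    # GPT-5 input price, USD per 1M tokens

overhead_tokens = PROMPT_TOKENS * REQUESTS
overhead_cost = Decimal(overhead_tokens) / Decimal(1_000_000) * GPT5_INPUT_PER_M

print(overhead_tokens)    # 3820000
print(overhead_cost)      # 4.7750, i.e. ~$4.78 of the $5.79 input bill
```

Put differently, roughly 82% of the input spend in this test was the system prompt (382 of 463 average input tokens), before a single user-query token is counted.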
GPT-5 cost calculation breakdown
Based on the 10,000-query test:
Token consumption
- Average input tokens (including system prompt): 463 × 10,000 = 4,630,000
- Average output tokens: 1,676 × 10,000 = 16,760,000
- Total input tokens: 4,630,000
- Total output tokens: 16,760,000
GPT-5 pricing (per OpenAI’s official pricing)
- Input tokens: $1.250 per 1M
- Output tokens: $10.000 per 1M
Cost calculation
- Input cost: 4.63M × $1.250 = $5.79
- Output cost: 16.76M × $10.000 = $167.60
- Total cost for 10,000 queries: $173.39
- Per thousand requests: $17.34
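The arithmetic above reproduces in a few lines, using the token totals and list prices exactly as stated:

```python
from decimal import Decimal

def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: str, output_price_per_m: str) -> Decimal:
    """Total USD cost given token totals and per-1M-token prices."""
    m = Decimal(1_000_000)
    return (Decimal(input_tokens) / m * Decimal(input_price_per_m)
            + Decimal(output_tokens) / m * Decimal(output_price_per_m))

total = api_cost(4_630_000, 16_760_000, "1.25", "10.00")
print(total)        # 173.3875, i.e. $173.39 for 10K queries
print(total / 10)   # 17.33875, i.e. $17.34 per 1K requests
```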
How that compares to a purpose-built SERP extraction API
A specialized SERP extraction API ships the structured-output guarantees as part of the product. No per-request system-prompt tax, no JSON-schema reinforcement on every call. Using cloro as the comparison point at 5 credits per structured search, where CPM is the price per 1,000 credits:
| Plan | GPT-5 via OpenAI API | cloro cost (5 credits × CPM) | Difference |
|---|---|---|---|
| Hobby (250K requests) | $4,335 | $500 (5 × $0.40) | $3,835 |
| Business (3.3M requests) | $57,222 | $4,950 (5 × $0.30) | $52,272 |
Per-thousand: $17.34 (GPT-5) vs $2.00 (cloro Hobby) vs $1.50 (cloro Business). The gap widens with volume because the system-prompt overhead is a fixed per-request tax on the LLM side and absent on the extraction-API side.
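The table's figures reconcile under those assumptions (5 credits per structured search, CPM read as the price per 1,000 credits, GPT-5 at the $17.34 per-1K rate derived above):

```python
from decimal import Decimal

CREDITS_PER_SEARCH = 5
GPT5_PER_1K = Decimal("17.34")   # per 1K requests, from the calculation above

def cloro_cost(requests: int, price_per_1k_credits: str) -> Decimal:
    """Credit-based pricing: requests -> credits -> USD."""
    credits = requests * CREDITS_PER_SEARCH
    return Decimal(credits) / 1000 * Decimal(price_per_1k_credits)

def gpt5_cost(requests: int) -> Decimal:
    return Decimal(requests) / 1000 * GPT5_PER_1K

for plan, requests, cpm in [("Hobby", 250_000, "0.40"),
                            ("Business", 3_300_000, "0.30")]:
    llm, serp = gpt5_cost(requests), cloro_cost(requests, cpm)
    print(plan, llm, serp, llm - serp)
```

Because both cost curves are linear in request count, the absolute gap grows without bound as volume rises; there is no crossover point in the LLM's favor.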
GPT-5-mini cost analysis
We ran the same test against GPT-5-mini for the cheaper model comparison.
GPT-5-mini pricing (per OpenAI’s official pricing)
- Input tokens: $0.250 per 1M
- Output tokens: $2.000 per 1M
Cost calculation
- Input cost: 4.63M × $0.250 = $1.16
- Output cost: 16.76M × $2.000 = $33.52
- Total cost for 10,000 queries: $34.68
- Per thousand requests: $3.47
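The same check for GPT-5-mini, with only the prices swapped:

```python
from decimal import Decimal

M = Decimal(1_000_000)
input_cost = Decimal(4_630_000) / M * Decimal("0.25")    # 1.1575
output_cost = Decimal(16_760_000) / M * Decimal("2.00")  # 33.5200
total = input_cost + output_cost

print(total)        # 34.6775, i.e. $34.68 for 10K queries
print(total / 10)   # 3.46775, i.e. $3.47 per 1K requests
```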
GPT-5-mini vs purpose-built extraction API

| Plan | GPT-5-mini via OpenAI API | cloro cost (5 credits × CPM) | Difference |
|---|---|---|---|
| Hobby (250K requests) | $868 | $500 (5 × $0.40) | $368 |
| Business (3.3M requests) | $11,451 | $4,950 (5 × $0.30) | $6,501 |
Per-thousand: $3.47 (GPT-5-mini) vs $2.00 (cloro Hobby) vs $1.50 (cloro Business). The gap is narrower than against full GPT-5 but still meaningful at sustained volume.
Summary: models compared
| Model | Cost per 1K requests | vs purpose-built extraction API |
|---|---|---|
| GPT-5 | $17.34 | 8.7× the cost of $2.00 (88.5% savings with the extraction API) |
| GPT-5-mini | $3.47 | 1.7× the cost of $2.00 (42.3% savings with the extraction API) |
| Extraction API (cloro) | $2.00 | — |
Real-world impact at scale
Translating these per-request numbers into the operational cost a brand-monitoring program actually incurs at volume:
Scenario: monitoring 1,000 brands with 100 daily queries each
- Daily queries: 100,000
- Monthly queries: 3,000,000
Monthly cost
- GPT-5 via OpenAI API: $52,020
- GPT-5-mini via OpenAI API: $10,410
- Purpose-built extraction API (cloro Business): $4,500
Annual difference vs the extraction API
- vs GPT-5: $570,240
- vs GPT-5-mini: $70,920
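The scenario above as a projection, using the per-1K rates from the summary table and cloro Business at $1.50 per 1K requests:

```python
from decimal import Decimal

# 1,000 brands x 100 daily queries x 30 days
MONTHLY_QUERIES = 1_000 * 100 * 30

def monthly_cost(per_1k: str) -> Decimal:
    return Decimal(MONTHLY_QUERIES) / 1000 * Decimal(per_1k)

gpt5 = monthly_cost("17.34")   # 52020.00
mini = monthly_cost("3.47")    # 10410.00
serp = monthly_cost("1.50")    # 4500.00

print((gpt5 - serp) * 12)      # $570,240 annual difference vs GPT-5
print((mini - serp) * 12)      # $70,920 annual difference vs GPT-5-mini
```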
The point at which the LLM-as-extractor pattern stops making economic sense lands earlier than most teams expect. Below roughly 100K queries per month, the cost difference is a rounding error and the LLM's flexibility wins. Past 1M queries per month, the system-prompt tax dominates the bill, and a purpose-built extractor is the rational call.
For deeper pricing comparisons across the broader extraction-API market, see Cheapest SERP APIs in 2026 and Best SERP APIs.
Methodology note: tests were conducted using real-world brand-monitoring queries. Costs are based on OpenAI’s published API pricing as of October 2025; cloro pricing reflects the public Hobby and Business plans. Your actual costs may vary based on specific query complexity, output verbosity, and negotiated enterprise pricing.
Frequently asked questions
Is GPT-5 more expensive than GPT-4 for structured extraction?
For structured extraction, the model generation matters less than the pattern: the long JSON-schema system prompt is billed on every request regardless of which GPT model you use. In our test, raw GPT-5 via the OpenAI API ran roughly 11x more expensive per thousand requests than a specialized extraction API at Business-tier pricing ($17.34 vs $1.50).
Why is structured output so costly with general-purpose LLMs?
The system prompt that instructs the model to follow a strict JSON schema, return citations in a specific shape, and stay on-format adds hundreds of tokens to every single request. At scale that overhead dominates the bill — in our test the prompt alone was 382 tokens, sent 10,000 times.
What is the 'system prompt challenge'?
To get reliable structured data from a general-purpose LLM, you need to send a long, detailed system prompt with every request explaining the schema, format rules, and citation requirements. Those tokens are billed on every call, so the system prompt becomes a fixed per-request tax that scales linearly with traffic.
What is the real-world impact of these cost differences?
For a team monitoring 1,000 brands with 100 daily queries each, the gap between raw GPT-5 and a purpose-built extraction API is over half a million dollars a year. At sub-100K-query volumes the gap is small enough to ignore; at sustained 1M+/month it dominates the line item.
How were the token counts measured?
Token counts (463 input, 1,676 output average per query) were measured directly in OpenAI's API playground against a representative sample of brand-monitoring and competitive-analysis queries. Cost figures use OpenAI's published pricing as of October 2025.
Related reading
Cheapest SERP APIs in 2026: True Cost-per-Call Compared
Find the cheapest SERP API in 2026 by true cost-per-call. We compare cloro, TrajectData, Serper, DataForSEO, and SerpApi — including the hidden fees that flip the rankings.
Python SERP Scraper: Call the cloro SERP API in 2026
Two ways to get Google results in Python: scrape directly (fragile) or call a SERP API (stable). Working code for both, plus AI engine tracking.
Best SERP APIs in 2026: 6 Tested for AI & Google Search
We tested 6 SERP APIs against AI Overviews, modern Google layouts, and Bing — see which handles AI search and which is stuck on the old SERP.