Can AI solve CAPTCHAs?

Yes, modern vision models (like GPT-4V or specialized solvers) can solve image CAPTCHAs with high accuracy.

What is the hardest CAPTCHA to solve?

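Behavioral CAPTCHAs (like reCAPTCHA v3 or Cloudflare Turnstile) are the hardest because there is no puzzle to solve in isolation. They score how the visitor and the browser behave, using signals such as interaction patterns, the browser environment, and TLS fingerprints.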

Is it illegal to bypass CAPTCHAs?

It depends on jurisdiction and intent. Bypassing a CAPTCHA to reach publicly available data is often a legal grey area; bypassing one to commit fraud is illegal.

What is the cost of scraping with CAPTCHAs?

Solving CAPTCHAs adds significant financial and latency costs. API services charge per solution, and human-solved CAPTCHAs introduce delays of 15-45 seconds per challenge.

Are there automated alternatives to manual CAPTCHA solving?

Yes, dedicated scraping platforms like cloro integrate CAPTCHA solving directly into their request pipelines, handling detection, solving, and token injection automatically, saving engineering time and cost.

How to solve CAPTCHAs: the scraper's guide

The internet does not want you to read it.

You write a perfect Python script. It works for 5 minutes. Then, you see it: the “I am not a robot” checkbox. Or worse, a grid of traffic lights.

CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are the gatekeepers of the web. For data scientists and developers, they are the primary bottleneck to gathering intelligence at scale.

If you are building a scraper, you have two choices: give up, or learn to solve them.

This guide is not about clicking traffic lights manually. It is about automating the solution so your scripts can run 24/7 without you.

Often, CAPTCHAs appear alongside other restrictions. You might also need to learn how to unblock websites restricted by geofencing or firewalls.


The major families of CAPTCHAs

Before you can solve a CAPTCHA, you need to identify it. Different vendors require different bypass strategies.

1. reCAPTCHA (Google)

The most common.

v2: The classic “I’m not a robot” checkbox. Sometimes triggers an image challenge.
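v3: Invisible. It scores each visit from 0.0 (likely a bot) to 1.0 (likely a human) based on how you interact with the page. The site owner sets the threshold, so a low score usually means you are blocked or handed a harder challenge.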

2. hCaptcha

The privacy-focused alternative. Common on sites that want to avoid Google. Known for slightly harder image challenges (e.g., “Select the seaplane”).

3. Cloudflare Turnstile

The “smart” CAPTCHA. It often doesn’t show a puzzle at all. It inspects your browser environment (TLS fingerprint, canvas, fonts) to verify you are a real browser.

4. Geetest

Popular in Asia. Often involves sliding a puzzle piece or clicking characters in order.
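
To make the identification step concrete, here is a minimal sketch that guesses the vendor from a page's HTML. The marker strings are assumptions based on how these widgets are commonly embedded (class names and script hosts), and the function name detect_captcha_vendors is hypothetical. Real pages may load the widget differently, so treat a miss as "unknown", not "no CAPTCHA".

```python
# Substrings that commonly appear in the page source for each vendor.
# These are illustrative assumptions, not an exhaustive or official list.
VENDOR_MARKERS = {
    "recaptcha": ["google.com/recaptcha", "g-recaptcha"],
    "hcaptcha": ["hcaptcha.com", "h-captcha"],
    "turnstile": ["challenges.cloudflare.com/turnstile", "cf-turnstile"],
    "geetest": ["geetest"],
}


def detect_captcha_vendors(html: str) -> list:
    """Return the vendors whose markers appear in the HTML, in table order."""
    lowered = html.lower()
    return [
        vendor
        for vendor, markers in VENDOR_MARKERS.items()
        if any(marker in lowered for marker in markers)
    ]
```

Running this on the response body before choosing a strategy lets the scraper route each page to the right solver instead of hard-coding one vendor.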

Strategy 1: API solving services

This is the most reliable method for high-volume scraping.

You send the CAPTCHA data (like the sitekey and URL) to a third-party service. They route it to a human worker or a specialized AI model. The worker solves it, and the service sends you back a “token.”

You inject this token into the website’s form, and the server thinks you solved it.

Top services:

2Captcha: The veteran. Reliable, huge pool of human workers. Slower but solves almost anything.
CapSolver: AI-focused. Extremely fast and cheaper than humans. Great for reCAPTCHA and hCaptcha.
Anti-Captcha: Another solid human-based service with good API libraries.

Code example: solving reCAPTCHA v2 with Python

Here is how you actually implement this in Python using raw requests against the 2Captcha HTTP API. (The 2captcha-python library wraps the same calls if you prefer a client.)

Scenario: A website has a reCAPTCHA v2 lock on its login form.

Step 1: Find the sitekey

Inspect the HTML source of the target page. Look for data-sitekey="6Ld..." inside the CAPTCHA div or iframe.
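
If you would rather not inspect the page by hand, the lookup can be scripted. This is a minimal sketch that assumes the sitekey is present in the static HTML as a data-sitekey attribute. The function name find_sitekey is hypothetical, and pages that build the widget in JavaScript will need a real browser instead.

```python
import re
from typing import Optional

# Matches data-sitekey="..." or data-sitekey='...' anywhere in the markup.
SITEKEY_PATTERN = re.compile(
    r"""data-sitekey\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def find_sitekey(html: str) -> Optional[str]:
    """Return the first data-sitekey value in the HTML, or None if absent."""
    match = SITEKEY_PATTERN.search(html)
    return match.group(1) if match else None
```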

Step 2: The Python script

import time
import requests

# Configuration
API_KEY = 'YOUR_2CAPTCHA_API_KEY'
SITE_KEY = '6Ld_TARGET_SITE_KEY'
URL = 'https://target-website.com/login'

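def solve_recaptcha(max_wait=180, poll_interval=5):
    print("Sending CAPTCHA to 2Captcha...")

    # 1. Send the request to the solving service
    #    (HTTPS so the API key is not sent in clear text)
    response = requests.post('https://2captcha.com/in.php', data={
        'key': API_KEY,
        'method': 'userrecaptcha',
        'googlekey': SITE_KEY,
        'pageurl': URL,
        'json': 1
    }, timeout=30)
    submit_json = response.json()

    # status 0 means the task was rejected; 'request' then holds the error code
    if submit_json.get('status') != 1:
        print(f"Error: {submit_json.get('request')}")
        return None

    request_id = submit_json.get('request')
    print(f"Task ID: {request_id}")

    # 2. Wait for the solution, giving up after max_wait seconds
    print("Waiting for solution...")
    deadline = time.time() + max_wait
    while time.time() < deadline:
        time.sleep(poll_interval)
        result = requests.get('https://2captcha.com/res.php', params={
            'key': API_KEY,
            'action': 'get',
            'id': request_id,
            'json': 1
        }, timeout=30)
        result_json = result.json()

        if result_json.get('status') == 1:
            print("CAPTCHA Solved!")
            return result_json.get('request')  # This is the token

        # 'CAPCHA_NOT_READY' (the API spells it this way) means keep polling
        if result_json.get('request') != 'CAPCHA_NOT_READY':
            print(f"Error: {result_json.get('request')}")
            return None

    print("Timed out waiting for a solution")
    return None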

# 3. Use the token
token = solve_recaptcha()
if token:
    # Now you submit this token with your form data
    # usually in a field named 'g-recaptcha-response'
    login_data = {
        'username': 'myuser',
        'password': 'mypassword',
        'g-recaptcha-response': token
    }
    # requests.post(URL, data=login_data)

Strategy 2: Browser automation plugins

If you are using Puppeteer, Playwright, or Selenium, you are controlling a real browser.

Instead of wiring up the API calls yourself, you can install plugins or extensions that handle CAPTCHAs inside the browser session.

Tools:

Puppeteer-extra-plugin-recaptcha: A famous plugin for Puppeteer. It finds the CAPTCHA on the page, forwards it to a solving provider you configure (such as 2Captcha), and injects the returned token for you.
Buster: A browser extension that solves reCAPTCHA audio challenges using speech-to-text APIs.

Pros: Easier to integrate if you are already using a browser.

Cons: Slower than direct requests. Detecting the “I am not a robot” iframe can be flaky.
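
If a plugin proves flaky, a token obtained from a solving service can also be placed into the browser session by hand. The sketch below assumes a Playwright Page object (sync API) and a page that reads the token from the g-recaptcha-response field. The helper name inject_token is hypothetical, and some sites also expect a JavaScript callback to fire, which this does not handle.

```python
# JavaScript executed inside the page: writes the token into the response field.
INJECT_JS = """
(token) => {
    const field = document.getElementById('g-recaptcha-response');
    if (!field) return false;
    field.value = token;
    return true;
}
"""


def inject_token(page, token: str) -> bool:
    """Write the solved token into the page; returns False if the field is missing.

    `page` is assumed to be a Playwright Page, whose evaluate() method runs
    a JavaScript function in the page and passes it one argument.
    """
    return bool(page.evaluate(INJECT_JS, token))
```

After a successful injection you would submit the form as usual, for example by clicking the login button through the same page object.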

Strategy 3: AI vision models

For simple image CAPTCHAs (text on a distorted background), you don’t need a service. You can use Optical Character Recognition (OCR).

Libraries and models:

Tesseract (via pytesseract): Good for clean text.
EasyOCR: Deep learning-based, handles distortion better.
YOLO: For object detection CAPTCHAs (e.g., “Click all the buses”).

This approach is virtually free but requires significant development time to train or tune models for specific CAPTCHA types.
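
As a rough illustration of the OCR route, here is a minimal sketch using Pillow and pytesseract (both must be installed, along with the Tesseract binary). The cutoff of 140 and the helper names are assumptions for illustration; real CAPTCHAs usually need per-site tuning of the preprocessing.

```python
import re


def to_black_or_white(pixel: int, cutoff: int = 140) -> int:
    """Map a grayscale value to pure white or pure black (simple binarization)."""
    return 255 if pixel > cutoff else 0


def normalize_guess(raw: str) -> str:
    """Drop whitespace and punctuation that OCR tends to add around the text."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def read_text_captcha(image_path: str) -> str:
    # Imported here so the helpers above work without the OCR stack installed.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path).convert("L")  # convert to grayscale
    image = image.point(to_black_or_white)       # binarize to strip background noise
    # --psm 7 tells Tesseract to treat the image as a single line of text.
    raw = pytesseract.image_to_string(image, config="--psm 7")
    return normalize_guess(raw)
```

Binarizing before OCR is a design choice, not a requirement: it helps with faint background patterns but can erase thin strokes, so the cutoff is the first thing to tune.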

The cost of scraping

Solving CAPTCHAs is not free.

Financial Cost: API services charge per 1,000 solutions (e.g., $0.50 to $3.00 per 1k). If you scrape 1 million pages and every page triggers a CAPTCHA, that’s $500-$3,000 just in CAPTCHA fees.
Latency Cost: A human worker takes 15-45 seconds to solve a reCAPTCHA. This kills high-frequency trading or real-time monitoring scripts.
Maintenance Cost: Websites change their CAPTCHA providers. Today it’s reCAPTCHA; tomorrow it’s Cloudflare. Your script breaks, and you spend hours rewriting the solver logic.
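
The arithmetic behind the first two costs is easy to script for your own volumes. This is a minimal sketch; the function name and the captcha_rate parameter (the fraction of pages that actually trigger a CAPTCHA) are assumptions added for illustration.

```python
def estimate_captcha_cost(pages, price_per_1k, captcha_rate=1.0, seconds_per_solve=0.0):
    """Estimate solver fees and cumulative solve time for a scraping job.

    pages             -- number of pages to scrape
    price_per_1k      -- solver price in USD per 1,000 solutions
    captcha_rate      -- fraction of pages that trigger a CAPTCHA (assumed)
    seconds_per_solve -- average wait per solution
    """
    solves = pages * captcha_rate
    return {
        "solves": solves,
        "fee_usd": solves / 1000 * price_per_1k,
        "solver_hours": solves * seconds_per_solve / 3600,
    }


low = estimate_captcha_cost(1_000_000, 0.50)
high = estimate_captcha_cost(1_000_000, 3.00)
print(low["fee_usd"], high["fee_usd"])
```

With one CAPTCHA per page this reproduces the $500-$3,000 range above. Lowering captcha_rate shows how much a cleaner fingerprint or better proxies would save.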

The automated alternative

If your goal is the data, not the engineering challenge of breaking bot protection, building your own solver is often a waste of resources.

Advanced scraping platforms handle this natively.

cloro integrates CAPTCHA solving directly into the request pipeline.

When you send a request through cloro to scrape Google Search or monitor ChatGPT, we detect the CAPTCHA, solve it (using a blend of AI and premium proxies), and return the clean HTML.

You don’t manage API keys. You don’t handle retries. You don’t wait 45 seconds for a human to click traffic lights.

Stop fighting the gatekeepers. Walk right past them.
