Is Website Scraping Legal? 2026 Rules (US + EU)

Ricardo Batista

Founder, cloro

February 26, 2026 • Updated July 8, 2026 • 12 min read

TL;DR

Yes — scraping publicly accessible data is legal across the US and EU, but legality hinges on how you collect it, not just that the data is public.
hiQ v. LinkedIn (2019/2022) established that scraping public data without bypassing authentication isn’t a CFAA violation, and Meta v. Bright Data (Jan 2024) held that Meta’s terms of service bind logged-in use only.
The hard boundaries are PII (GDPR), copyrighted creative expression (extract facts, not articles), and authentication or paywalls (a criminal-access risk in every jurisdiction).
New in 2026: the EU AI Act’s general-purpose AI rules, which have applied since August 2025, become enforceable by the European Commission. Providers must disclose training-data sources and top domains and respect copyright opt-outs, and the Act bans untargeted facial-image scraping.
Run every project through the compliance checklist — data type, access method, robots.txt, rate — and default to the stricter EU framework when scraping across jurisdictions.

Short answer: yes. Scraping public web data is generally legal in the US and EU, provided you respect the CFAA, GDPR, a site’s robots.txt, and reasonable rate limits. It’s much like reading a book in a public library: the access itself is fine, and what matters is how you do it. Whether website scraping is legal turns on what data you collect, how you access it, and the jurisdiction you operate in.

This 2026 update adds a seven-jurisdiction matrix (US, EU, UK, Canada, Australia, Japan, Brazil) and the four decisions that anchor the framework: hiQ Labs v. LinkedIn, Meta v. Bright Data, Van Buren v. United States, and the Clearview AI enforcement actions.

Is website scraping legal? Start with the core principles

The distinction is about risk, not just legality. Think of it as walking through an open front door versus picking the lock. One is an everyday action; the other bypasses security and invites legal trouble.

What determines scraping risk

The legal risk of a scraping project sits on a spectrum, and three factors decide where you land:

The type of data. Public business data like prices and SEO keywords, or personal data like names, emails, and photos?
The access method. Data openly visible to any visitor, or data behind a login, paywall, or CAPTCHA?
The impact on the site. A considerate visitor, or a scraper overwhelming the server with rapid-fire requests?

The most critical distinction in the scraping-legality debate is whether the data is public or private. US courts, most notably the Ninth Circuit in hiQ Labs v. LinkedIn, have held that accessing information available to any internet user without a password is not a crime under the CFAA.

Legal risk factors at a glance

A low-risk approach focuses on public, non-sensitive information and respects the site’s technical infrastructure. High-risk approaches involve personal data, bypassing barriers, and ignoring a site’s rules. The table below breaks down the factors that set your project’s risk level.

Risk Factor	Low Risk (Generally Permissible)	High Risk (Potential Legal Issues)
Data Type	Publicly available business data (e.g., SERP results, product prices)	Personal data (names, emails), copyrighted content, data behind a login
Access Method	Accessing open, public-facing pages without logging in	Bypassing CAPTCHAs, using stolen credentials, or circumventing IP blocks
Website Rules	Respecting `robots.txt` directives and rate limits	Ignoring `robots.txt`, aggressively hammering servers, causing downtime
Data Usage	Internal analysis, competitive intelligence, SEO monitoring	Republishing copyrighted material, creating a competing commercial product

With these principles in hand, you can assess your own projects and dig into the specific laws and cases that shape the legal landscape for website scraping.

How the CFAA shapes website scraping legality

In the United States, one law dominates the conversation: the Computer Fraud and Abuse Act (CFAA). Congress passed it in 1986 to fight computer hacking, and its dated language became the central battleground for scraping. The whole debate hangs on two phrases in the statute, codified at 18 U.S.C. § 1030: accessing a computer “without authorization” or in a way that “exceeds authorized access”.

For years, companies argued that if their Terms of Service said “no scraping,” then any scraping was automatically unauthorized. That view would have made collecting public information a potential federal crime — like a public park putting up a “no photography” sign and calling every photo a trespass.

The landmark case: hiQ Labs v. LinkedIn

The tension came to a head in hiQ Labs v. LinkedIn. HiQ, a small analytics firm, scraped public LinkedIn profiles to give employers workforce insights. LinkedIn sent a cease-and-desist letter, claiming this broke the CFAA. HiQ argued the data was public and that LinkedIn was crushing a competitor.

The case reached the Ninth Circuit, which sided with hiQ. As the Electronic Frontier Foundation summarizes the ruling, using automated scripts to access publicly available data is not “hacking,” and neither is violating a site’s terms of use. If data is publicly accessible and no technical barrier stands in the way, accessing it isn’t “unauthorized” under the CFAA.

The Ninth Circuit’s ruling effectively stated that the CFAA does not let website owners unilaterally forbid the scraping of data that is otherwise accessible to the public.

The Supreme Court sharpened the point in Van Buren v. United States (2021). The Court held that a person “exceeds authorized access” only by obtaining information from areas of a computer that are off-limits to them — not by using authorized access for a disapproved purpose.

For scrapers, that forecloses the theory that a mere terms-of-service breach is a computer crime; the CFAA now requires an actual bypass of an access control. The Ninth Circuit reaffirmed hiQ in 2022 on that narrower reading.

What “unauthorized access” means today

After hiQ and Van Buren, the line for SEO and data teams is a technical gate, not a site’s fine print. In practice:

Public data is fair game. Scraping data visible to any anonymous user — search results, product prices, news articles — generally does not violate the CFAA.
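Authentication is the barrier. Once you log in to reach data, scraping it is governed by the clickwrap terms you accepted. Using credentials you aren’t entitled to, or continuing after your access is revoked, moves into unauthorized-access territory.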
Bypassing technical blocks is a no-go. Defeating CAPTCHAs, IP blocks, or bot detection can be read as gaining unauthorized access.
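In code, "don't bypass" means treating an access control as a stop signal rather than a puzzle to solve. The following is a minimal Python sketch. It assumes you inspect each response's status code and its final URL after redirects. The helper names, the login-path hints and the status-code choices are hypothetical illustrations, not a legal test.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes suggesting a redirect landed on a login or paywall page.
LOGIN_PATH_HINTS = ("/login", "/signin", "/sign-in", "/subscribe")

# 401 = authentication required, 402 = payment required, 403 = forbidden.
GATE_STATUS_CODES = frozenset({401, 402, 403})


def is_access_gate(status_code: int, requested_url: str, final_url: str) -> bool:
    """True if the response looks like an access control the scraper must not bypass."""
    if status_code in GATE_STATUS_CODES:
        return True
    # Being redirected to a login-looking path is treated as a gate too.
    redirected = final_url != requested_url
    return redirected and urlparse(final_url).path.lower().startswith(LOGIN_PATH_HINTS)


def next_action(status_code: int, requested_url: str, final_url: str) -> str:
    """Return 'stop', 'back_off', or 'continue'; never 'retry with new credentials'."""
    if is_access_gate(status_code, requested_url, final_url):
        return "stop"  # no credential reuse, proxy rotation, or CAPTCHA solving
    if status_code == 429:
        return "back_off"  # rate-limited: slow down rather than switching IPs
    return "continue"
```

The design choice is deliberate. When the scraper hits a gate, its only options are to stop or to slow down. It never retries with new credentials, proxies or a CAPTCHA solver.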

That distinction is what keeps most SERP and pricing scraping on solid legal ground. Pulling public results for competitive analysis is a world away from scraping a user’s private account behind a login. The CFAA is an anti-hacking law, not an anti-scraping one.

Meta v. Bright Data — the post-hiQ update

The most important scraping ruling since hiQ came down on January 23, 2024. Judge Edward M. Chen of the Northern District of California granted summary judgment to Bright Data in Meta Platforms v. Bright Data, per Farella Braun + Martel’s analysis.

The court held that Meta’s terms govern “your use” of its products — and that Bright Data did not “use” Facebook when it scraped public logged-off pages after terminating its accounts.

The broader logic, per Quinn Emanuel’s client alert: giving platforms free rein over who can collect public data risks creating information monopolies that disserve the public interest. Meta dropped the suit in February 2024 and waived its appeal, per TechCrunch’s coverage. The takeaway that shapes scraping strategy today: terms of service written to govern a user’s “use” of a platform did not bind a scraper operating without logging in.

Landmark scraping cases at a glance

Year	Case	Jurisdiction	What it established
2016	Power Ventures v. Facebook	9th Cir. (US)	Cease-and-desist letter + continued scraping = unauthorized access. Pre-hiQ, the stricter view.
2017–2022	hiQ Labs v. LinkedIn	9th Cir. (US)	Scraping publicly accessible data is not “unauthorized access” under the CFAA.
2021	Van Buren v. United States	SCOTUS (US)	“Exceeds authorized access” means entering off-limits areas, not misusing authorized access.
2024	Meta v. Bright Data	N.D. Cal. (US)	Meta’s terms of service bar logged-in scraping only; logged-off scraping of public data after account termination is not a breach.
2022–2024	Clearview AI (EU/UK regulators)	EU, UK	Public photos are not free-use under GDPR; multiple regulators fined the company for facial-image scraping.

The trend is consistent: public data is legally scrapeable, contract terms bind logged-in use only, and the harder boundaries are PII (GDPR), creative expression (copyright), and explicit technical access controls (CFAA).

The CFAA clarifies access to public data, but it’s only one piece of the puzzle. The real minefield is personally identifiable information (PII). Grabbing public prices is like taking notes in a public market. Scraping personal data is like hiding a camera to record everyone’s face — one is research, the other is a privacy breach with severe exposure.

The Clearview AI cautionary tale

No case illustrates the danger better than Clearview AI. The company built a facial-recognition database by scraping billions of photos from public social media profiles, then sold access to law enforcement and private firms. That shattered people’s expectation of privacy and triggered enforcement by privacy regulators across Europe and beyond.

Clearview’s defense — that the photos were “publicly available” and it had a “legitimate interest” basis — failed with regulators. That basis is GDPR Article 6(1)(f), which permits processing “necessary for the purposes of the legitimate interests pursued by the controller” unless overridden by the individual’s rights. For facial-recognition data, EU and UK authorities consistently rejected that balancing test. The teeth are real: GDPR allows fines up to €20 million or 4% of total global turnover, whichever is higher, and regulators across the EU and UK imposed multimillion-euro penalties on Clearview and ordered deletion.
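The "whichever is higher" rule is worth working through once, because the 4% figure only matters for large companies. This minimal sketch covers the arithmetic for the upper fine tier in GDPR Article 83(5). It computes the statutory maximum only, since regulators set actual fines case by case.

```python
def gdpr_max_fine_eur(global_annual_turnover_eur: float) -> float:
    """Statutory cap for the upper GDPR fine tier: the higher of EUR 20M or 4% of turnover."""
    return max(20_000_000, global_annual_turnover_eur * 4 / 100)


# The 4% branch overtakes the EUR 20M figure once turnover exceeds EUR 500M.
for turnover in (100_000_000, 500_000_000, 2_000_000_000):
    print(f"turnover EUR {turnover:,} -> max fine EUR {gdpr_max_fine_eur(turnover):,.0f}")
```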

Clearview AI cemented a core principle: just because data is publicly visible doesn’t mean it’s free for any and all use. Without consent or a defensible legal basis, collecting personal data is a high-stakes gamble.

Lessons for SEO and data teams

These are the legal considerations worth building into your scraping compliance strategy:

Avoid PII. Without explicit consent and a clear legal basis, don’t scrape names, emails, phone numbers, or photos. Even usernames are risky if they link to a real person.
“Public” doesn’t equal “permissible.” A photo on a public profile doesn’t grant a free pass to scrape, store, and build a commercial product with it.
Global privacy laws have long arms. GDPR (Europe), CCPA/CPRA (California), and LGPD (Brazil) apply based on whose data you scrape, not where your office sits.
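A page you scrape for business data may still contain personal details. One defensive pattern is to drop known personal fields and redact contact details before anything is stored. The following is a minimal sketch that assumes US-style phone formats and uses hypothetical field names. Regexes like these miss plenty of PII, including names, photos and most usernames, so treat this as a safety net rather than a compliance program.

```python
import re

# Illustrative patterns only: common email shapes and US-style phone numbers.
# They will miss many formats and do not detect names, photos, or usernames.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?<!\w)(?:\+\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\w)"
)

# Hypothetical field names for personal data in a scraped record.
PII_FIELDS = frozenset({"name", "email", "phone", "photo_url", "username"})


def drop_pii_fields(record: dict) -> dict:
    """Keep only fields that are not on the personal-data list."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers in free text with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```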

The message is clear: stick to clean, compliant sources. Focus on non-personal signals like SERP features, product specs, or business listings. The reward from scraping PII rarely justifies the legal and financial risk.

The 2026 EU AI Act and training-data disclosure

The single biggest 2026 development isn’t a court case; it’s a regulation. The EU AI Act’s general-purpose AI (GPAI) obligations have applied since August 2025. The European Commission’s power to enforce them begins in August 2026, which creates the first enforceable transparency regime for AI training data. Per Scalevise’s analysis, every GPAI model provider must now:

Publish a summary of training data sources, including the top 10% of domain names used (top 5% or top 1,000 for SMEs)
Respect copyright opt-outs under the EU Copyright Directive’s text and data mining exception (Article 4)
Describe crawler behavior — which bots, how they operated, when data was collected
Label AI-generated content at the output layer

Per WilmerHale’s coverage, the European Commission has released the mandatory disclosure template, so the obligation is concrete. Per IAPP’s analysis of the Digital Omnibus, critics say the proposed GDPR amendments meant to ease AI training miss the mark. Until any amendment is adopted, existing GDPR constraints on personal-data scraping remain in force.

The Act also bans specific behaviors outright. Per the Future of Privacy Forum’s analysis, Article 5 prohibits untargeted scraping of facial images from the internet or CCTV for recognition databases — the Clearview pattern, written into law.

For data teams, this is what keeps website scraping legal in Europe:

GPAI transparency obligations apply to any provider placing a model on the EU market, wherever the model is trained.
Respecting machine-readable opt-outs (robots.txt, ai.txt, and TDM reservation signals) is a condition for relying on the Article 4 text and data mining exception, not a courtesy.
Documenting where your data came from — which domains, which crawler, which date range — is now required.
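Documentation is easiest when it happens at fetch time. The following is a minimal sketch of a hypothetical in-memory provenance log. For each domain, it records which crawler collected data and over what date range. A real pipeline would persist this to durable storage, and you should check the fields the Commission's template actually asks for against the template itself.

```python
from dataclasses import dataclass, field
from datetime import date
from urllib.parse import urlparse


@dataclass
class ProvenanceLog:
    """Hypothetical record of which crawler fetched which domains, and when."""

    crawler_user_agent: str
    first_seen: dict[str, date] = field(default_factory=dict)
    last_seen: dict[str, date] = field(default_factory=dict)

    def record(self, url: str, fetched_on: date) -> None:
        """Update the first and last fetch dates for the URL's domain."""
        domain = urlparse(url).netloc.lower()
        if domain not in self.first_seen or fetched_on < self.first_seen[domain]:
            self.first_seen[domain] = fetched_on
        if domain not in self.last_seen or fetched_on > self.last_seen[domain]:
            self.last_seen[domain] = fetched_on

    def summary(self) -> list[tuple[str, str, date, date]]:
        """One row per domain: (domain, crawler, first fetch, last fetch)."""
        return [
            (domain, self.crawler_user_agent, self.first_seen[domain], self.last_seen[domain])
            for domain in sorted(self.first_seen)
        ]
```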

Per IAPP’s broader 2026 web-scraping analysis, DSA enforcement is accelerating the move away from the unregulated “wild west” era. Compliance posture and lawful-basis analysis are now operational requirements for scraping at scale in the EU.

Jurisdiction-by-jurisdiction: where the rules differ

Scraping law is fragmented globally. A project that’s clearly legal in the US can be a GDPR violation in the EU. The matrix below summarizes seven jurisdictions across four data classes — the most material commercial markets for scraping in 2026.

Use case	US (CFAA + state)	EU (GDPR + AI Act)	UK (DPA 2018 + CMA)	Canada (PIPEDA)	Australia (Privacy Act)	Japan (APPI)	Brazil (LGPD)
Public business data (prices, specs, SERP)	✅ Generally legal	✅ Legal if no PII	✅ Legal if no PII	✅ Legal if no PII	✅ Legal if no PII	✅ Legal if no PII	✅ Legal if no PII
Public PII (names, emails, photos)	⚠️ CCPA/CPRA apply	🔴 GDPR — needs lawful basis	🔴 DPA 2018 — needs lawful basis	⚠️ PIPEDA if commercial	⚠️ APPs if commercial	⚠️ APPI; opt-out for sensitive data	⚠️ LGPD — needs lawful basis
Behind login or paywall	🔴 CFAA unauthorized-access	🔴 GDPR + breach of contract	🔴 Computer Misuse Act 1990	🔴 Criminal Code s.342.1	🔴 Unauthorized access	🔴 Unauthorized Computer Access Law	🔴 Criminal Code Art. 154-A
Copyrighted creative expression	🔴 Copyright infringement	🔴 Copyright Directive + TDM	🔴 CDPA 1988	🔴 Copyright Act	🔴 Copyright Act 1968	🔴 Copyright Act (broad TDM, Art. 30-4)	🔴 LDA + LGPD overlay

Headline takeaways:

The US has the most scraper-friendly framework for public business data, anchored by hiQ and Meta v. Bright Data.
The EU is the most restrictive, especially after 2026 AI Act enforcement. Its jurisdiction reaches anyone scraping EU citizens’ data, wherever the scraper is based.
The UK substantially mirrors EU rules post-Brexit through DPA 2018 and CMA enforcement.
Japan’s APPI treats most public business data permissively but requires opt-out handling for sensitive personal data even when it’s public.
Authentication bypass is criminal everywhere — the CFAA, the UK Computer Misuse Act 1990, Canada’s Criminal Code s.342.1, and Brazil’s Article 154-A all criminalize bypassing access controls.
Copyright applies universally — fact extraction is generally fine; reproducing creative work is not. The EU’s TDM exception is the main carveout, subject to a machine-readable opt-out.

If you scrape across jurisdictions, treat the EU framework as your operational floor. Treating EU-citizen data as opt-in, respecting TDM opt-out signals everywhere, and avoiding PII without a legal basis are the three behaviors that keep multi-jurisdiction scraping clean.

Terms of service and copyright

Hacking laws aren’t the only concern. Two civil areas — a site’s Terms of Service (ToS) and copyright law — can land you in court even where no CFAA issue exists.

Are terms of service legally binding?

Ignoring a site’s ToS isn’t a federal crime, but it can get you sued for breach of contract. Whether the site can win comes down to how the terms were presented:

Clickwrap agreements are the strongest. You tick a box or click “I Agree” to proceed, and that action forms a binding contract. Scraping after clicking “I Agree” to a ToS that forbids it is playing with fire.
Browsewrap agreements are far weaker. The ToS is just a footer link, and courts are often skeptical that using the site implies agreement.

For a browsewrap agreement to bind you, the site must show you had “actual or constructive knowledge” of the terms. If the link is buried and you never saw it, it’s tough to argue a contract formed.

That distinction matters. Many sites with public data use browsewrap terms, which makes a breach-of-contract suit hard to win — especially when a scraper hits the site anonymously and never touches the ToS link.

Copyright and scraped data

Copyright protects creative expression, not raw facts. Think of a cookbook: the ingredient list (2 cups flour, 1 cup sugar) is factual and free to use. The written instructions, the story, and the food photography are creative expression, and copying them word-for-word is infringement.

That fact-versus-expression split drives most of the copyright questions in SEO scraping:

Scraping SERP data. When you scrape a Google results page, you’re mostly collecting facts — titles, URLs, and meta descriptions. Low risk.
Scraping a competitor’s blog. Copying every article and republishing it crosses a bright red line. That content is protected expression, and reproducing it is textbook infringement.

Your purpose matters too. Using factual data for internal analysis or a competitive-intelligence dashboard is different from republishing copyrighted content publicly. Rule of thumb: pull the raw facts, not the creative container they’re packaged in.
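The fact-versus-expression rule translates directly into what your parser keeps. This minimal sketch uses Python's standard-library `html.parser` and assumes the facts you want are the page title and meta description. It deliberately ignores body text, which is where most creative expression lives.

```python
from html.parser import HTMLParser


class FactExtractor(HTMLParser):
    """Collects the page title and meta description, ignoring body text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)  # attribute values may be None
            if (attr_map.get("name") or "").lower() == "description":
                self.meta_description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; paragraphs and articles are skipped.
        if self._in_title:
            self.title += data


def extract_facts(html: str, url: str) -> dict:
    """Return a small factual record for one page."""
    parser = FactExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "meta_description": parser.meta_description}
```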

Ethical scraping: robots.txt and rate limits

Staying legal is the starting line. Ethical scraping means being a good internet neighbor — and it’s practical, because aggressive scraping attracts legal threats and technical blocks even when the data is public.

Respecting robots.txt directives

Your first stop is the robots.txt file, a plain-text file in the site root that tells crawlers which paths they may visit. Think of it less as a legal wall and more as a “Please Keep Off the Grass” sign: walking on the grass isn’t a crime, but ignoring the sign signals clear disrespect.

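The robots.txt file is your guide to what the owner considers acceptable for bots. In the US it isn’t legally binding on its own, although in the EU it can act as a machine-readable opt-out from the TDM exception. Either way, ignoring it is the fastest way to get your IP blocked and labeled a “bad bot.”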

A typical file uses User-agent: * to target all bots, Disallow: /private/ to block a directory, and Allow: /public/ to permit one. Check and honor these directives as a simple sign of good faith.
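Python's standard library ships a robots.txt parser, `urllib.robotparser.RobotFileParser`. The sketch below parses an inline example file so that it runs offline; in production you would call `set_url()` and `read()` to fetch the live file. It checks two hypothetical URLs against the directives described above and reads any `Crawl-delay`, a nonstandard extension that not every site uses.

```python
from urllib.robotparser import RobotFileParser

# Inline example file; in production, call set_url() and read() instead.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
Crawl-delay: 5
"""

USER_AGENT = "MyCoolSEOToolBot/1.0"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(EXAMPLE_ROBOTS_TXT)
for url in ("https://www.example.com/public/pricing",
            "https://www.example.com/private/accounts"):
    verdict = "allowed" if rules.can_fetch(USER_AGENT, url) else "disallowed"
    print(url, verdict)

# crawl_delay() returns None if the file sets no Crawl-delay for this agent.
print("crawl delay (seconds):", rules.crawl_delay(USER_AGENT))
```

Run as written, the script should report the `/public/` URL as allowed, the `/private/` URL as disallowed, and a crawl delay of 5. Python's parser applies rules in file order. When `Allow` and `Disallow` overlap, that can differ from the longest-match rule in the Robots Exclusion Protocol standard.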

Rate limiting

The second pillar is rate limiting: making requests at a reasonable, human-like pace. One person browsing a store is normal; a flash mob storming the entrance is a shutdown. Hammering a server with thousands of requests per second devours bandwidth and can crash the site, which causes real financial damage that the business will act to stop.

To scrape responsibly, build in delays:

Introduce random delays. Vary the timing rather than using a fixed two-second gap, so your traffic resembles real browsing.
Scrape during off-peak hours. Run jobs late at night in the target site’s time zone, when there are fewer real visitors.
Identify yourself. Use a clear User-Agent string (e.g., “MyCoolSEOToolBot/1.0”) so owners can reach you if there’s a problem.
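The following minimal sketch covers two of these habits using only the standard library: a randomized delay between sequential requests and an identifying User-Agent. The contact URL in the User-Agent, the default delay values and the function names are hypothetical. Choose delays that suit the target site, and if its robots.txt sets a `Crawl-delay`, treat that value as your minimum. Scheduling off-peak runs is left to your job scheduler.

```python
import random
import time
import urllib.request

# Hypothetical bot name and contact URL so site owners can reach you.
USER_AGENT = "MyCoolSEOToolBot/1.0 (+https://www.example.com/bot)"


def jittered_delay(base_seconds: float, jitter_seconds: float, rng: random.Random) -> float:
    """Base delay plus a random extra of up to `jitter_seconds`."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def build_request(url: str) -> urllib.request.Request:
    """Attach an identifying User-Agent header to the request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_fetch_all(urls, base_seconds=2.0, jitter_seconds=3.0, seed=None) -> dict:
    """Fetch URLs one at a time, sleeping a randomized interval between requests."""
    rng = random.Random(seed)
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(jittered_delay(base_seconds, jitter_seconds, rng))
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            pages[url] = resp.read()
    return pages
```

Fetching sequentially rather than in parallel is the simplest way to cap load on a single site. Adding concurrency would mean adding a per-domain limit as well.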

Respecting a site’s rules and technical limits keeps your scraping sustainable and your legal exposure low.

A practical compliance checklist

Flowchart for web scraping risk assessment, determining low or high risk based on data type and robot rules.

Knowing the theory is one thing; practicing it is another. This isn’t legal advice. Think of it as a pre-flight checklist for gut-checking a project’s legal risk before a line of code gets written.

A responsible scraper acts more like a polite guest than a disruptive intruder. Respecting a site’s rules and technical limits minimizes conflict and keeps your access from being cut off.

Before kicking off any project, run through this checklist with your team. It turns abstract concepts into concrete action items. For rate-limiting patterns at scale, our guide to large-scale web scraping covers the best practices in depth.

Checklist Item	Assessment Question	Action/Mitigation
Authentication Gate	Does the data sit behind a login, paywall, or other access control?	If Yes, stop. This is a clear CFAA risk.
Personally Identifiable Information	Does the data include names, emails, phone numbers, or user photos?	If Yes, avoid scraping or consult a privacy expert. High GDPR/CCPA risk.
Copyrighted Content	Are you scraping creative works or factual data (prices, specs)?	Focus on facts. Republishing creative works is high copyright risk.
Terms of Service	Have you reviewed the ToS for explicit bans on scraping?	If Yes, weigh business need vs. breach-of-contract risk.
`robots.txt` Directives	Does `robots.txt` `Disallow` the target URLs?	Honor all `Disallow` rules. Ignoring them signals bad faith.
Scraping Rate	What is the planned request rate? Is it aggressive?	Implement rate limits and randomized delays.
Data Usage	Internal analysis, republication, or commercial product?	Internal analysis is lowest risk; republication is highest.
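Teams that want the checklist enforced rather than remembered can encode it as a pre-flight gate in their project tooling. The following is a minimal sketch under our own assumptions. The field names, the one-request-per-second threshold and the split between hard blockers and warnings are hypothetical choices that mirror the table; they are not legal thresholds.

```python
from dataclasses import dataclass


@dataclass
class ScrapeProject:
    """Answers to the checklist questions (field names are hypothetical)."""

    behind_login_or_paywall: bool
    collects_pii: bool
    republishes_creative_content: bool
    tos_bans_scraping: bool
    robots_disallows_targets: bool
    requests_per_second: float
    usage: str  # e.g. "internal", "commercial_product", "republication"


MAX_POLITE_RPS = 1.0  # assumption: tune per site; not a legal threshold


def preflight(project: ScrapeProject) -> tuple[list[str], list[str]]:
    """Return (blockers, warnings) following the checklist table."""
    blockers, warnings = [], []
    if project.behind_login_or_paywall:
        blockers.append("Authentication gate: stop (CFAA risk).")
    if project.collects_pii:
        blockers.append("PII: avoid or consult a privacy expert (GDPR/CCPA risk).")
    if project.republishes_creative_content:
        blockers.append("Copyright: extract facts only; do not republish creative works.")
    if project.robots_disallows_targets:
        blockers.append("robots.txt: honor Disallow rules for these URLs.")
    if project.tos_bans_scraping:
        warnings.append("Terms of Service: weigh business need vs. breach-of-contract risk.")
    if project.requests_per_second > MAX_POLITE_RPS:
        warnings.append("Rate: add rate limits and randomized delays.")
    if project.usage != "internal":
        warnings.append("Usage: anything beyond internal analysis raises risk.")
    return blockers, warnings
```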

Making this checklist a mandatory first step ensures every project starts with a clear view of the hurdles. The real trouble in any scraping analysis comes from accessing private data or breaking clear technical rules — not from gathering public information.

Navigating data collection for AI and SEO requires a reliable partner. cloro provides a high-scale scraping API that delivers structured, compliant data from top search and AI assistants, eliminating legal guesswork and technical overhead. Get the clean, consistent data you need to power your workflows without the risk. Start with 500 free credits at cloro.dev.

About the author

Ricardo Batista

Founder, cloro

Ricardo is one of the founders of cloro and one of the engineers behind its SERP and AI-search scraping infrastructure. Before cloro he scaled a financial comparison site to $7M ARR and ran the full-country operations of a unicorn to $65M ARR, then went back to building. He writes about search engine scraping, generative-engine optimization, and turning live search and AI-answer data into something teams can act on.


Frequently asked questions

Is web scraping legal in 2026?

Yes. Scraping publicly accessible web data is legal in the US and EU when done properly. The Ninth Circuit's 2019 hiQ Labs v. LinkedIn ruling established that scraping public data without bypassing technical access controls is not a CFAA violation. The January 2024 Meta v. Bright Data summary judgment then held that Meta's terms of service did not bar logged-off scraping of public data. The legal lines run through four boundaries: (1) authentication (do not bypass logins), (2) personal data (GDPR and CCPA apply when you scrape PII), (3) copyright (extract facts, not creative expression), and (4) rate limiting (do not cause server harm). Stay within those and you are on solid legal ground.

Is scraping a competitor's prices illegal?

Generally, no. Scraping publicly available prices is a common, low-risk part of competitive intelligence. Prices are facts, not creative works protected by copyright. As long as prices are visible to any visitor without logging in and you are not hammering their servers, you are on solid ground. The risk creeps in if you have to click 'I Agree' on a Terms of Service that bans scraping before you can see the prices.

What did Meta v. Bright Data decide about web scraping?

On January 23, 2024, Judge Edward M. Chen of the Northern District of California granted Bright Data's motion for summary judgment in Meta v. Bright Data. The ruling held that Meta's Facebook and Instagram terms of service only prohibit logged-in scraping, not logged-off scraping of publicly accessible content. The court applied traditional contract interpretation to the phrase 'your use' and concluded that Bright Data did not 'use' Facebook when it engaged in public logged-off scraping. The ruling reinforced the hiQ v. LinkedIn precedent and Meta dropped the lawsuit in February 2024.

Does the EU AI Act affect web scraping in 2026?

Yes. The EU AI Act's general-purpose AI obligations have applied since August 2025, and the European Commission can enforce them from August 2026. Every general-purpose AI model provider must publish a summary of training data sources, including the top 10% of domain names used (or top 5% or 1,000 domains for SMEs). Providers must also respect copyright opt-outs under the EU Copyright Directive's text and data mining exception (Article 4) and describe how their web crawlers operated. The AI Act also explicitly bans untargeted scraping of facial images for facial recognition databases. Scraping for AI training is no longer a gray area in Europe.

Is GDPR compliance required when scraping personal data?

Yes — if you scrape data belonging to EU citizens, GDPR applies regardless of where your servers or company are located. The most common lawful basis for AI training data scraping is 'legitimate interest' under GDPR Article 6(1)(f), but data protection authorities have adopted increasingly restrictive positions. The Clearview AI case set the precedent: scraping public photos for facial recognition drew multimillion-euro fines from regulators across the EU and UK. Avoid scraping PII without explicit consent and a clear legal basis.

What if a website's robots.txt says 'Disallow'?

The robots.txt file is not a legally binding contract, and ignoring it is unlikely on its own to lead to CFAA hacking charges. However, ignoring it signals bad faith, looks bad if a dispute escalates, and is the fastest way to get your IP blocked. Respect robots.txt as a matter of professional conduct and good engineering hygiene. Cloudflare's 2024 launch of one-click AI-bot blocking has also made blocking crawlers at the CDN layer increasingly common.

Can I get sued for breaching terms of service?

Yes, you can be sued for breach of contract, but whether the website owner can win depends on how you agreed to the terms. Clickwrap agreements (where you checked a box or clicked 'I Agree' to terms) are strongly enforceable; courts have consistently upheld them. Browsewrap agreements (where the terms are just a link in the footer) are far weaker and rarely enforced unless the site can prove you had actual or constructive knowledge of them. Meta v. Bright Data clarified that even logged-in clickwrap terms do not bind users to logged-off scraping after account termination.

Is scraping AI-generated content like ChatGPT or AI Overviews legal?

Scraping AI-generated content sits in newer legal territory but follows the same principles. Publicly visible AI responses (rendered SERPs with AI Overviews, public ChatGPT shares) are generally treated like other public web data: accessing them does not violate the CFAA when no authentication is bypassed. Each platform's Terms of Service adds a contract layer, and OpenAI's terms specifically prohibit certain scraping behaviors. The conservative path: monitor your own observed sessions, avoid commercial republishing of generated content, and use managed scraping services that operate within platform terms.
