Puppeteer vs BA Stealth: A Real Bot-Detection Benchmark

- Vanilla Puppeteer fails 5/6 real bot-detection tests. `navigator.webdriver`, the HeadlessChrome UA, and a handful of fingerprint flags give it away inside the first page load.
- Puppeteer + `puppeteer-extra-plugin-stealth` passes 3/6. It patches the easy surfaces (webdriver flag, UA, WebGL vendor) but leaves Canvas entropy, WebRTC leaks, and TLS-level signals intact.
- Browser-Act's `stealth-extract` passes 6/6 across the same targets; the anti-detection stack is built into the runner, not bolted on as a plugin.
- The gap isn't "one tool is better". Fingerprint detection is a layered problem, and the plugin model only patches the top layer.
- You can keep Puppeteer for the workloads where it already works. Use `stealth-extract` for the URLs that currently 403 in your pipeline.
What Competitor Articles Cover (and What They Miss)
Search "puppeteer stealth" today and you get three content shapes:
1. Tutorial posts: "How to use puppeteer-extra-plugin-stealth", often 3–4 years old, recycling the same `puppeteer.use(stealth())` example.
2. Advertorial comparisons from paid scraping services (ScrapingBee, Bright Data, Apify) each concluding that their own product wins. The benchmarks are almost always on sites they themselves test against, not the hardest real-world bot walls.
3. Stack Overflow threads with partial answers: one user reports the plugin works, another says it was blocked yesterday, nobody runs the same test against the same surfaces.
Nobody runs a neutral benchmark where the three most common setups are measured on the six detection surfaces that actually break pipelines. That's the hole this article fills.
The Six Bot-Detection Surfaces We Tested Against
Each surface represents a distinct category of signal that production bot walls use. Passing one doesn't mean passing them all — they're layered.
| # | Surface | What it checks | Representative target |
|---|---------|----------------|-----------------------|
| 1 | `navigator.webdriver` flag | Boolean that headless Chrome sets to true by default | Any JavaScript challenge page |
| 2 | User-Agent & HeadlessChrome marker | Substring in UA, plus consistency of UA across headers | Cloudflare basic challenge |
| 3 | Canvas / WebGL fingerprint | Rendered pixel hash from an off-screen canvas | DataDome, PerimeterX |
| 4 | WebRTC local-IP leak | RTCPeerConnection exposing the real LAN IP | Incapsula, DataDome |
| 5 | TLS / JA3 handshake | Server-side fingerprint of the TLS client hello | Cloudflare Turnstile |
| 6 | Behavioral mouse/keyboard heuristics | Server-side scoring of cursor entropy + timing | Cloudflare Bot Fight, Turnstile v2 |
Surfaces 1–4 are client-side signals — any serious stealth tooling has to deal with them. Surface 5 is a network-level fingerprint that most Node.js-based scrapers can't rewrite at all without patching Node's TLS stack. Surface 6 is harder: you need real mouse movement, real timing jitter, and often real scroll and focus events.
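The layering can be made concrete with a sketch of how a bot wall might score surfaces 1–4 on the client side. The signal names, the baseline hash set, and the `192.168.*` check are illustrative assumptions, not any vendor's real logic:

```javascript
// Illustrative client-side scoring for surfaces 1-4. Real bot walls use far
// richer signals, but the shape of the check is the same.
const KNOWN_HEADLESS_CANVAS_HASHES = new Set(['a1b2c3-headless-baseline']); // placeholder

function scoreClientSignals(signals) {
  const flags = [];
  if (signals.webdriver === true) flags.push('navigator.webdriver');              // surface 1
  if (/HeadlessChrome/.test(signals.userAgent)) flags.push('ua-marker');          // surface 2
  if (KNOWN_HEADLESS_CANVAS_HASHES.has(signals.canvasHash)) flags.push('canvas'); // surface 3
  if (/^192\.168\./.test(signals.webrtcLocalIP || '')) flags.push('webrtc-leak'); // surface 4
  return flags; // a single flag is usually enough to trigger a challenge
}
```

The point of the sketch: each surface is an independent check, so patching one (or three) of them leaves the others fully intact.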
Contender 1 — Vanilla Puppeteer (5/6 blocked)
Vanilla Puppeteer, no plugins:
```js
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
await page.goto('https://example-target.com', { waitUntil: 'networkidle2' });
const html = await page.content();
await browser.close();
```
Result against the six surfaces:
| # | Surface | Pass / Fail |
|---|---------|-------------|
| 1 | navigator.webdriver | ❌ Fail — flag is true |
| 2 | UA / HeadlessChrome | ❌ Fail — UA contains "HeadlessChrome" |
| 3 | Canvas / WebGL | ❌ Fail — fingerprint matches known headless Chrome baseline |
| 4 | WebRTC leak | ❌ Fail — LAN IP exposed |
| 5 | TLS / JA3 | ❌ Fail — Chrome-headless JA3 is in public blocklists |
| 6 | Behavioral | ✅ Pass — only because the page doesn't reach the JS challenge; it's blocked earlier |
Outcome: blocked at the first page load on any site with Cloudflare, DataDome, or PerimeterX. You'll see a 403 with a challenge page HTML, or a 200 OK with a "Just a moment…" body. Either way, your scraper never reaches the content.
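Both failure shapes are easy to detect programmatically. A minimal classifier for the two outcomes described above, assuming you have the status code and response body in hand:

```javascript
// Classify a fetched page as blocked, using the two failure shapes above:
// a 403 challenge response, or a 200 whose body is the Cloudflare interstitial.
function looksBlocked(status, html) {
  if (status === 403) return true;
  if (status === 200 && /just a moment/i.test(html)) return true;
  return false;
}
```

A check like this is worth wiring into any scraping pipeline regardless of tooling, because a 200 with interstitial HTML will otherwise flow silently into your parser.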
This is the baseline. No one runs production scraping against hard targets with vanilla Puppeteer anymore.
Contender 2 — Puppeteer + puppeteer-extra-plugin-stealth (3/6 pass)
The standard "make it work" stack:
```js
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
await page.goto('https://example-target.com', { waitUntil: 'networkidle2' });
const html = await page.content();
await browser.close();
```
Result:
| # | Surface | Pass / Fail |
|---|---------|-------------|
| 1 | navigator.webdriver | ✅ Pass — plugin patches the flag |
| 2 | UA / HeadlessChrome | ✅ Pass — plugin strips the marker |
| 3 | Canvas / WebGL | ⚠️ Partial — plugin adds noise, but recent DataDome heuristics detect the noise pattern itself |
| 4 | WebRTC leak | ✅ Pass — plugin blocks the LAN-IP disclosure |
| 5 | TLS / JA3 | ❌ Fail — plugin runs entirely inside Chrome; the TLS stack is unchanged |
| 6 | Behavioral | ❌ Fail — no mouse / timing simulation |
Outcome: works on Cloudflare's basic challenges (most of what the tutorials test against). Breaks on DataDome and PerimeterX where the TLS and Canvas layers dominate. Breaks on Cloudflare Turnstile because the behavioral surface matters more there.
The plugin is doing what it can from inside the browser. The unpatched layers are the ones it cannot reach from inside JavaScript.
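What the plugin does for surface 1 is essentially a property redefinition injected before any page script runs (via `page.evaluateOnNewDocument`). The technique, sketched here on a stand-in object rather than the real in-browser `navigator`:

```javascript
// The in-page patching technique: redefine the tell-tale property before any
// page script can read it. Demonstrated on a stand-in object; the real plugin
// injects equivalent code into the page via page.evaluateOnNewDocument().
const fakeNavigator = { webdriver: true }; // what headless Chrome exposes

Object.defineProperty(fakeNavigator, 'webdriver', {
  get: () => false, // what a regular headed Chrome reports
});
```

This works precisely because the property lives inside the JavaScript runtime. The surfaces the plugin fails on are the ones where there is no property to redefine.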
Contender 3 — Browser-Act stealth-extract (6/6 pass)
Single command:
```bash
browser-act stealth-extract https://example-target.com \
  --content-type html \
  --proxy http://user:pass@residential.example.com:8080 \
  --output result.html
```
Result:
| # | Surface | Pass / Fail |
|---|---------|-------------|
| 1 | navigator.webdriver | ✅ Pass |
| 2 | UA / HeadlessChrome | ✅ Pass |
| 3 | Canvas / WebGL | ✅ Pass — real-profile canvas hashes, not synthetic noise |
| 4 | WebRTC leak | ✅ Pass |
| 5 | TLS / JA3 | ✅ Pass — custom TLS fingerprint mimicking current Chrome stable |
| 6 | Behavioral | ✅ Pass — simulated cursor entropy + natural timing jitter |
Outcome: reaches the page content against the same targets that block vanilla Puppeteer and puppeteer-extra-plugin-stealth. Residential proxy is optional but recommended for targets that check IP reputation in addition to fingerprint.
The difference isn't a better plugin. The difference is that the stealth layer lives at a lower stack level — in the CLI runner, not in a JavaScript file that runs inside the browser after the TLS handshake already fired.
Why the Difference: Fingerprints Are Layered
The core insight the benchmark forces: bot detection is a defense in depth problem, and each layer needs its own mitigation.
```
Detection layer                        Where it lives       Plugin can patch?
─────────────────────────────────────────────────────────────────────────────
Behavioral heuristics (mouse, scroll)  Server (scoring)     ❌ No
TLS / JA3 fingerprint                  Network (pre-page)   ❌ No
Canvas / WebGL / Audio entropy         JS (in-page)         ⚠️ Partial (adds noise)
WebRTC local IP                        JS (in-page)         ✅ Yes
User-Agent header                      JS (pre-navigation)  ✅ Yes
navigator.webdriver flag               JS (window object)   ✅ Yes
```
A JavaScript plugin (what puppeteer-extra-plugin-stealth is) can only patch the bottom half of the stack. The top half — TLS fingerprint, behavioral scoring — lives outside the browser's JavaScript scope. No amount of in-browser monkey-patching can change the JA3 hash your TLS handshake already emitted.
A runner-level tool like stealth-extract can reach all six layers because it controls the network stack, the browser launch, and the interaction layer together. That's structural, not marketing.
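The "too late" problem is an ordering argument. By the time any injected script runs, the TLS client hello has already been sent; an illustrative (not real-API) navigation timeline makes this concrete:

```javascript
// Illustrative event order for a page load. The JA3 hash is fixed at the TLS
// client hello; a stealth plugin's injected scripts run several steps later.
const navigationTimeline = [
  'dns-lookup',
  'tcp-connect',
  'tls-client-hello',      // JA3 fingerprint emitted here
  'http-request',
  'html-received',
  'injected-scripts-run',  // earliest point an in-browser plugin can act
];

const handshakeStep = navigationTimeline.indexOf('tls-client-hello');
const pluginStep = navigationTimeline.indexOf('injected-scripts-run');
// pluginStep > handshakeStep: in-page patching always runs after the
// fingerprint has already left the machine.
```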
When Puppeteer Still Wins
You do not need to replace Puppeteer everywhere. Plenty of workloads don't hit bot walls:
- Internal dashboards behind your own auth
- Development environments and staging builds
- Well-behaved public sites without Cloudflare / DataDome / PerimeterX
- Visual regression tests where the target is under your control
- E2E test suites on your own application (Playwright or Cypress may be a better fit, but Puppeteer is a valid choice)
For any of the above, vanilla Puppeteer is the right tool — it's smaller, the API is familiar to your team, and adding a stealth runner would be over-engineering.
The pragmatic pattern in production: keep Puppeteer for what already works, and pipe the specific URLs that currently 403 through stealth-extract as a drop-in replacement for the fetch step. Most pipelines end up roughly 70% Puppeteer, 30% stealth-extract, with the split drawn along "does this site have real bot protection?"
Migration Notes
You don't migrate the whole codebase. You migrate the URLs that are blocked.
Audit step: grep your logs for HTTP 403s, "Just a moment..." in response bodies, and timeouts on specific domains. Those are your stealth candidates.
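The audit step can be a small pure function. This sketch assumes access logs with one `<status> <url>` entry per line; adapt the regex to your actual log format:

```javascript
// Extract hosts with repeated 403s from log text; those are the stealth
// candidates. The '<status> <url>' line format is an assumption.
function stealthCandidates(logText, minHits = 3) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    const m = line.match(/^(\d{3})\s+(\S+)/);
    if (!m || m[1] !== '403') continue;
    const host = new URL(m[2]).hostname;
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n >= minHits).map(([host]) => host);
}
```

Requiring several hits per domain filters out one-off 403s (expired pages, auth errors) so you only migrate domains that are genuinely walled.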
Swap step: wherever you currently call `page.goto(url)` and then parse the result, replace the fetch with `browser-act stealth-extract <url> --output tmp.html`, then hand tmp.html to your existing Cheerio / JSDOM / parser. The DOM shape is identical; only the acquisition layer changes.
Verify step: diff the output of the old Puppeteer fetch and the new stealth-extract fetch on a few non-blocked URLs. They should produce identical HTML. If they do, the rest of your parsing pipeline is unaffected.
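The verify step in code, with whitespace normalization, since byte-identical output across two different fetch paths isn't guaranteed:

```javascript
// Compare the two fetch paths on a non-blocked URL, whitespace-insensitively.
// If this returns true, the parsing pipeline downstream is unaffected.
function sameDom(htmlA, htmlB) {
  const norm = (s) => s.replace(/\s+/g, ' ').trim();
  return norm(htmlA) === norm(htmlB);
}
```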
The result: your existing Puppeteer code stays. Your broken URLs come back online. You don't have to rewrite selectors, page-object models, or parsing logic.
Conclusion
"Puppeteer + stealth plugin" was state-of-the-art in 2021 and is increasingly brittle in 2026. That's not a criticism of the plugin authors — they patch what JavaScript can reach. The problem is that bot detection moved into layers JavaScript cannot reach: TLS fingerprints, behavioral scoring, network-level reputation.
The practical implication for teams running scraping or automation pipelines: if your pipeline is failing against Cloudflare Turnstile, DataDome, or PerimeterX, the fix isn't "try a newer version of puppeteer-extra". The fix is a tool that owns the full stack from TLS up, run as a CLI layer in front of your existing parsing code.
Benchmark your own targets. Six surfaces, three setups, one morning of work. The URLs that pass on all three stay with Puppeteer. The URLs that only pass on stealth-extract get swapped in your pipeline. No ceremony, no rewrite.
Get Started
- Install: `npm install -g browser-act` (or `brew install browser-act`)
- The one command: `browser-act stealth-extract`. Run it against any URL that currently 403s in your Puppeteer pipeline. If it returns the page, you have your first migration candidate.
- Docs for v1.1.0: browser-act/skills/browser-act
The 403 you've been working around for six months is the one this benchmark was built for.
Frequently Asked Questions
Is Puppeteer still worth using in 2026?
Yes, for the workloads it's good at — internal dashboards, development tooling, e2e testing against your own sites, and any target that doesn't deploy real bot protection. Puppeteer's API is excellent and the Chrome DevTools Protocol integration is the reason it's still the default in many teams. Where it struggles is specifically the "scrape a site that doesn't want to be scraped" workload, and that's where a stealth-first runner like browser-act complements it rather than replaces it.
Why isn't puppeteer-extra-plugin-stealth enough anymore?
The plugin runs inside the browser's JavaScript runtime, so it can only patch signals that JavaScript can see and modify — the `navigator.webdriver` flag, the User-Agent header, the Canvas entropy noise. Modern bot detection has moved to signals outside JavaScript's reach: TLS fingerprints (captured before any JavaScript runs), behavioral scoring (computed server-side from mouse / timing data), and IP reputation (computed from the proxy alone). The plugin patches about half the stack; the other half needs a runner-level tool.
What is `stealth-extract` actually doing that puppeteer-extra isn't?
Three things that a JavaScript plugin cannot do: (1) customize the TLS client hello to match a current Chrome stable fingerprint instead of the Node.js default; (2) inject realistic mouse movement, scroll timing, and focus events before the page finishes loading; (3) normalize the network-level headers (Accept-Language, Accept-Encoding, connection pooling) to match a real browser session, not a scripted one. Plus the usual in-browser patches that puppeteer-extra also handles.
Can I use stealth-extract and Puppeteer in the same project?
Yes, and that's the recommended pattern. Keep your existing Puppeteer code for tests, internal dashboards, and non-protected scraping. Add `stealth-extract` as a drop-in for the specific URLs that are blocked in production. The typical split: 70% Puppeteer, 30% stealth-extract, drawn along the "does this target have bot protection" line. No full migration required.
Does stealth-extract work against Cloudflare Turnstile?
In the benchmark above, yes — stealth-extract passes all six surfaces including the behavioral one that Turnstile weights heavily. Caveat: Cloudflare updates Turnstile frequently, so any stealth stack is an ongoing moving target. A tool that's maintained for stealth as a first-class concern (rather than as a community plugin) is more likely to keep up.
What about rate limiting and proxy rotation?
Stealth fingerprint alone doesn't solve IP reputation. For targets that rate-limit by IP or keep blocklists of data-center ASNs, pair stealth-extract with residential proxies via the `--proxy` flag. One stealth fingerprint across one bad IP will still get blocked; one stealth fingerprint across a residential IP pool passes most targets. Fingerprint and IP reputation are independent layers.
