
JA4 and Post-Quantum TLS Broke the Basic Scraper

Your User-Agent header doesn't matter anymore. A classifier trained on JA4 TLS fingerprints alone flagged bots with 98.6% accuracy, before a single header was read. Here's what shifted in 2026.

The TLS Handshake Is the Bot Detection Floor

98.6%.

That's the classification accuracy a CatBoost model hit using only JA4 features. No headers. No IPs. No behavior. Just the shape of the TLS handshake. The arXiv paper landed in February 2026, and the result isn't an outlier. Cloudflare, AWS, VirusTotal, and Akamai all run JA4 (or its earlier cousin JA3) in production. If you're scraping in 2026 with a plain HTTP client, the verdict was rendered before your request reached the application layer.

This is the part bot detection tutorials skip. Most posts on anti-bot evasion still revolve around User-Agent rotation, cookies, and CAPTCHAs. Those are the easy layers. But the TLS layer is the one you can't bluff with a header.

What JA4 actually sees

JA4 is a fingerprint of the TLS ClientHello. It encodes the protocol (TCP or QUIC), the TLS version, whether SNI is present, how many cipher suites and extensions the client offers, and the first ALPN value, followed by two truncated SHA-256 hashes: one over the sorted cipher suites, one over the sorted extensions plus the signature algorithms in their original order. The output is a compact string like t13d1516h2_8daaf6152771_e5627906d626. Two real instances of the same browser build produce the same JA4. A Python requests script claiming to be Chrome produces the JA4 of Python's OpenSSL stack, a fingerprint no real Chrome ever sends.
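To make the structure concrete, here is a simplified sketch of how a JA4 string is assembled, based on FoxIO's public description of the format. It assumes the ClientHello has already been parsed into a hypothetical `ClientHello` dataclass (real tools parse raw packet bytes), and it skips edge cases such as DTLS, missing cipher lists, and non-alphanumeric ALPN values. The cipher and extension values are illustrative Chrome-style numbers, not a capture.

```python
import hashlib
from dataclasses import dataclass, field

# Highest supported TLS version -> two-character JA4 label
TLS_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10",
                0x0300: "s3", 0x0002: "s2"}
SNI, ALPN = 0x0000, 0x0010  # counted in part a, left out of the part c hash


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


@dataclass
class ClientHello:
    """Hypothetical pre-parsed ClientHello; a real tool parses raw packet bytes."""
    ciphers: list[int]
    extensions: list[int]
    supported_versions: list[int]
    sig_algs: list[int]
    alpn: list[str] = field(default_factory=list)
    has_sni: bool = True
    quic: bool = False


def hash12(text: str) -> str:
    """First 12 hex characters of SHA-256, the truncation JA4 uses."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def ja4(hello: ClientHello) -> str:
    ciphers = [c for c in hello.ciphers if not is_grease(c)]
    exts = [e for e in hello.extensions if not is_grease(e)]
    versions = [v for v in hello.supported_versions if not is_grease(v)]

    proto = "q" if hello.quic else "t"
    version = TLS_VERSIONS.get(max(versions), "00") if versions else "00"
    sni = "d" if hello.has_sni else "i"
    first = hello.alpn[0] if hello.alpn else ""
    # Simplification: assumes the ALPN value starts and ends with an alphanumeric char
    alpn = first[0] + first[-1] if first else "00"
    part_a = f"{proto}{version}{sni}{min(len(ciphers), 99):02d}{min(len(exts), 99):02d}{alpn}"

    # Part b: cipher suites sorted, so wire order no longer matters
    part_b = hash12(",".join(sorted(f"{c:04x}" for c in ciphers)))

    # Part c: extensions sorted (minus SNI and ALPN), then signature
    # algorithms in the order the client sent them
    ext_list = ",".join(sorted(f"{e:04x}" for e in exts if e not in (SNI, ALPN)))
    sigs = ",".join(f"{s:04x}" for s in hello.sig_algs)
    part_c = hash12(f"{ext_list}_{sigs}" if sigs else ext_list)
    return f"{part_a}_{part_b}_{part_c}"


# Illustrative Chrome-style values, with GREASE entries mixed in
chrome_like = ClientHello(
    ciphers=[0x3A3A, 0x1301, 0x1302, 0x1303, 0xC02B, 0xC02F, 0xC02C, 0xC030,
             0xCCA9, 0xCCA8, 0xC013, 0xC014, 0x009C, 0x009D, 0x002F, 0x0035],
    extensions=[0x0A0A, 0x0000, 0x0017, 0xFF01, 0x000A, 0x000B, 0x0023, 0x0010,
                0x0005, 0x000D, 0x0012, 0x0033, 0x002D, 0x002B, 0x001B, 0x0015,
                0x4469, 0x1A1A],
    supported_versions=[0x5A5A, 0x0304, 0x0303],
    sig_algs=[0x0403, 0x0804, 0x0401, 0x0503, 0x0805, 0x0501, 0x0806, 0x0601],
    alpn=["h2", "http/1.1"],
)
print(ja4(chrome_like))  # starts with t13d1516h2_
```

The `t13d1516h2` prefix reads as: TCP, TLS 1.3, SNI present, 15 cipher suites, 16 extensions, ALPN `h2`. Everything after it is a hash, which is why a single missing or extra extension changes the whole fingerprint.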

The JA4 family (developed by FoxIO, whose founder John Althouse co-created JA3) addressed JA3's biggest weakness: its sensitivity to extension order. Chromium began randomizing the order of TLS extensions in early 2023, and JA3's order-sensitive hash stopped being stable overnight. JA4 sorts extensions and counts them, so randomization doesn't help. There's no easy escape hatch.
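A minimal sketch of why sorting defeats the shuffle. It isolates only the extension field: the JA3-style function joins extension IDs in wire order and MD5-hashes them, while the JA4-style function sorts them first. Both are simplified (real JA3 hashes five fields together, and real JA4 drops SNI and ALPN and appends signature algorithms), and the extension list is illustrative.

```python
import hashlib

# Non-GREASE extension IDs in the order one Chrome-style ClientHello sent them (illustrative)
WIRE_ORDER = [0x0000, 0x0017, 0xFF01, 0x000A, 0x000B, 0x0023, 0x0010, 0x0005,
              0x000D, 0x0012, 0x0033, 0x002D, 0x002B, 0x001B, 0x0015, 0x4469]


def ja3_style(exts: list[int]) -> str:
    """JA3 lists extension IDs in decimal, in wire order, then MD5-hashes the string."""
    return hashlib.md5("-".join(str(e) for e in exts).encode()).hexdigest()


def ja4_style(exts: list[int]) -> str:
    """JA4 writes extension IDs as 4-digit hex and sorts them before hashing."""
    joined = ",".join(sorted(f"{e:04x}" for e in exts))
    return hashlib.sha256(joined.encode()).hexdigest()[:12]


# Stand-in for Chromium's per-connection shuffle: same extensions, new order
permuted = WIRE_ORDER[::-1]

print(ja3_style(WIRE_ORDER) == ja3_style(permuted))  # False: new order, new hash
print(ja4_style(WIRE_ORDER) == ja4_style(permuted))  # True: sorting erases the shuffle
```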

Akamai disclosed 92-98% bot classification accuracy through cross-layer analysis. The cross-layer part matters. TLS alone is the dominant signal, but combining it with HTTP/2 frame ordering, header order, and request timing leaves a scraper very little chance of slipping through unclassified.

The post-quantum twist

This is the part nobody saw coming. On January 31, 2026, Akamai made post-quantum key exchange the default for all connections. By early 2026, 57.4% of real browser-initiated connections include the X25519MLKEM768 key share. Chrome's PQ-capable share sits around 93%. Firefox 132 is at 85%. Safari support is still rolling out.

The PQ key share is large: 1,220 bytes for X25519MLKEM768 (a 1,184-byte ML-KEM-768 public key, a 32-byte X25519 key, and a 4-byte entry header) versus 36 bytes for classical X25519. The ClientHello grew from 300-500 bytes to over 1,400. That growth shows up in JA4, in packet capture, and in passive observation at the WAF.
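The key share sizes fall straight out of the ML-KEM-768 parameters in FIPS 203. This sketch assumes the hybrid layout from the IETF X25519MLKEM768 draft, where the ML-KEM value and the X25519 value are simply concatenated, and counts the 4-byte KeyShareEntry header to match the 36-byte classical figure. The server's reply carries an ML-KEM ciphertext instead of a public key, so its entry comes to 1,124 bytes.

```python
# ML-KEM-768 parameters (FIPS 203)
N, K, DU, DV = 256, 3, 10, 4
X25519_BYTES = 32
ENTRY_HEADER = 4  # KeyShareEntry: 2-byte group ID + 2-byte length

# Encapsulation key: k polynomials of 256 12-bit coefficients, plus a 32-byte seed
ek_bytes = K * N * 12 // 8 + 32        # 1184
# Ciphertext: compressed vector (du bits per coeff) plus compressed scalar (dv bits)
ct_bytes = N * (DU * K + DV) // 8      # 1088

# X25519MLKEM768 concatenates the ML-KEM value and the X25519 value
client_entry = ek_bytes + X25519_BYTES + ENTRY_HEADER   # sent in ClientHello
server_entry = ct_bytes + X25519_BYTES + ENTRY_HEADER   # sent in ServerHello
classical_entry = X25519_BYTES + ENTRY_HEADER

print(classical_entry, client_entry, server_entry)  # 36 1220 1124
```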

If your scraping client doesn't include the PQ key share, you're making a claim no current Chrome or Firefox would make. Two CVEs from the first quarter of 2026 flag closely related ClientHello mismatches: CVE-2026-26995 (padding extension) carries 25-50% detection probability per request, and CVE-2026-27017 (ECH and GREASE mismatch) lands around 50%. Because those odds apply on every request, exposure across a session climbs toward near-certainty.
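The compounding is easy to estimate. This rough model assumes every check on every request is an independent trial, which real detectors don't guarantee (they correlate signals across a session), so read it as an illustration of the trend rather than a measurement. The probabilities are the figures quoted above.

```python
def session_detection(per_request: list[float], requests: int) -> float:
    """Chance that at least one check fires somewhere in a session.

    Assumes each check on each request is an independent trial; real
    detectors correlate signals, so treat this as a rough model.
    """
    miss_per_request = 1.0
    for p in per_request:
        miss_per_request *= 1.0 - p  # probability this check stays quiet
    return 1.0 - miss_per_request ** requests


# Low-end padding check (25%) plus the ECH/GREASE check (~50%)
for n in (1, 5, 10):
    print(n, round(session_detection([0.25, 0.5], n), 4))
# 1 0.625
# 5 0.9926
# 10 0.9999
```

Even at the low end of the padding estimate, five requests are enough to push the odds of detection above 99%.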

What looked like a problem for the next twelve months is now a problem for the next three. Most open-source scraping stacks haven't shipped PQ-compatible TLS yet. The ones that have are weeks behind real Chromium.

Why proxies don't fix this

There's a comforting story going around that bigger proxy pools solve modern bot detection. They don't. The January 2026 scalping incident covered by Security Boulevard used 16 million requests across 3.9 million unique IPs. That's roughly four requests per IP, far too few to trip any per-IP rate limit, so per-IP blocking was useless. The defense that worked was, mostly, TLS and behavioral fingerprinting.

The economics of residential proxies also broke this quarter. Help Net Security reported in April 2026 that the disruption of the IPIDEA network in January reduced industry residential capacity by roughly 40% overnight. The Bright Data and Oxylabs patent fight (the Supreme Court rejected Bright Data's petition on February 23, 2026, with trial set for May 18) is a sideshow next to that capacity hit. Buyers chasing residential IPs as a defense against fingerprinting are paying more for an answer the WAF doesn't care about.

Proxies still matter, just not for the reason most people think. Geographic distribution and ISP type shape routing decisions and rate-limit profiles. They don't help you survive the handshake.

What this means for data teams

Three things change if you're building or buying scraping infrastructure in 2026.

First, a browser-grade TLS stack is now a hard requirement. Any client that doesn't impersonate a real browser's TLS handshake (PQ key share, extension ordering, ALPN, signature algorithms) produces a fingerprint that gets classified as a bot with high confidence. Wrapping Python requests in better headers solves nothing. The transport is the tell.
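You can see part of the tell from Python itself. This sketch lists the cipher suites a default stdlib `ssl` context is configured to offer and diffs them against an illustrative Chrome-style set. It is only an approximation of the wire ClientHello: it covers cipher suites alone (no extensions, key shares, or ALPN), the stdlib `ssl` module offers no control over extension order, and the output depends on your Python and OpenSSL versions. `CHROME_LIKE` and `compare` are hypothetical names for this example.

```python
import ssl

# Illustrative Chrome-style cipher suite set (IANA code points)
CHROME_LIKE = {0x002F, 0x0035, 0x009C, 0x009D, 0x1301, 0x1302, 0x1303, 0xC013,
               0xC014, 0xC02B, 0xC02C, 0xC02F, 0xC030, 0xCCA8, 0xCCA9}


def python_default_ciphers() -> list[int]:
    """Cipher suites a default stdlib SSLContext is configured to offer.

    OpenSSL reports IDs such as 0x0300C02C; the low 16 bits are the IANA code point.
    """
    ctx = ssl.create_default_context()
    return [c["id"] & 0xFFFF for c in ctx.get_ciphers()]


def compare(offered: list[int], reference: set[int]) -> dict[str, list[str]]:
    """Cipher suites missing from, or extra to, the reference set."""
    offered_set = set(offered)
    return {
        "missing": sorted(f"{c:04x}" for c in reference - offered_set),
        "extra": sorted(f"{c:04x}" for c in offered_set - reference),
    }


report = compare(python_default_ciphers(), CHROME_LIKE)
print("in the Chrome-style set but not offered by Python:", report["missing"])
print("offered by Python but not in the Chrome-style set:", report["extra"])
```

Any non-empty list on either side already means a different JA4 cipher hash, before extensions are even considered.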

Second, headless browsers got easier to detect, not harder. Browserless's State of Web Scraping 2026 reports the gap between headless and headed Chromium is widening. Anti-bot vendors have catalogued the fingerprint differences and share threat intel across customer sites in near-real-time. A headless instance that worked in December may classify as bot in May. The behavioral signals stack on top of TLS, and both are moving targets.

Third, the build-vs-buy math shifted. Maintaining a TLS fingerprint that matches a moving target (Chromium ships PQ updates every few weeks, extension order changes between minor versions, cipher suite preferences shift) is now a full-time job. Teams that spent 20% of one engineer's time on scraper maintenance in 2024 are spending more than half a headcount in 2026. We've written before about why scrapers keep breaking. In 2026, the answer is more often "TLS" than "DOM".
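Part of that maintenance job is noticing drift before the WAF does. One approach, sketched here under stated assumptions, is to compare the raw JA4 string (JA4_r) your client produces against one captured from the browser you're imitating, and report which component moved. The JA4_r layout (header, ciphers, extensions, signature algorithms, separated by underscores) is assumed from FoxIO's documentation; the reference string and the `4469` to `44cd` change are hypothetical examples, not real captures.

```python
REFERENCE = (  # hypothetical JA4_r captured from the browser you want to match
    "t13d1516h2_002f,0035,009c,009d,1301,1302,1303,c013,c014,c02b,c02c,c02f,c030,cca8,cca9"
    "_0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01"
    "_0403,0804,0401,0503,0805,0501,0806,0601"
)
OBSERVED = REFERENCE.replace("4469", "44cd")  # hypothetical drift: one extension ID changed


def split_list(text: str) -> list[str]:
    return [item for item in text.split(",") if item]


def parse_ja4_r(raw: str) -> dict:
    """Assumed layout: header_ciphers_extensions_sigalgs (sig algs may be absent)."""
    header, ciphers, exts, sigs = (raw.split("_") + ["", "", ""])[:4]
    return {"header": header, "ciphers": split_list(ciphers),
            "extensions": split_list(exts), "sig_algs": split_list(sigs)}


def diff_ja4_r(reference: str, observed: str) -> list[str]:
    """Describe which JA4 components drifted between two raw fingerprints."""
    ref, obs = parse_ja4_r(reference), parse_ja4_r(observed)
    changes = []
    if ref["header"] != obs["header"]:
        changes.append(f"header: {ref['header']} -> {obs['header']}")
    for key in ("ciphers", "extensions"):  # sorted lists: compare as sets
        added = sorted(set(obs[key]) - set(ref[key]))
        removed = sorted(set(ref[key]) - set(obs[key]))
        if added:
            changes.append(f"{key} added: {','.join(added)}")
        if removed:
            changes.append(f"{key} removed: {','.join(removed)}")
    if ref["sig_algs"] != obs["sig_algs"]:  # order matters here, so compare sequences
        changes.append("sig_algs changed: " + ",".join(obs["sig_algs"]))
    return changes


for line in diff_ja4_r(REFERENCE, OBSERVED):
    print(line)
# extensions added: 44cd
# extensions removed: 4469
```

Run as a scheduled check, an empty list means you still match; anything else names the component to fix.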

The cheapest scraper is the one that doesn't get classified

The interesting prediction isn't whether anti-bot vendors keep raising the bar. They will. The interesting prediction is which scraping tools survive a market where 98% accuracy is the table-stakes detection floor.

Most won't. But the ones that do will treat the TLS handshake as part of the request, not a transport detail. And buyers will start asking vendors a question that wasn't on the evaluation checklist twelve months ago: what TLS fingerprint do you ship, and how fast do you update it?

The handshake settles it before the request gets a chance to make its case.