Yes, bots being >50% of all traffic is bonkers, but at the same time I'm increasingly convinced that bot defenses are (largely? equally? also?) harmful to the overall ecosystem.
Everything bot detection relies on is basically ossification: TLS fingerprinting, protocol offerings and preferences, HTTP header presence and ordering, ...
Assuming / enforcing those is overall bad for the web, and all browsers and clients should really grease all of that.
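To make concrete how brittle that kind of signal is, here's a toy sketch; it's not any real detector's algorithm, and the headers, hashing, and "greasing" here are made up purely for illustration:

```python
# Toy illustration only: a naive "fingerprint" over HTTP header order,
# the kind of ossified signal bot detection leans on, and how a client
# that greases its header order breaks it on every request.
import hashlib
import random

def header_order_fingerprint(headers: list[tuple[str, str]]) -> str:
    """Hash only the sequence of header names, ignoring their values."""
    names = ",".join(name.lower() for name, _ in headers)
    return hashlib.sha256(names.encode()).hexdigest()[:16]

browser_like = [
    ("Host", "example.org"),
    ("User-Agent", "Mozilla/5.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en"),
]
print(header_order_fingerprint(browser_like))  # a stable "browser-shaped" value

# A greasing client shuffles headers whose order carries no semantics,
# so the fingerprint stops identifying anything.
greased = browser_like[:1] + random.sample(browser_like[1:], len(browser_like) - 1)
print(header_order_fingerprint(greased))
```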
It's not >50%, it's >98% and still getting worse.
What @sandro said: either remove every open interface that causes even a bit of load, or be down for as long as the scrapers keep scraping. E.g. no normal site visitor would ever try to diff every version of a file in some web-accessible repo against every other version; they'd diff once or twice, not in short succession, and would be done.
I've given up in a few cases and made things non-public so at least plain html can stay available.
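The behavioral gap really is that wide. In made-up numbers it's trivial to express; a minimal sketch, with the endpoint, window, and threshold invented for illustration:

```python
# Sketch of the behavioral distinction: a human hits a handful of
# expensive diff pages, a scraper enumerating every-version-against-
# every-version hits hundreds within minutes. All thresholds are
# illustrative, not tuned values.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300      # look at the last five minutes
MAX_EXPENSIVE_HITS = 10   # more than this per window looks like a crawl

_recent: dict[str, deque] = defaultdict(deque)

def looks_like_scraper(client_ip: str, now: float | None = None) -> bool:
    """Record one expensive request (e.g. a repo diff) and judge the client."""
    now = time.time() if now is None else now
    hits = _recent[client_ip]
    hits.append(now)
    # Drop hits that have fallen out of the sliding window.
    while hits and hits[0] < now - WINDOW_SECONDS:
        hits.popleft()
    return len(hits) > MAX_EXPENSIVE_HITS
```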
@spz @sandro I sympathize, and it's certainly up to each admin to choose what works for them. (I do think behavioral defenses and reputation scores have a better chance there than, say, static client fingerprinting.) I'm just pondering whether it's a losing battle that negatively impacts the open web.
(There are analogies to how spam protections and the rules set by the handful of big providers nowadays make running a mail server so much harder.)
What would be practical is not filtering visitors much (beyond IP address range bans, which are comparatively dirt cheap), but filtering on load: when load gets too high, visitors only get a 418 (or, if you're less peeved, a 503) telling them that AI scrapers suck and to come back when the stampede has moved on to other victims.
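The check itself is tiny; a minimal sketch of the idea as WSGI middleware rather than an Apache module, with the load threshold, Retry-After value, and wording all made up:

```python
# Shed visitors with a 503 whenever the machine's load average is too
# high; everything below the threshold passes through untouched.
import os

LOAD_THRESHOLD = 8.0  # 1-minute load average above which we shed visitors

class ShedWhenOverloaded:
    def __init__(self, app, threshold: float = LOAD_THRESHOLD):
        self.app = app
        self.threshold = threshold

    def __call__(self, environ, start_response):
        one_minute_load, _, _ = os.getloadavg()
        if one_minute_load > self.threshold:
            body = (b"503: AI scrapers are stampeding this site. "
                    b"Please come back once they have moved on.\n")
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Retry-After", "3600"),  # hint for well-behaved clients to back off
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)
```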
Do you (or anyone else) know if there's an Apache module to do that?