Jan Schaumann

@jschauma@mstdn.social

Yes, bots being >50% of all traffic is bonkers, but at the same time I'm becoming increasingly convinced that bot defenses are (largely? equally? also?) harmful to the overall ecosystem.

Everything bot detection relies on is basically ossification: TLS fingerprinting, protocol offering and preference, HTTP header presence and ordering, ...

Assuming / enforcing those is overall bad for the web, and all browsers and clients should really grease all of that.
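To make the "grease all of that" point concrete, here is a minimal, purely illustrative sketch (not anything described in the thread) of what greasing could look like on the client side: send ordinary headers in a randomized order and include a harmless filler header, so that header presence and ordering can't ossify into a stable fingerprint. It uses only the Python standard library; the host, header values, and User-Agent string are made up.

```python
# Illustrative sketch only: randomize header order and add a GREASE-style
# filler header so servers can't rely on header ordering as a fingerprint.
import random
import http.client

def greased_get(host, path="/"):
    headers = [
        ("User-Agent", "example-client/1.0"),
        ("Accept", "*/*"),
        ("Accept-Language", "en"),
        # GREASE-style filler: an unknown header that compliant servers must ignore.
        (f"X-Grease-{random.randrange(16**4):04x}", "ignore-me"),
    ]
    random.shuffle(headers)  # randomize ordering so it can't be assumed
    conn = http.client.HTTPSConnection(host, timeout=10)
    conn.putrequest("GET", path, skip_accept_encoding=True)
    for name, value in headers:
        conn.putheader(name, value)
    conn.endheaders()
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status

if __name__ == "__main__":
    print(greased_get("example.com"))
```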

August 19, 2025 at 2:53:17 PM

And the ones with the biggest muscles circumvent all of those filters easily anyway, while they block the low-volume little guys that nobody actually wanted to block.

Yeah. I mean, there's _some_ value in trivial defenses to somewhat tone down the noise (akin to, e.g., running sshd on a port other than 22 just to avoid the very noisy trivial scanners), but it's never going to be effective against dedicated actors.

A real question: what's the alternative if your service is down from resource exhaustion caused by bot scrape spam?

It's not >50%, it's >98% and still getting worse.

What @sandro said: either remove every open interface that causes even a little load, or be down for as long as the scrapers keep scraping. E.g., no normal site visitor would ever try to diff every version of a file in some web-accessible repo against every other version; they'd diff once or twice, not in short succession, and be done.
I've given up in a few cases and made things non-public so that at least the plain HTML can stay available.

@spz @sandro I sympathize, and I'll certainly let any admin choose what works for them. (I do think behavioral defenses and reputation scores have a better chance there than, say, static client fingerprinting; a rough sketch of that idea follows below.) I'm just pondering that it's a losing battle that negatively impacts the open web.

(There are analogies to how spam protections and the rules set by the big handful of providers nowadays make running a mail server so much harder.)
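As an illustration of the "behavioral defenses" point above (my own sketch, not anything described in the thread): a tiny per-client request-rate tracker that flags bursty clients based on how they behave over a window, rather than on a static fingerprint. The client key, window, and threshold are all hypothetical.

```python
# Illustrative behavioral signal: flag clients whose short-term request
# rate looks like bulk scraping. Thresholds are hypothetical.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 120  # hypothetical: ~2 req/s sustained

class BehaviorTracker:
    def __init__(self):
        self.history = defaultdict(deque)  # client key (e.g. IP) -> recent timestamps

    def record(self, client_key, now=None):
        """Record one request; return True if the client looks like a scraper."""
        now = time.monotonic() if now is None else now
        q = self.history[client_key]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()
        return len(q) > MAX_REQUESTS_PER_WINDOW
```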

What would be practical would be to not filter visitors much (except for IP address range bans, which are comparably dirt cheap), but to filter on load: when load gets too high, visitors only get a 418 (or, if less peeved, a 503) telling them that AI scrapers suck and to come back when the stampede has moved on to other victims.
Do you (or anyone else) know if there's an Apache module to do that?
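Not an Apache module, but here's a minimal sketch of the load-shedding idea as WSGI middleware (my own illustration, not anyone's actual setup): when the one-minute load average crosses a hypothetical threshold, answer 503 with a Retry-After instead of doing any real work.

```python
# Illustrative load shedding for a WSGI app (Unix only, since it uses
# os.getloadavg()). Threshold is hypothetical; tune to the machine.
import os

LOAD_THRESHOLD = 8.0

class ShedWhenOverloaded:
    def __init__(self, app, threshold=LOAD_THRESHOLD):
        self.app = app
        self.threshold = threshold

    def __call__(self, environ, start_response):
        one_minute_load = os.getloadavg()[0]
        if one_minute_load > self.threshold:
            body = b"Overloaded, likely by scrapers. Come back later.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Retry-After", "600"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)
```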
