Contrary to what most commenters assume, the high bandwidth usage is not coming from scraping text, but images. They are pretty clear about it:
> Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%. This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models.
There are two distinct problems caused by AI scrapers:
1. Bandwidth consumption - that's scrapers downloading multimedia files.
2. CPU resource exhaustion - AI scrapers don't take contextual clues into account. They blindly follow every link they can find, which means they hit a lot of pages that aren't cached but are regenerated on every request. That's things like article history pages, and especially the version-delta (diff) pages. Those are very expensive to generate and are requested so rarely that caching them doesn't make sense.
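To make the second point concrete, here's a minimal cache-aside sketch where only page types with a decent hit rate are worth caching and everything else gets regenerated per request. All names here (`render_page`, `CACHEABLE`) are illustrative, not MediaWiki internals:

```python
# Hypothetical sketch: cache-aside with a whitelist of cacheable page types.
CACHE = {}
CACHEABLE = {"article"}  # high hit rate: worth caching
                         # "diff" and "history" excluded: rarely re-requested

def render_page(page_type, key):
    """Serve from cache when the page type is cacheable, else regenerate."""
    if page_type in CACHEABLE and key in CACHE:
        return CACHE[key], "hit"
    result = f"rendered:{page_type}:{key}"  # stand-in for the expensive work
    if page_type in CACHEABLE:
        CACHE[key] = result
    return result, "miss"
```

A scraper that blindly walks every diff link keeps hitting the bottom path, so every request pays the full rendering cost.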
You want images to be available only to users with a Wikipedia login? That would mean the vast majority of people would no longer see images in Wikipedia articles.
No, I am saying what a lot of other people are: force bots into API access, which can then be authenticated and limited by bandwidth or calls per day, and block bot access to the HTML pages. Nobody loses their images, and bots can no longer eat up the bandwidth.
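The "limited by calls per day" part is the standard per-key rate-limiting idea. A minimal sketch, assuming a token-bucket limiter keyed by API key (names like `check` and the rate/capacity numbers are made up for illustration):

```python
import time

class TokenBucket:
    """Hypothetical per-key limiter: each API key earns `rate` requests
    per second and can burst up to `capacity` requests."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens proportionally to the time elapsed since last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # api_key -> TokenBucket

def check(api_key):
    """Admit or reject one request for the given API key."""
    bucket = buckets.setdefault(api_key, TokenBucket(rate=1.0, capacity=5))
    return bucket.allow()
```

Unauthenticated HTML requests from bots get blocked outright; authenticated API requests pass through `check` and get throttled once they exhaust their budget.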
Have you actually tried blocking these scraper bots? The whole problem is that if you do, they start impersonating normal browsers from residential IPs instead. They actively evade countermeasures.
Isn't everything measures and countermeasures, though?
As far as I am aware there is no such thing as a silver bullet anywhere when it comes to security.
It's like moving your SSH port from port 22 to some other random one. Will it stop advanced scripts from scanning your server and finding it? No, but it sure as hell will cut down the noise from unsophisticated connection attempts, which means you can focus on the tougher ones.
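For reference, that analogy boils down to a one-line change in the OpenSSH server config (the port number here is just an arbitrary example):

```shell
# /etc/ssh/sshd_config - listen on a non-default port instead of 22
Port 50022
```

It filters nothing determined, but it does filter the dumb mass scanners, which is exactly the point being made about scrapers.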