I heard that some e-commerce sites will not block scrapers, but instead poison the data shown to them (e.g. subtly wrong prices). Does anyone know more about this?
I've never poisoned data, but I have implemented systems where clients who made requests too quickly were served data from a snapshot that only updated every 15 minutes.
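For anyone curious, here's roughly what that looks like. This is a minimal sketch of the idea, not the system I actually built; the class name, the rate threshold, and the in-memory request log are all made up for illustration:

```python
import time

SNAPSHOT_INTERVAL = 15 * 60  # refresh the stale view every 15 minutes
RATE_LIMIT = 10              # requests per minute before you get the snapshot

class SnapshotServer:
    """Serve fresh data to polite clients, a stale snapshot to fast ones."""

    def __init__(self, fetch_live):
        self.fetch_live = fetch_live      # callable returning current data
        self.snapshot = fetch_live()
        self.snapshot_time = time.time()
        self.request_log = {}             # client_id -> recent request times

    def _refresh_snapshot(self):
        if time.time() - self.snapshot_time >= SNAPSHOT_INTERVAL:
            self.snapshot = self.fetch_live()
            self.snapshot_time = time.time()

    def _too_fast(self, client_id):
        # keep a sliding one-minute window of timestamps per client
        now = time.time()
        window = [t for t in self.request_log.get(client_id, []) if now - t < 60]
        window.append(now)
        self.request_log[client_id] = window
        return len(window) > RATE_LIMIT

    def get(self, client_id):
        self._refresh_snapshot()
        if self._too_fast(client_id):
            return self.snapshot          # stale data, served silently
        return self.fetch_live()          # fresh data
```

The nice property is that the scraper still gets a perfectly valid response, so there's nothing obvious to trip an alarm on their end; their data is just up to 15 minutes old.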
This HN post had me playing around with Key Food's website. A lot of information is wrapped up in a cookie, but it looks like there isn't too much javascript rendering.
But when I hit the URLs with curl, without a cookie, I get a valid-looking page, but it's just a hundred listings for "Baby Bok Choy." Maybe a test page?
After a little more fiddling, the server started responding with an empty body. So it looks like I'll have to use browser automation.
Yeah, by far the most reliable way of deterring bots is to silently poison the data. The more visibly you fight them, the harder they work to evade detection. If you block them, they just come back with a hundred times as many IP addresses and user-agent fingerprints.
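One detail that matters if you try this: the wrong prices have to be consistent per client, or the scraper notices the jitter across repeated requests. A sketch of what that might look like; the function name, the fingerprint input, and the 4% skew are all assumptions of mine, not anything from an actual site:

```python
import hashlib

def poisoned_price(real_price_cents, product_id, client_fingerprint,
                   max_skew=0.04):
    """Return a subtly wrong price for a suspected bot.

    The skew is derived deterministically from (product, client), so the
    same bot always sees the same wrong price -- random jitter between
    requests would be an obvious tell.
    """
    digest = hashlib.sha256(
        f"{product_id}:{client_fingerprint}".encode()
    ).digest()
    # map the first 4 bytes of the hash to a fraction in [0, 1],
    # then to a skew in [-max_skew, +max_skew]
    frac = int.from_bytes(digest[:4], "big") / 0xFFFFFFFF
    skew = (2 * frac - 1) * max_skew
    return max(1, round(real_price_cents * (1 + skew)))
```

Keeping the skew small (a few percent) is the point: the data is useless for price comparison, but not so wrong that the scraper's sanity checks fire.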