I heard that some e-commerce sites will not block scrapers, but instead poison the data shown to them (e.g. subtly wrong prices). Does anyone know more about this?
I've never poisoned data, but I have implemented systems where clients who made requests too quickly were served data from a snapshot that only updated every 15 minutes.
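For anyone curious, here's roughly what that looks like. This is a minimal sketch of the idea, not the system I actually built; the class name, the rate threshold, and the in-memory request log are all made up for illustration:

```python
import time

SNAPSHOT_INTERVAL = 15 * 60  # refresh the stale view every 15 minutes
RATE_LIMIT = 10              # requests per minute before you get the snapshot

class SnapshotServer:
    """Serve fresh data to polite clients, a stale snapshot to fast ones."""

    def __init__(self, fetch_live):
        self.fetch_live = fetch_live      # callable returning current data
        self.snapshot = fetch_live()
        self.snapshot_time = time.time()
        self.request_log = {}             # client_id -> recent request times

    def _refresh_snapshot(self):
        if time.time() - self.snapshot_time >= SNAPSHOT_INTERVAL:
            self.snapshot = self.fetch_live()
            self.snapshot_time = time.time()

    def _too_fast(self, client_id):
        # keep a sliding one-minute window of timestamps per client
        now = time.time()
        window = [t for t in self.request_log.get(client_id, []) if now - t < 60]
        window.append(now)
        self.request_log[client_id] = window
        return len(window) > RATE_LIMIT

    def get(self, client_id):
        self._refresh_snapshot()
        if self._too_fast(client_id):
            return self.snapshot          # stale data, served silently
        return self.fetch_live()          # fresh data
```

The nice property is that the scraper still gets a perfectly valid response, so there's nothing obvious to trip an alarm on their end; their data is just up to 15 minutes old.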
This HN post had me playing around with Key Food's website. A lot of information is wrapped up in a cookie, but it looks like there isn't too much javascript rendering.
But when I hit the URLs with curl, without a cookie, I get a valid-looking page, but it's just a hundred listings for "Baby Bok Choy." Maybe a test page?
After a little more fiddling, the server started responding with an empty body. So it looks like I'll have to use browser automation.
Yeah, by far the most reliable way of deterring bots is to silently poison the data. The more visibly you fight them, the harder they work to evade detection. If you block them, they just come back with a hundred times as many IP addresses and user-agent fingerprints.
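One detail that matters if you try this: the wrong prices have to be consistent per client, or the scraper notices the jitter across repeated requests. A sketch of what that might look like; the function name, the fingerprint input, and the 4% skew are all assumptions of mine, not anything from an actual site:

```python
import hashlib

def poisoned_price(real_price_cents, product_id, client_fingerprint,
                   max_skew=0.04):
    """Return a subtly wrong price for a suspected bot.

    The skew is derived deterministically from (product, client), so the
    same bot always sees the same wrong price -- random jitter between
    requests would be an obvious tell.
    """
    digest = hashlib.sha256(
        f"{product_id}:{client_fingerprint}".encode()
    ).digest()
    # map the first 4 bytes of the hash to a fraction in [0, 1],
    # then to a skew in [-max_skew, +max_skew]
    frac = int.from_bytes(digest[:4], "big") / 0xFFFFFFFF
    skew = (2 * frac - 1) * max_skew
    return max(1, round(real_price_cents * (1 + skew)))
```

Keeping the skew small (a few percent) is the point: the data is useless for price comparison, but not so wrong that the scraper's sanity checks fire.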