This has me ridiculously curious now. Is that common? Other than a random sampling of sites I go to, is there a good way to get numbers on how often this is used?
Edit: In my scanning, I have to confess that Wikipedia's robots.txt file is the best. It's fairly heavily commented on why the rules are there. https://en.wikipedia.org/robots.txt
I analyzed the top 1 million sites' robots.txt files, looking for sites that allow Google and block everyone else, here: https://www.benfrederickson.com/robots-txt-analysis/ - it's a relatively common pattern for major websites.
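If you want to spot-check individual sites yourself, here's a minimal sketch using Python's standard urllib.robotparser (the domain and the non-Google user-agent string are just placeholders, not anything specific to that analysis):

    from urllib.robotparser import RobotFileParser

    # Hypothetical example: test whether a site's robots.txt allows
    # Googlebot but blocks a generic crawler -- the "allow Google,
    # block everyone else" pattern discussed above.
    site = "https://example.com"  # swap in any site you're curious about

    rp = RobotFileParser()
    rp.set_url(site + "/robots.txt")
    rp.read()  # fetches and parses the robots.txt

    for agent in ("Googlebot", "SomeRandomCrawler"):
        allowed = rp.can_fetch(agent, site + "/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")

If the first prints "allowed" and the second "blocked", the site is following that pattern, at least for its root path.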
I ran a Yacy web crawler (P2P web search, https://yacy.net) a while ago. As far as I remember, I only saw Yandex disallowed in the robots.txt a few times when I had trouble crawling a site. Mostly my Yacy crawler just got served an empty page instead of the "real" website.