This has me ridiculously curious now. Is that common? Other than a random sampling of sites I go to, is there a good way to get numbers on how often this is used?
Edit: In my scanning, I have to confess that Wikipedia's robots.txt file is the best. It's fairly heavily commented on why the rules are there. https://en.wikipedia.org/robots.txt
I analyzed the top 1 million sites' robots.txt files, looking for sites that allow Google and block everyone else, here: https://www.benfrederickson.com/robots-txt-analysis/ - it's a relatively common pattern for major websites.
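If you want to spot-check individual sites yourself, here's a minimal sketch using Python's standard urllib.robotparser (the domain and the non-Google user-agent string are just placeholders, not anything specific to that analysis):

    from urllib.robotparser import RobotFileParser

    # Hypothetical example: test whether a site's robots.txt allows
    # Googlebot but blocks a generic crawler -- the "allow Google,
    # block everyone else" pattern discussed above.
    site = "https://example.com"  # swap in any site you're curious about

    rp = RobotFileParser()
    rp.set_url(site + "/robots.txt")
    rp.read()  # fetches and parses the robots.txt

    for agent in ("Googlebot", "SomeRandomCrawler"):
        allowed = rp.can_fetch(agent, site + "/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")

If the first prints "allowed" and the second "blocked", the site is following that pattern, at least for its root path.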
I ran a Yacy web crawler (P2P web search, https://yacy.net) a while ago. As far as I remember, I only saw Yandex disallowed in the robots.txt a few times when I had trouble crawling a site. Mostly my Yacy crawler just got served an empty page instead of the "real" website.