
A lot of sites do some variation of this check when you set Googlebot as your UA; the larger, more sophisticated sites certainly do.

https://support.google.com/webmasters/answer/80553?hl=en

So unless you have a Google domain you're SOL, and it's also just generally frowned upon. We have our own UA, WhizeBot, with an email contact so you can let us know if our crawler is doing anything you'd rather it didn't.
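For anyone curious what that check looks like on the site's side, here is a rough sketch of the reverse-then-forward DNS lookup the linked page describes. The function name, the injectable resolver parameters, and the exact suffix list are my own assumptions for illustration; check the Google page for the current list of domains.

```python
import socket

# Assumption: hostname suffixes treated as Google's; verify against the linked page.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`.

    `reverse` and `forward` are injectable so the logic can be tested offline.
    Note: socket.gethostbyname_ex only handles IPv4.
    """
    try:
        host = reverse(ip)[0]          # step 1: reverse DNS (PTR) lookup
    except OSError:
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False                   # step 2: hostname must be in a Google domain
    try:
        addresses = forward(host)[2]   # step 3: forward lookup on that hostname
    except OSError:
        return False
    return ip in addresses             # step 4: must map back to the original IP
```

A spoofed User-Agent fails at step 2 or step 4, because the requester cannot control the PTR record for an address it does not own or make Google's hostnames resolve to its own IP.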

There have been a few legal cases that protect scraping publicly available information on the web, but we'd rather follow robots.txt in any case to avoid the potential for shenanigans.
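As an illustration only (not our actual crawler code), honoring robots.txt under your own UA token can be sketched with Python's standard library. The `allowed` helper and the sample rules are hypothetical; only the WhizeBot token comes from the comment above.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "WhizeBot"  # UA token from the comment above


def allowed(robots_txt, url, agent=USER_AGENT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(agent, url)


# Hypothetical rules a site might publish to keep this crawler out of /private/.
rules = """\
User-agent: WhizeBot
Disallow: /private/

User-agent: *
Disallow:
"""

print(allowed(rules, "https://example.com/public/page"))
print(allowed(rules, "https://example.com/private/page"))
```

In a real crawler you would fetch each site's robots.txt once (for example with `RobotFileParser.set_url` and `read`), cache it, and call the check before every request.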



wrt legal cases, are you referring to HiQ?


I am



