
Wikipedia asks people not to crawl it. There are database dumps that you can instead import into your local MySQL and work from there.

https://en.wikipedia.org/wiki/Wikipedia:Database_download#Pl...



Wikipedia has no objection to crawling a couple thousand pages if you do so at a reasonable speed and set a user-agent with a contact email.

If you want to crawl millions of pages, please use a dump.
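
For the small-crawl case, a polite fetcher might look roughly like the sketch below. It is only an illustration under stated assumptions: the user-agent string, the contact address, and the one-second delay are placeholders I picked, not values Wikipedia publishes, and `build_request`, `Throttle`, and `crawl` are hypothetical helper names.

```python
import time
import urllib.request

# Hypothetical values: substitute your own project name and contact address.
USER_AGENT = "example-research-bot/0.1 (contact: you@example.org)"
# Assumed stand-in for "a reasonable speed"; not an official figure.
MIN_DELAY_SECONDS = 1.0


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that identifies the crawler and how to reach its operator."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


class Throttle:
    """Ensures at least `delay` seconds pass between consecutive wait() calls."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def crawl(urls, delay=MIN_DELAY_SECONDS):
    """Fetch each URL in turn, pausing between requests; yields (url, body bytes)."""
    throttle = Throttle(delay)
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(build_request(url), timeout=30) as response:
            yield url, response.read()
```

Something like `for url, body in crawl(my_urls): ...` would then fetch pages one at a time, never faster than the configured delay. This only makes sense for the "couple thousand pages" case; for anything larger, the dump is still the right tool.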



