
Is that even possible? I don't know the current size of the IA, but it must be ridiculously huge by now (1 billion pages added a week), so the bandwidth cost would be massive.

Maybe they could offer a mail-us-a-multi-petabyte-HDD service... returned a few weeks later full of data :)



It's totally possible: they already have the infrastructure in place and 14PB of data available for download. Unfortunately, the Wayback Machine data is not currently exposed publicly.


Why do you think that is? They seem really open with most of their stuff, so why haven't they exposed the Wayback Machine through an API?

Then again, wouldn't it be pretty trivial to scrape?

(I say this as I'm working on a hellish scraping project, and the Wayback Machine seems like it would be a walk in the park to scrape by comparison.)



