Show HN: Scrape.it – Change-Resilient Web Scraper (scrape.it)
16 points by notastartup on Nov 5, 2014 | 16 comments


Earlier this year we released a similar open-source tool for visual scraping, called Portia: https://github.com/scrapinghub/portia

It's been getting quite a bit of traction, and we're currently working on integrating it with the Scrapinghub platform (disclaimer: I work there) for those who prefer a hosted version.


For simple web scraping, I find that kimonolabs.com does a perfectly fine job.


I love what they are doing with Kimono and import.io.

There's no free lunch, let's put it that way. It's free and simple, but limiting for anything heavier. It covers only a small portion of websites. You can't crawl all the links on a website, and it's hard to scrape data from dynamic pages. I also found that some websites wouldn't even load, making it impossible to define the fields to scrape.


I tend to stick with KimonoLabs, but I did try http://parsehub.com for a while. It's a lot more complicated, but it allows you to scrape dynamic sites.


Excellent choices. We try to learn from the pros and cons of each of those services and improve on them. Basically, Scrape.it aims to be as simple to use as Kimono while also handling complex websites.

You can scrape websites like Kayak and Airbnb as well, using the Scrape.it tool.

We also dedicate a number of hours every month to creating jobs for you, so you can just tell us which websites you want to scrape.

We then monitor the jobs so that they continue working without interruption (IP throttling, website layout changes).


I can imagine some scenarios where you would use a web scraper, but I'm curious: what are people actually using web scrapers for? Does anyone have one in production?


I use one to scrape session times from local theatre websites. They don't have the capacity to build an API so I have an agreement with them that they keep the formats the same. I've set up a script which alerts them if they've screwed it up.
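
For anyone curious what that kind of format check could look like, here is a minimal sketch using only the Python standard library. It is not the actual script described above: the class names `session-title` and `session-time` and the sample HTML are hypothetical, and a real check would fetch the live page and send an alert instead of printing.

```python
from html.parser import HTMLParser

# Hypothetical class names the theatre pages are assumed to use.
REQUIRED_CLASSES = {"session-title", "session-time"}


class ClassCollector(HTMLParser):
    """Collects every CSS class name seen in the document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.seen.update(value.split())


def missing_classes(html, required=REQUIRED_CLASSES):
    """Return the required class names that no longer appear in the page."""
    collector = ClassCollector()
    collector.feed(html)
    return sorted(set(required) - collector.seen)


# Hypothetical sample pages: one in the agreed format, one that drifted.
good = '<div class="session-title">Hamlet</div><span class="session-time">19:30</span>'
bad = '<div class="title">Hamlet</div><span class="session-time">19:30</span>'

print(missing_classes(good))  # nothing missing, so no alert needed
print(missing_classes(bad))   # lists the class that disappeared
```

Any non-empty result would be the trigger for notifying whoever maintains the page.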


I used to run one that I made for displaying a nice view of all the art posted on conceptart.org (a very large message board which I found cumbersome to navigate).


My product MyShopData.com uses web scraping for retailers to extract their own data and integrate with marketplaces.


Holy crap that's expensive!


This is ripe for resellers to move in and make some dough.


Can you describe what you mean by resellers?


One can open an account with scrape.it, pay for the enterprise option, and then share login credentials with other people for a small fee. For example, an enterprise account costs $899/month. You could "resell" it as described here to 100 people for $9/month each, and make $1/month in profit.
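
The arithmetic behind that example, as a quick sketch. The $899 and $9 figures and the 100 subscribers come from the comment above; the variable names are just for illustration.

```python
import math

enterprise_cost = 899  # USD per month, from the comment above
resale_price = 9       # USD per month charged to each person
subscribers = 100

revenue = resale_price * subscribers
profit = revenue - enterprise_cost

# Smallest number of subscribers at which revenue covers the account cost.
break_even = math.ceil(enterprise_cost / resale_price)

print(f"revenue=${revenue}, profit=${profit}, break-even at {break_even} subscribers")
```

So 100 subscribers is exactly the break-even point, and each one beyond that adds $9/month.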


There's no metering (https://scrape.it/tour2), so you can create as many web scrapes as you want. Scrape.it constantly monitors each job to make sure the data extraction doesn't get interrupted when a website changes. Should the website change, your jobs get updated automatically to continue working.


It's an impressive product, there's no doubt about it. I just happened to be looking for a WYSIWYG-like tool for scraping a page for changes. This seems more like a commercial-grade product than a home-user hobby tool, haha. Nice site, btw!


Thank you so much. We are planning a lite free version for the average individual in early 2015.



