Scraping is considered by many to be a devious or malicious art, reserved for those who want to steal data. That isn't true: the real problem is that most websites offer great data but lack a nice way of getting at just the data. Some websites offer an API, but most don't, so what can you do?
Well, you could manually visit each website and extract the data yourself... OR you could write a program to do it for you, maybe in something like Python, PHP, or even JavaScript, and that would be all fine and dandy... OR you could have the cloud scrape for you. Enter ScraperWiki.
It's one of the best scraping services on the web, letting you use Python, Ruby, and even PHP to write scrapers that you can manage and run whenever you need them, or schedule to run daily so you always have the latest data.
However you spin it, if you want to scrape some data and want a great online service to manage it, look no further than ScraperWiki.
Want a small example?
The only thing you need to worry about is choosing which library you're going to use; a small sketch using one of them follows the list below.
For Python you can choose from:
- lxml
- html5lib
- Beautiful Soup
- Beautiful Soup v4
- ...so many more
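To give a feel for the Python side, here is a minimal sketch using Beautiful Soup v4. The HTML snippet and column layout are made up for illustration; in a real scraper you would feed it the page you actually fetched:

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for a fetched page -- in a real scraper this
# would be the HTML of the site you are scraping.
html = """
<table>
  <tr><td>Widget</td><td>4.99</td></tr>
  <tr><td>Gadget</td><td>7.50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Walk every row of the table and pull out the text of each cell.
for row in soup.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    print(cells)
```

Swap Beautiful Soup for lxml or html5lib and the shape of the scraper stays the same: fetch, parse, pick out the bits you care about.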
and Ruby offers:
- Nokogiri
- Hpricot
- LibXML
- mechanize
- ...a few more
and PHP offers only a few.
All of the languages also implement a ScraperWiki library that exposes a number of functions to help you.
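As a rough sketch of how those helpers fit together, assuming the classic Python library's scrape() and sqlite.save() functions, with a hypothetical URL and column names:

```python
import scraperwiki
from bs4 import BeautifulSoup

# Hypothetical target page -- replace with whatever you actually want to scrape.
URL = "http://example.com/prices.html"

# scraperwiki.scrape() fetches the page and hands back the raw HTML.
html = scraperwiki.scrape(URL)
soup = BeautifulSoup(html, "html.parser")

# Save one record per table row into the ScraperWiki datastore; 'name' is the
# unique key, so re-running the scraper updates rows instead of duplicating them.
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) >= 2:
        scraperwiki.sqlite.save(
            unique_keys=["name"],
            data={
                "name": cells[0].get_text(strip=True),
                "price": cells[1].get_text(strip=True),
            },
        )
```

Schedule that to run daily on ScraperWiki and the datastore keeps itself up to date without you lifting a finger.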
I'll admit that when I started writing this post I only knew of a few of these features, and there are a lot more than I can explain here, so just head on over and start grabbing data today.