Scraping is considered by many to be a devious or malicious art, reserved for those who want to steal data. That isn't true: the real problem is that most websites offer great data but lack a nice way of getting at just the data. Some websites offer an API, but most don't, so what can you do?
Well, you could manually visit each website and extract the data yourself... OR you could write a program to do it for you, maybe in something like Python, PHP, or even JavaScript, and that would be all fine and dandy... OR you could have the cloud scrape for you. Enter ScraperWiki.
It's one of the best scraping services on the web, letting you use Python, Ruby, and even PHP to write scrapers that you can manage and run whenever you need them, or schedule to run daily so you always have the latest data.
However you spin it, if you want to scrape some data and want a great online service to manage it, look no further than ScraperWiki.
Want a small example?
The only thing you need to worry about is choosing which library you're going to use; a small sketch using one of them follows the list below.
For Python you can choose from:
- lxml
- html5lib
- Beautiful Soup
- Beautiful Soup v4
- ...so many more
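To give a feel for the Python side, here is a minimal sketch using Beautiful Soup v4. The HTML snippet and column layout are made up for illustration; in a real scraper you would feed it the page you actually fetched:

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for a fetched page -- in a real scraper this
# would be the HTML of the site you are scraping.
html = """
<table>
  <tr><td>Widget</td><td>4.99</td></tr>
  <tr><td>Gadget</td><td>7.50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Walk every row of the table and pull out the text of each cell.
for row in soup.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    print(cells)
```

Swap Beautiful Soup for lxml or html5lib and the shape of the scraper stays the same: fetch, parse, pick out the bits you care about.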
and Ruby offers:
- Nokogiri
- Hpricot
- LibXML
- mechanize
- ...a few more
and PHP offers only a few.
All of the languages also implement a ScraperWiki library that exposes a number of functions to help you.
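As a rough sketch of how those helpers fit together, assuming the classic Python library's scrape() and sqlite.save() functions, with a hypothetical URL and column names:

```python
import scraperwiki
from bs4 import BeautifulSoup

# Hypothetical target page -- replace with whatever you actually want to scrape.
URL = "http://example.com/prices.html"

# scraperwiki.scrape() fetches the page and hands back the raw HTML.
html = scraperwiki.scrape(URL)
soup = BeautifulSoup(html, "html.parser")

# Save one record per table row into the ScraperWiki datastore; 'name' is the
# unique key, so re-running the scraper updates rows instead of duplicating them.
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) >= 2:
        scraperwiki.sqlite.save(
            unique_keys=["name"],
            data={
                "name": cells[0].get_text(strip=True),
                "price": cells[1].get_text(strip=True),
            },
        )
```

Schedule that to run daily on ScraperWiki and the datastore keeps itself up to date without you lifting a finger.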
I'll admit that when I started writing this post I only knew of a few of these features, and there are a lot more than I can explain here, so just head on over and start grabbing data today.