Get data from a webpage that looks like JavaScript into Java?
I'm trying to scrape info from http://www.futbol24.com/live/?__igp=1&livedate=20141104: the time, home team, and away team for each match on the page.
I have tried using jsoup, but I realise the page seems to load its data with JavaScript after the page loads. Is there a way to still get the data?
Cheers, Rob
You can't do this with jsoup alone, because jsoup does not execute JavaScript.
You can look into Selenium and/or:
PhantomJS:
http://phantomjs.org/
and pjscrape:
http://nrabinowitz.github.io/pjscrape/
For illustration, PhantomJS can save the rendered page with a script like this:

var page = require('webpage').create();
var fs = require('fs'); // file system module
var output = './temp_htmls/test1.html'; // path for saving the local file

page.open('http://www.futbol24.com/live/?__igp=1&livedate=20141104', function (status) {
    if (status !== 'success') { // stop if the page could not be loaded
        console.log('Failed to load page');
        phantom.exit(1);
        return;
    }
    fs.write(output, page.content, 'w'); // write the rendered page to the local file
    phantom.exit(); // exit PhantomJS
});
Here I have opened the page using PhantomJS and saved it locally. After that you can use jsoup or Beautiful Soup to scrape the saved file.
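Since the question is about Java, here is a minimal sketch of how the PhantomJS step might be driven from Java and the saved file read back. It assumes a `phantomjs` binary is on the PATH and that the script above is saved as `save_page.js`; both names, and the `PhantomRunner` class itself, are hypothetical. It only uses the standard library, so the jsoup parsing step is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class PhantomRunner {

    // Builds the command line "phantomjs <script>".
    // The binary name and script path are assumptions, not part of the original answer.
    static List<String> buildCommand(String phantomBinary, String scriptPath) {
        return Arrays.asList(phantomBinary, scriptPath);
    }

    // Runs the PhantomJS script, waits for it to finish, and returns the HTML it saved.
    static String renderAndRead(String phantomBinary, String scriptPath, Path savedHtml)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(phantomBinary, scriptPath))
                .inheritIO() // show PhantomJS console output in our own console
                .start();
        // Give up if PhantomJS has not exited after 60 seconds.
        if (!process.waitFor(60, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("PhantomJS timed out");
        }
        if (process.exitValue() != 0) {
            throw new IOException("PhantomJS exited with code " + process.exitValue());
        }
        // Read the file written by the script; UTF-8 is assumed here.
        return new String(Files.readAllBytes(savedHtml), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String html = renderAndRead("phantomjs", "save_page.js",
                Paths.get("temp_htmls", "test1.html"));
        System.out.println("Saved HTML length: " + html.length());
    }
}
```

The returned string, or the saved file itself via jsoup's `Jsoup.parse(File, String)`, can then be handed to jsoup to select the time, home team, and away team elements.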
Good luck!
java javascript web-scraping