Tuesday, 15 April 2014

Get data from Webpage that looks like Javascript into Java?




I am trying to scrape info from http://www.futbol24.com/live/?__igp=1&livedate=20141104 : the time, home team, and away team for each match on the page.

I have tried using jsoup, but I realise the page seems to load its data with JavaScript after the page loads. Is there a way to still get the data?

Cheers, Rob

You can't do this with jsoup alone, because jsoup does not execute JavaScript.

You can look at Selenium and/or:

PhantomJS:

http://phantomjs.org/

and pjscrape:

http://nrabinowitz.github.io/pjscrape/

For illustration, PhantomJS can save the rendered page with:

var page = require('webpage').create();
var fs = require('fs');         // file system module
var system = require('system'); // needed for system.args below
var args = system.args;
var output = './temp_htmls/test1.html'; // path for saving the local file

page.open('http://www.futbol24.com/live/?__igp=1&livedate=20141104', function () {
    // the page is open: write it to a local file using page.content
    fs.write(output, page.content, 'w');
    phantom.exit(); // exit PhantomJS
});

Here I have opened the page using PhantomJS and saved the rendered HTML locally. After that, you can use jsoup or Beautiful Soup to scrape it.

Good luck!
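Since the question is about getting the data into Java, here is a rough sketch of how the two steps could be tied together from the Java side. It assumes a `phantomjs` binary is on the PATH and that the script above was saved as `save_page.js`. That file name and the `RenderedPageFetcher` class are made up for illustration. It uses only the standard library, and the final jsoup step is only indicated in a comment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs a PhantomJS script and loads the HTML file it saved.
class RenderedPageFetcher {

    // Builds the command line; assumes "phantomjs" can be found on the PATH.
    static List<String> buildCommand(String scriptPath) {
        return Arrays.asList("phantomjs", scriptPath);
    }

    // Starts PhantomJS, waits for it to finish, and returns its exit code.
    static int runScript(String scriptPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(scriptPath))
                .inheritIO() // forward PhantomJS output to this console
                .start();
        return process.waitFor();
    }

    // Reads the saved file into a string (assumes it was written as UTF-8).
    static String readSavedHtml(Path htmlFile) throws IOException {
        return new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runScript("save_page.js"); // hypothetical script file name
        if (exitCode != 0) {
            throw new IllegalStateException("phantomjs exited with code " + exitCode);
        }
        // Same path the PhantomJS script writes to.
        String html = readSavedHtml(Paths.get("./temp_htmls/test1.html"));
        System.out.println("Read " + html.length() + " characters of rendered HTML");
        // The string can now be handed to an HTML parser, e.g. jsoup's Jsoup.parse(html).
    }
}
```

The CSS selectors for the time and team columns depend on the rendered markup, so inspect the saved file before writing the jsoup part.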

java javascript web-scraping
