Scraping data from a dynamic graph using Python + BeautifulSoup4
I need to implement a data scraping task that extracts data from a dynamic graph. The graph updates over time, similar to what you would find if you looked at a graph of a company's stock. I am using the requests and beautifulsoup4 libraries in Python and have figured out how to scrape text and link data, but I can't figure out how to get the values of the graph into a CSV file.
The graph in question can be found at http://www.apptrace.com/app/instagram/id389801252/ranks/topfreeapplications/36
@Oliver W. provided a reply already, but using requests avoids having to note the network call, and it is overall a much nicer package than urllib.
If you want your code to be a bit more flexible, you can write a function that takes a country name and a start and end date.
    from io import StringIO
    import json

    import pandas as pd
    import requests

    def load_data(country='', start_date='2014-08-09', end_date='2014-11-1'):
        base = "http://www.apptrace.com/api/app/389801252/rankings/country/"
        extra = "?country={0}&start_date={1}&end_date={2}&device=iphone&list_type=normal&chart_subtype=iphone"
        addr = base + extra.format(country, start_date, end_date)

        page = requests.get(addr)
        json_data = page.json()  # parse the JSON body of the response
        ranks = json_data['rankings'][0]['ranks']
        ranks = json.dumps(ranks)  # re-serialize so pandas receives valid JSON text
        # Wrap in StringIO: recent pandas versions deprecate passing a raw JSON string
        df = pd.read_json(StringIO(ranks), orient='records')
        return df
Change the settings on the webpage to see what other values country can take (Canada is 'CAN', for example). The empty string is for the USA.
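To see exactly which URL is requested for a given country and date range, the query string can be built separately from the request. The following is only a sketch: the helper name `build_ranking_url` is hypothetical, and the parameter names are simply the ones used in `load_data` above. It makes no network call, so it is safe to run while experimenting.

```python
from urllib.parse import urlencode

# Same endpoint as in load_data above
BASE_URL = "http://www.apptrace.com/api/app/389801252/rankings/country/"


def build_ranking_url(country="", start_date="2014-08-09", end_date="2014-11-1"):
    """Return the API URL for one country and date range (hypothetical helper)."""
    params = {
        "country": country,          # empty string selects the USA
        "start_date": start_date,
        "end_date": end_date,
        "device": "iphone",
        "list_type": "normal",
        "chart_subtype": "iphone",
    }
    return BASE_URL + "?" + urlencode(params)


# Inspect the URLs without sending any request
print(build_ranking_url())
print(build_ranking_url("CAN", "2014-09-01", "2014-09-30"))
```

The same dictionary could also be handed to `requests.get(BASE_URL, params=params)`, which encodes the query string for you.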
The df looks like this:
             date  rank
    0  2014-08-09    10
    1  2014-08-10    10
    2  2014-08-11     9
    3  2014-08-12     8
    4  2014-08-13     8
    5  2014-08-14     7
    6  2014-08-15     6
    7  2014-08-16     8
With the pandas DataFrame in hand, you can export it to CSV (or combine several DataFrames before exporting):
    df = load_data()
    df.to_csv("file_name.csv")
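Combining several DataFrames before exporting could look roughly like the sketch below. The helper `combine_countries` and the stand-in `fake_loader` are hypothetical names introduced for illustration; in practice you would pass `load_data` as the loader. The stand-in returns a couple of rows in the same date/rank shape so the example runs without network access.

```python
import pandas as pd


def combine_countries(loader, countries):
    """Load one DataFrame per country and stack them into a single frame."""
    frames = []
    for country in countries:
        df = loader(country).copy()
        # Label each row; the empty string stands for the USA
        df["country"] = country if country else "USA"
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


def fake_loader(country):
    # Stand-in for load_data(country); returns rows in the same date/rank shape
    return pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [10, 10]})


combined = combine_countries(fake_loader, ["", "CAN"])
print(combined)
# To export: combined.to_csv("file_name.csv", index=False)
```

Passing `index=False` leaves the row index out of the CSV, which is usually what you want once the rows from several countries have been renumbered.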
python graph web-scraping beautifulsoup python-requests