Scraping an xls file from a JavaScript web page in R
I am attempting to create an R script that details how I acquire the data I am using in an analysis in R, for reproducibility reasons. The first step should be as simple as assigning the URL of the xls file to a variable in R and proceeding from there, but the website I am scraping seems to produce its xls files via JavaScript (a language I have no knowledge of).
Follow these steps for an illustration of how the xls is produced:
1. Go to http://hcupnet.ahrq.gov/hcupnet.jsp?id=b08f84a071883804&form=seldxpr&js=y&action=%3e%3enext%3e%3e&_dxpr=dx1
2. Click "principal diagnosis".
3. Type "599.0" in the text box (without the quotation marks) and leave the radio button "each code separately" checked.
4. Click "next".
5. On this page, select one of the radio buttons and click "next".
6. On the following page, again select one of the radio buttons and click "next".
7. On this page you should see some of the data and some links. One of these links is titled "save results excel spreadsheet". Clicking on that link downloads an xls file with the data to your computer.
I've inspected the element and can see that it is querying a database, but I'm not entirely sure how to issue that query from an R script to pull the xls file down.
Any help is much appreciated.
(Not technically a full-on answer, perhaps, but the comment box doesn't allow for real formatting.)
RSelenium can perform these actions. However, are there many different selections/combinations of options? If not, you could just build a list of URLs like this one:
http://hcupnet.ahrq.gov/hcupnet.jsp?parms=h4siaaaaaaaaacwjpq.cmbgg_5kymdachstgbgut16fvtcrxoos_nybc5ab766evfdcu52vnonsvmlvwvfcnficqw9rxo0wedigsrgwk3ppknq0ttnb72bnpp7yx7jzyy6ydzbkdsuxnr2gaaaa6d4e19a7096c3ae9fd48005a5b0802a684bbbeb8
which goes right to the page for each download. You can capture that URL by hitting ESC instead of downloading the xls file and then copying the URL from the location bar.
On that page you can use the XML library or rvest to ingest it and extract the onclick attribute from the following tag:
<a href="javascript:void(0)" onclick="window.open('hcupnet.xls?id=0a8c3e07cd01b562&form=disptab&js=&action=%3e%3enext%3e%3e&__indisptab=yes&_results=save&_results3=&sortopt=');"> <img height="19" src="arrow_off3.gif" alt="" align="absmiddle" width="15" border="0"> email link page</a>
(I included the whole anchor reference since you'll need to use it in an XPath or CSS selector to find the tag, but you might be able to get away with just doing an XPath or CSS "contains" match on hcupnet.xls in the onclick attribute, too.)
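As a rough illustration of the "contains" approach, here is a minimal sketch using rvest. It assumes rvest 1.0 or later (for html_element) and parses the anchor tag quoted above from an inline string so that it runs offline; in practice you would pass the captured results-page URL to read_html instead. The variable names are only illustrative.

```r
library(rvest)

# The anchor tag quoted above, inlined so the sketch does not need network
# access. In real use, replace this with read_html(<captured results URL>).
html <- paste0(
  '<a href="javascript:void(0)" ',
  'onclick="window.open(\'hcupnet.xls?id=0a8c3e07cd01b562&form=disptab',
  '&js=&action=%3e%3enext%3e%3e&__indisptab=yes&_results=save',
  '&_results3=&sortopt=\');">email link page</a>'
)
page <- read_html(html)

# Find the first <a> whose onclick attribute mentions hcupnet.xls
link <- html_element(page, xpath = "//a[contains(@onclick, 'hcupnet.xls')]")

# Pull out the raw onclick text, i.e. the window.open(...) call
onclick <- html_attr(link, "onclick")
print(onclick)
```

The XPath here matches on the onclick text alone, so it should keep working even if the image or the link label inside the anchor changes.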
Then, extract the hcup… string from there and prepend http://hcupnet.ahrq.gov/ to it in a download.file call.
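That last step could look something like the sketch below, which uses only base R. The onclick value is copied from the tag quoted above; the regular expression, the fetch_xls helper, and the destination file name are assumptions made for illustration, and the id in the URL is presumably session-specific, so the actual download is left commented out.

```r
# onclick value as it appears in the anchor tag quoted in the answer
onclick <- paste0(
  "window.open('hcupnet.xls?id=0a8c3e07cd01b562&form=disptab",
  "&js=&action=%3e%3enext%3e%3e&__indisptab=yes&_results=save",
  "&_results3=&sortopt=');"
)

# Keep everything from "hcupnet.xls" up to (not including) the closing quote
rel_url <- regmatches(onclick, regexpr("hcupnet\\.xls[^']*", onclick))

# Prepend the site root to get an absolute URL
xls_url <- paste0("http://hcupnet.ahrq.gov/", rel_url)

# Hypothetical helper: download in binary mode so the xls is not corrupted
fetch_xls <- function(url, dest = "hcupnet.xls") {
  download.file(url, destfile = dest, mode = "wb")
  invisible(dest)
}

# fetch_xls(xls_url)  # uncomment to perform the actual download
```

Passing mode = "wb" matters mostly on Windows, where the default text mode can mangle binary files such as xls.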
javascript r