Thursday, 15 August 2013

python - Integrating Selenium and Scrapy to click past page and then save cookies




I've been searching Stack Overflow for a couple of hours and still haven't been able to find a suitable answer for what I'm doing. I want to use Selenium to get past the initial page by clicking through, and then transfer the cookies to Scrapy to crawl the database. So far I keep getting redirected to the initial login page.

I based the part that grabs the cookies and puts them in the request on an answer to the question "scrapy authentication login with cookies".

    import time

    import scrapy
    from scrapy.http import Request
    from scrapy.selector import HtmlXPathSelector
    from selenium import webdriver


    class HooversTest(scrapy.Spider):
        name = "hooverstest"
        # allowed_domains takes bare domain names, not full URLs
        allowed_domains = ["subscriber.hoovers.com"]
        # a single URL string, since it is passed as Request(url=...)
        login_page = "http://subscriber.hoovers.com/h/home/index.html"
        start_urls = [
            "http://subscriber.hoovers.com/h/company360/overview.html?companyid=99566395",
            "http://subscriber.hoovers.com/h/company360/overview.html?companyid=10723000000000",
        ]

        def login(self, response):
            return Request(url=self.login_page,
                           cookies=self.get_cookies(),
                           callback=self.after_login)

        def get_cookies(self):
            # Click past the interstitial page in a real browser, then read its cookies
            self.driver = webdriver.Firefox()
            self.driver.get("http://www.mergentonline.com/hoovers/continue.php?status=sucess")
            elem = self.driver.find_element_by_name("continue")
            elem.click()
            time.sleep(15)
            cookies = self.driver.get_cookies()
            # reduce(lambda r, d: r.update(d) or r, cookies, {})
            self.driver.close()
            return cookies

        def parse(self, response):
            return Request(url="http://subscriber.hoovers.com/h/company360/overview.html?companyid=99566395",
                           cookies=self.get_cookies(),
                           callback=self.after_login)

        def after_login(self, response):
            hxs = HtmlXPathSelector(response)
            print(hxs.select('//title').extract())
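One thing that may be worth checking is the shape of the cookies handed to Scrapy. Selenium's get_cookies() returns a list of dicts that can carry extra keys such as "secure", "httpOnly" and "expiry", while Scrapy's Request documents its cookies argument as either a plain dict or a list of dicts with "name", "value", "domain" and "path". The following is only a sketch of trimming one to the other; the helper name selenium_cookies_to_scrapy and the sample cookie values are made up for illustration, and this is not confirmed to be the cause of the redirect.

```python
def selenium_cookies_to_scrapy(selenium_cookies):
    """Keep only the cookie keys that Scrapy's Request documents.

    Hypothetical helper: `selenium_cookies` is assumed to be the list of
    dicts returned by Selenium's driver.get_cookies().
    """
    wanted = ("name", "value", "domain", "path")
    converted = []
    for cookie in selenium_cookies:
        # A cookie without a name or value cannot be sent, so skip it.
        if "name" not in cookie or "value" not in cookie:
            continue
        converted.append({key: cookie[key] for key in wanted if key in cookie})
    return converted


# Made-up sample data in the shape Selenium returns.
sample = [
    {"name": "JSESSIONID", "value": "abc123", "domain": ".hoovers.com",
     "path": "/", "secure": False, "httpOnly": True, "expiry": 1376600000},
    {"name": "visited", "value": "1"},
]
scrapy_cookies = selenium_cookies_to_scrapy(sample)
```

The result could then be passed as Request(url=..., cookies=scrapy_cookies, callback=...). Note also that Scrapy fetches start_urls on its own before parse() is ever called, so those first requests go out without the Selenium cookies; overriding start_requests() to build the first Request with the cookies attached is one way to change that.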

python selenium cookies web-scraping scrapy
