python - Integrating Selenium and Scrapy to click past an initial page and then save the cookies -
I've been searching on Stack Overflow for a couple of hours and still haven't been able to find a suitable answer for what I'm doing. I want to use Selenium to get past the initial page click-through, and then transfer the cookies to a Scrapy crawl of the database. So far I keep getting redirected to the initial login page.
I based the approach of grabbing the cookies and putting them in the request off of an answer to "Scrapy authentication login with cookies".
class hooverstest(scrapy.Spider):
    """Spider that uses Selenium to click through an interstitial page,
    then hands the harvested session cookies to Scrapy requests.

    Fixes over the original paste: mangled keywords (``homecoming`` ->
    ``return``), wrong-cased API names (``scrapy.spider``,
    ``webdriver.firefox``, ``request``), ``allowed_domains`` holding a URL
    instead of a bare domain, ``login_page`` being a one-element list passed
    where a URL string is expected, a stray invalid ``global`` statement,
    and a browser leak when an exception fires before ``close()``.
    """

    name = "hooverstest"
    # Scrapy expects bare domains here, not URLs with a scheme.
    allowed_domains = ["subscriber.hoovers.com"]
    # Single URL string (the original was a list, which Request(url=...) rejects).
    login_page = "http://subscriber.hoovers.com/h/home/index.html"
    start_urls = [
        "http://subscriber.hoovers.com/h/company360/overview.html?companyid=99566395",
        "http://subscriber.hoovers.com/h/company360/overview.html?companyid=10723000000000",
    ]

    def login(self, response):
        """Re-request the login page carrying the Selenium-harvested cookies."""
        return scrapy.Request(
            url=self.login_page,
            cookies=self.get_cookies(),
            callback=self.after_login,
        )

    def get_cookies(self):
        """Drive Firefox through the 'continue' click and return its cookies.

        NOTE(review): every call launches a fresh browser; if this ends up
        invoked per-request, cache the cookie list after the first call.
        """
        driver = webdriver.Firefox()
        try:
            # 'sucess' typo is in the real target URL -- do not "fix" it.
            driver.get("http://www.mergentonline.com/hoovers/continue.php?status=sucess")
            driver.find_element_by_name("continue").click()
            # Crude wait for the redirect/login to settle.
            # TODO(review): replace with an explicit WebDriverWait condition.
            time.sleep(15)
            return driver.get_cookies()
        finally:
            # quit() tears down the whole browser process; close() only
            # closes the window and leaks geckodriver on exceptions.
            driver.quit()

    def parse(self, response):
        """Kick off the authenticated fetch using the Selenium cookies."""
        return scrapy.Request(
            url="http://subscriber.hoovers.com/h/company360/overview.html?companyid=99566395",
            cookies=self.get_cookies(),
            callback=self.after_login,
        )

    def after_login(self, response):
        """Sanity check: print the page title to confirm we are past the login."""
        # response.xpath replaces the removed HtmlXPathSelector API.
        print(response.xpath('//title').extract())
python selenium cookies web-scraping scrapy
No comments:
Post a Comment