Thursday, 15 August 2013

python - Crawl two different pages separately with scrapy




I need to crawl two URLs with the same spider: example.com/folder/ and example.com/folder/fold2, and retrieve different data from each URL.

start_urls = ['http://www.example.com/folder', 'http://www.example.com/folder/fold2']

1) parse /folder one way; 2) parse /folder/fold2 for different content.

It looks like you want to override the start_requests method instead of using start_urls:

from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'

    def start_requests(self):
        # Each request gets its own callback, so the two pages
        # are parsed by separate methods.
        yield Request('http://www.example.com/folder', callback=self.parse_folder)
        yield Request('http://www.example.com/folder/fold2', callback=self.parse_subfolder)

    # ... define parse_folder and parse_subfolder here
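If you would rather keep both URLs in start_urls and use a single parse method, you can route each response to the right handler by inspecting its URL path. A minimal sketch of that routing idea follows, using plain functions in place of real Scrapy response objects (parse_folder and parse_subfolder are hypothetical stand-ins for your parse callbacks):

```python
from urllib.parse import urlparse

# Hypothetical handlers standing in for the spider's parse callbacks.
def parse_folder(url):
    return "items from /folder"

def parse_subfolder(url):
    return "items from /folder/fold2"

# Route a URL to the right handler by its path, mirroring what an
# `if response.url.endswith(...)` check inside a single parse()
# method would do.
def dispatch(url):
    path = urlparse(url).path.rstrip("/")
    if path.endswith("/fold2"):
        return parse_subfolder(url)
    return parse_folder(url)
```

In a real spider the same branch would live inside parse(self, response) and test response.url, but overriding start_requests with per-request callbacks (as above) is usually cleaner when the two pages need genuinely different handling.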

python scrapy
