python - Crawl two different pages separately with Scrapy
I need to crawl two URLs with the same spider, example.com/folder/ and example.com/folder/fold2, and retrieve different things from each URL.
start_urls = ['http://www.example.com/folder', 'http://www.example.com/folder/fold2']
1) Check one thing on /folder
2) Check something different on /folder/fold2
It looks like you want to override the start_requests method instead of using start_urls:
from scrapy import Spider, Request


class MySpider(Spider):
    name = 'myspider'

    def start_requests(self):
        # Give each start URL its own callback instead of the default parse()
        yield Request('http://www.example.com/folder', callback=self.parse_folder)
        yield Request('http://www.example.com/folder/fold2', callback=self.parse_subfolder)

    # ... define parse_folder and parse_subfolder here
python scrapy