Tuesday, 15 September 2015

python 2.7 - How to store data crawled in Scrapy in a specific order? -




I have crawled info from a web page and I want to store it in a CSV file in the specific order in which the fields are declared in the item class. The problem is that Scrapy stores the scraped field info in a different order, not the order declared in the item class. I'm a newbie in Python. Can anyone tell me how to do this?

For example, the item class:

    class DmozItem(Item):
        title = Field()
        link = Field()
        desc = Field()

Now, when the info is stored in the CSV file, it stores desc first, then link, then title:

    {"desc": [], "link": ["/computers/programming/"], "title": ["programming"]}

The reason the order of the info in the CSV file is not the declared order is that Item is a dict-like type, and the order of keys in a dict is not the order in which the fields were declared (in your output above it happens to come out alphabetically). The logic that exports items to a CSV file is implemented in
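A minimal sketch of the problem, using a plain dict in place of a Scrapy Item (which behaves like a dict): the declared order title, link, desc is not remembered, and sorting the keys alphabetically reproduces exactly the order seen in the CSV output above.

```python
# A plain dict standing in for the DmozItem from the question.
item = {"title": ["programming"], "link": ["/computers/programming/"], "desc": []}

# Alphabetical key order matches the unwanted CSV column order:
print(sorted(item))  # ['desc', 'link', 'title']
```

Nothing in the dict itself records that title was declared first, so any exporter that just iterates over the keys has no way to recover the declaration order.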

    scrapy/contrib/exporter/__init__.py

You can override the _get_serialized_fields method of BaseItemExporter so that it yields the key-value pairs in declaration order. Here is illustration code:

    field_iter = ['title', 'link', 'desc']
    for field_name in field_iter:
        if field_name in item:
            field = item.fields[field_name]
            value = self.serialize_field(field, field_name, item[field_name])
        else:
            value = default_value
        yield field_name, value

But remember, this is not a universal solution: the field order is hard-coded in the override, so it has to be kept in sync with the item class by hand.
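To make the override concrete, here is a self-contained sketch. Scrapy is not assumed to be installed, so Field, StubItem, and OrderedExporter below are minimal stand-ins for scrapy.item.Field, scrapy.item.Item, and BaseItemExporter; only the _get_serialized_fields logic is the point.

```python
class Field(dict):
    """Minimal stand-in for scrapy.item.Field."""

class StubItem(dict):
    """Minimal stand-in for scrapy.item.Item: a dict plus a fields mapping."""
    fields = {}

class DmozItem(StubItem):
    # Declared order we want in the CSV: title, link, desc.
    fields = {'title': Field(), 'link': Field(), 'desc': Field()}

class OrderedExporter(object):
    """Yields fields in a fixed list order instead of dict key order."""
    fields_to_export = ['title', 'link', 'desc']

    def serialize_field(self, field, name, value):
        # The real BaseItemExporter applies per-field serializers here.
        return value

    def _get_serialized_fields(self, item, default_value=None):
        for field_name in self.fields_to_export:
            if field_name in item:
                field = item.fields[field_name]
                value = self.serialize_field(field, field_name, item[field_name])
            else:
                value = default_value
            yield field_name, value

item = DmozItem(desc=[], link=['/computers/programming/'], title=['programming'])
exporter = OrderedExporter()
rows = list(exporter._get_serialized_fields(item))
print(rows)  # title first, matching the declaration order
```

Also worth checking before patching internals: Scrapy's BaseItemExporter accepts a fields_to_export attribute, and setting it (e.g. when constructing CsvItemExporter) should give you the columns in that order without any override.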

python-2.7 scrapy web-crawler
