Thursday, 15 May 2014

regex - Capturing ids by xpath in Python from URL source




Imagine I have content like this:

cont="""<a id="test1" class="ssss" title="dddd" href="aaaa">example1</a>.....<a id="test2" class="gggg" title="zzzz" href="vvvv">example2</a>.... """

What I want:

id1='test1' id2='test2' idn='testn'

Could someone point me in the right direction?

if '<a id=' in cont: ....?

Do I have to use regex in Python, or is there a method such as XPath to grab them?

Note: I only want the ids of a tags.

Download bs4 here: http://www.crummy.com/software/beautifulsoup/

Documentation: http://www.crummy.com/software/beautifulsoup/bs4/doc/

This should work:

from bs4 import BeautifulSoup

# Parse the markup with the parser bundled with Python
soup = BeautifulSoup(cont, 'html.parser')
for a in soup.select('a'):  # or soup.find_all('a') if you prefer
    # Skip a tags that have no id attribute
    if a.get('id') is not None:
        print(a.get('id'))

Or as a list comprehension:

ids = [a.get('id') for a in BeautifulSoup(cont, 'html.parser').select('a') if a.get('id') is not None]
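
If installing bs4 is not an option, roughly the same result can be had with the standard library alone. The following is only a sketch, not the approach from the answer above: it swaps BeautifulSoup for Python 3's html.parser.HTMLParser, and the class name AnchorIdCollector is a made-up helper for illustration. It reuses the cont string from the question.

```python
from html.parser import HTMLParser

# Sample markup copied from the question
cont = """<a id="test1" class="ssss" title="dddd" href="aaaa">example1</a>.....<a id="test2" class="gggg" title="zzzz" href="vvvv">example2</a>.... """


class AnchorIdCollector(HTMLParser):
    """Hypothetical helper: collects the id attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            attr_map = dict(attrs)
            if attr_map.get("id") is not None:
                self.ids.append(attr_map["id"])


collector = AnchorIdCollector()
collector.feed(cont)
collector.close()

ids = collector.ids
print(ids)
```

Like the bs4 version, this only looks at a tags and skips any that have no id, so it avoids the pitfalls of matching '<a id=' as a plain substring or with a regex.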

python regex xpath beautifulsoup
