Capturing ids by XPath in Python from a URL source
Imagine I have content like this:
cont="""<a id="test1" class="ssss" title="dddd" href="aaaa">example1</a>.....<a id="test2" class="gggg" title="zzzz" href="vvvv">example2</a>.... """
What I want:
id1='test1' id2='test2' idn='testn'
Could someone help me write this?
if '<a id=' in cont: ....?
Do I have to use a regex in Python, or is there a way (e.g. XPath) to grab them?
Note: I only want the ids of <a> tags.
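To answer the regex part of the question: a regex can pull out the ids for simple markup like the sample, but it is fragile for real HTML. The sketch below assumes every id is written as id="..." with double quotes inside an opening <a ...> tag; single-quoted or unquoted attributes, and tags inside comments or scripts, are not handled. The pattern name A_ID_RE is illustrative only.

```python
import re

# Sample markup from the question ("....." stands for other content).
cont = """<a id="test1" class="ssss" title="dddd" href="aaaa">example1</a>.....<a id="test2" class="gggg" title="zzzz" href="vvvv">example2</a>.... """

# Match an opening <a ...> tag (not <abbr>, <area>, ...) and capture the value
# of a double-quoted id attribute preceded by whitespace.
# Assumption: ids always look like id="..." (double quotes).
A_ID_RE = re.compile(r'<a\b[^>]*?\sid="([^"]*)"', re.IGNORECASE)

ids = A_ID_RE.findall(cont)
print(ids)  # ['test1', 'test2']
```

This works on the sample, but a real HTML parser (such as the one in the answer) copes with the many ways attributes can be written.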
You can use BeautifulSoup (bs4) instead of a regex. Download it here: http://www.crummy.com/software/beautifulsoup/ (or install it with pip install beautifulsoup4)
Documentation: http://www.crummy.com/software/beautifulsoup/bs4/doc/
This should work:
from bs4 import BeautifulSoup

soup = BeautifulSoup(cont, "html.parser")
for a in soup.select('a'):  # or soup.find_all('a') if you prefer
    if a.get('id') is not None:  # skip <a> tags without an id
        print(a.get('id'))
Or, as a list comprehension:
ids = [a.get('id') for a in BeautifulSoup(cont, 'html.parser').select('a') if a.get('id') is not None]
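For the XPath route the question asks about, the standard library's xml.etree.ElementTree supports a limited XPath subset, including the [@id] attribute predicate. This is a sketch under the assumption that the markup is well-formed XML, which holds for the sample but not for most real HTML (unclosed tags, entities like &nbsp;); for real pages, lxml's lxml.html.fromstring(cont).xpath('//a/@id') gives full XPath support. The helper a_ids is hypothetical. The sketch also stores the results as id1, id2, ... in a dict, which is easier to work with than separate variables.

```python
import xml.etree.ElementTree as ET

cont = """<a id="test1" class="ssss" title="dddd" href="aaaa">example1</a>.....<a id="test2" class="gggg" title="zzzz" href="vvvv">example2</a>.... """


def a_ids(markup):
    """Return the id of every <a> element that has one, in document order."""
    # The fragment has several top-level elements, so wrap it in a dummy
    # root element to make it a single well-formed XML document.
    root = ET.fromstring("<root>" + markup + "</root>")
    # './/a[@id]' selects every descendant <a> that carries an id attribute.
    return [a.get("id") for a in root.iterfind(".//a[@id]")]


ids = a_ids(cont)
# Numbered names (id1, id2, ...) kept as dict keys instead of variables.
named = {f"id{n}": value for n, value in enumerate(ids, start=1)}
print(named)  # {'id1': 'test1', 'id2': 'test2'}
```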