Tuesday, 15 May 2012

Python/XPath: get instances of text in an arbitrary element



Given the following:

<table>
  <tr>
    <td>
      <div>text 1</div>
    </td>
    <td>
      text 2
    </td>
    <td>
      <div>
        <a href="#">text 3</a>
      </div>
    </td>
  </tr>
  <tr>
    ...
  </tr>
</table>

Given the above table, how do I extract the text? Note that the number of nested elements is arbitrary, so I can't just take the first sibling, the zero-th sibling, or the second sibling.

I'm looking for a general way to extract the text.

In [1]: d = """<table>
   ...:   <tr>
   ...:     <td>
   ...:       <div>text 1</div>
   ...:     </td>
   ...:     <td>
   ...:       text 2
   ...:     </td>
   ...:     <td>
   ...:       <div>
   ...:         <a href="#">text 3</a>
   ...:       </div>
   ...:     </td>
   ...:   </tr>
   ...:   <tr>
   ...:     ...
   ...:   </tr>
   ...: </table>"""

In [3]: from lxml import etree

In [4]: f = etree.HTML(d)

In [5]: f.xpath('normalize-space(string(/table))')
Out[5]: ''

In [6]: f.xpath('normalize-space(string(//table))')
Out[6]: 'text 1 text 2 text 3 ...'
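If the goal is one string per cell rather than a single concatenated string, the same XPath functions can be evaluated relative to each `<td>`. The following is a minimal sketch under that assumption; the helper name `cell_texts` and the shortened sample markup are illustrative and not part of the original post.

```python
from lxml import etree

# Shortened version of the table from the question (the elided row is left out).
html = """<table>
<tr>
<td> <div>text 1</div> </td>
<td> text 2 </td>
<td> <div> <a href="#">text 3</a> </div> </td>
</tr>
</table>"""

root = etree.HTML(html)


def cell_texts(tree):
    """Return the whitespace-normalized text of every <td>, at any nesting depth.

    Hypothetical helper, named here only for illustration.
    """
    texts = []
    for td in tree.xpath('//td'):
        # string(.) concatenates all descendant text nodes of this cell;
        # normalize-space() trims the result and collapses whitespace runs.
        texts.append(td.xpath('normalize-space(string(.))'))
    return texts


print(cell_texts(root))
```

Because `string(.)` walks all descendants, the depth of nesting inside each cell does not matter, which is the property the question asks for.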

I use:

normalize-space(string(/table))
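The absolute path `/table` only matches when `<table>` is the root element of the parsed document. With lxml's HTML parser the markup is wrapped in `<html><body>`, which would explain the empty string returned for that expression in the session above. The sketch below compares the two cases; the variable names and the shortened markup are assumptions made for illustration.

```python
from lxml import etree

snippet = '<table><tr><td><div>text 1</div></td><td> text 2 </td></tr></table>'

# Parsed as XML, <table> is the document root.
as_xml = etree.fromstring(snippet)
# Parsed as HTML, lxml wraps the fragment in <html><body>.
as_html = etree.HTML(snippet)

expr_abs = 'normalize-space(string(/table))'   # table must be the root
expr_any = 'normalize-space(string(//table))'  # first table anywhere in the tree

xml_abs = as_xml.xpath(expr_abs)    # matches: table is the root
html_abs = as_html.xpath(expr_abs)  # no match: the root is <html>
html_any = as_html.xpath(expr_any)  # matches via the descendant axis

print(repr(xml_abs), repr(html_abs), repr(html_any))
```

So the `/table` form fits input parsed as XML with `<table>` at the top, while `//table` is the safer choice after `etree.HTML`.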

python xpath
