Python/XPath: get all instances of text in an arbitrary element
Given the following:
<table>
  <tr>
    <td>
      <div>text 1</div>
    </td>
    <td>
      text 2
    </td>
    <td>
      <div>
        <a href="#">text 3</a>
      </div>
    </td>
  </tr>
  <tr>
    ...
  </tr>
</table>
Given the above table, how can I extract the text? Note that the number of nested elements is arbitrary, so I can't just take the first sibling, the zeroth sibling, or the second sibling.
I'm looking for a general way to extract the text.
In [1]: d="""<table>
   ...:   <tr>
   ...:     <td>
   ...:       <div>text 1</div>
   ...:     </td>
   ...:     <td>
   ...:       text 2
   ...:     </td>
   ...:     <td>
   ...:       <div>
   ...:         <a href="#">text 3</a>
   ...:       </div>
   ...:     </td>
   ...:   </tr>
   ...:   <tr>
   ...:     ...
   ...:   </tr>
   ...: </table>"""

In [3]: from lxml import etree

In [4]: f = etree.HTML(d)

In [5]: f.xpath('normalize-space(string(/table))')
Out[5]: ''

In [6]: f.xpath('normalize-space(string(//table))')
Out[6]: 'text 1 text 2 text 3 ...'
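The empty result for /table is most likely because lxml's HTML parser wraps a fragment in html and body elements, so the table is no longer the document element. A minimal sketch that checks this, assuming lxml is installed; the variable names (html, root_tag, table_path, absolute, anywhere) are illustrative only:

```python
from lxml import etree

html = """<table>
  <tr>
    <td><div>text 1</div></td>
    <td> text 2 </td>
    <td><div><a href="#">text 3</a></div></td>
  </tr>
</table>"""

# etree.HTML returns the root element of the parsed document.
root = etree.HTML(html)
table = root.xpath('//table')[0]

# Tag of the document element and the absolute path of the table.
root_tag = root.tag
table_path = root.getroottree().getpath(table)

# /table selects nothing because the document element is not the table;
# //table finds the table wherever it sits in the tree.
absolute = root.xpath('normalize-space(string(/table))')
anywhere = root.xpath('normalize-space(string(//table))')

print(root_tag, table_path, repr(absolute), repr(anywhere))
```

If the markup were parsed as XML instead (for example with etree.fromstring), the table would be the document element and /table should match it directly.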
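I would use:
normalize-space(string(/table))
This assumes the table is the document element. With lxml's HTML parser, which wraps the markup in html and body elements, /table returns an empty string and //table is needed instead, as the session above shows.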
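If the goal is one string per cell rather than a single string for the whole table, the same XPath functions can be applied to each td in turn. The following is only a sketch under that assumption; cell_texts and sample are hypothetical names, not part of the original answer:

```python
from lxml import etree


def cell_texts(html):
    """Return the whitespace-normalized text of every non-empty <td>."""
    root = etree.HTML(html)
    texts = []
    for td in root.xpath('//table//td'):
        # string(.) concatenates all descendant text nodes of this cell,
        # however deeply they are nested; normalize-space trims the result.
        text = td.xpath('normalize-space(string(.))')
        if text:
            texts.append(text)
    return texts


sample = """<table>
  <tr>
    <td><div>text 1</div></td>
    <td> text 2 </td>
    <td><div><a href="#">text 3</a></div></td>
  </tr>
</table>"""

print(cell_texts(sample))
```

Because string(.) works on the whole subtree of each cell, the nesting depth inside a td does not matter, which is the property the question asks for.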