My Blog: Python/XPath: get instances of text in an arbitrary element

Tuesday, 15 May 2012

Python/XPath: get instances of text in an arbitrary element

Given the following:

<table>
    <tr>
        <td>
            <div>text 1</div>
        </td>
        <td>
            text 2
        </td>
        <td>
            <div>
                <a href="#">text 3</a>
            </div>
        </td>
    </tr>
    <tr>
        ...
    </tr>
</table>

Given the table above, how do I extract the text? Note that the number of nested elements is arbitrary, so I can't rely on taking the zero-th, first, or second sibling.

I'm looking for a general way to extract the text.

In [1]: d = """<table>
   ...:     <tr>
   ...:         <td>
   ...:             <div>text 1</div>
   ...:         </td>
   ...:         <td>
   ...:             text 2
   ...:         </td>
   ...:         <td>
   ...:             <div>
   ...:                 <a href="#">text 3</a>
   ...:             </div>
   ...:         </td>
   ...:     </tr>
   ...:     <tr>
   ...:         ...
   ...:     </tr>
   ...: </table>"""

In [3]: from lxml import etree

In [4]: f = etree.HTML(d)

In [5]: f.xpath('normalize-space(string(/table))')
Out[5]: ''

In [6]: f.xpath('normalize-space(string(//table))')
Out[6]: 'text 1 text 2 text 3 ...'

I would use:

normalize-space(string(/table))
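
The absolute path /table only matches when <table> is the document root. lxml's etree.HTML() wraps a fragment in <html><body>, which is presumably why //table is needed in the session above. Below is a minimal sketch, assuming lxml is installed, that gets the text of the whole table and of each cell. The names doc, root, all_text and cell_texts are only illustrative, and the elided second row is left out.

```python
from lxml import etree

doc = """<table>
    <tr>
        <td>
            <div>text 1</div>
        </td>
        <td>
            text 2
        </td>
        <td>
            <div>
                <a href="#">text 3</a>
            </div>
        </td>
    </tr>
</table>"""

# etree.HTML() wraps the fragment in <html><body>, so <table> is not the root
# element and has to be located with //table rather than /table.
root = etree.HTML(doc)

# string() concatenates every descendant text node of the first <table>;
# normalize-space() trims the result and collapses whitespace runs.
all_text = root.xpath("normalize-space(string(//table))")

# The same expression evaluated relative to each <td> gives one string per
# cell, however deeply the text is nested inside it.
cell_texts = [td.xpath("normalize-space(string(.))") for td in root.xpath("//td")]

print(all_text)
print(cell_texts)
```

Evaluating the expression per <td> keeps the cell boundaries, which the single string for the whole table loses.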

python xpath
