Thursday, 15 January 2015

python - Extract an attribute value with lxml




I have the following XML file:

'<?xml version="1.0" encoding="utf-8" standalone="yes"?>\r\n<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingcanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officedocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officedocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingdrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingdrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessinggroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingink" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingshape" mc:ignorable="w14 wp14"><w:body><w:p w:rsidr="00706a37" w:rsidrpr="004a1ce5" w:rsidrdefault="004a1ce5"><w:ppr><w:pstyle w:val="heading1"/><w:numpr><w:ilvl w:val="12"/><w:numid w:val="0"/></w:numpr><w:rpr><w:sz w:val="28"/><w:szcs w:val="28"/></w:rpr></w:ppr><w:commentrangestart w:id="0"/><w:r w:rsidrpr="004a1ce5"><w:rpr><w:sz w:val="28"/><w:szcs w:val="28"/></w:rpr><w:t>h</w:t></w:r><w:commentrangeend w:id="0"/><w:r w:rsidr="00a23794"><w:rpr><w:rstyle w:val="commentreference"/>

I need to extract the value of the id attribute of the <w:commentrangestart> tag. I have looked at many related questions.

I tried iterating over every commentrangestart tag under each p and retrieving its attrib, but the loop returned nothing:

for p in lxml_tree.xpath('.//w:p/commentrangestart', namespaces={'w': w}):
    print p.attrib

I tried various combinations such as 'commentrangestart[@id]' and 'commentrangestart/@id', but none worked. I also referred to many other questions. I would prefer a way in which I go over every p and search it for the comment tag, like:

for p in lxml_tree.xpath('.//w:p', namespaces={'w': w}):
    p.xpath(./w:commentrangestart/...)

and so on.

What is wrong with my expression?

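You need to qualify the element name with the namespace prefix. In XPath, an unprefixed name such as commentrangestart matches only elements in no namespace, so every step of the path needs the w: prefix: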

for p in root.xpath('.//w:p/w:commentrangestart', namespaces={'w': w}):
    print p.attrib

output:

{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id': '0'}

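Alternatively, select the attribute value directly. The attribute is in the namespace too, so it also needs the prefix: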

for id_ in root.xpath('.//w:p/w:commentrangestart/@w:id', namespaces={'w': w}):
    print id_

output:

0
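
The question also asked for a loop that visits every p and then searches inside it. A minimal sketch of that pattern follows. It rests on a few assumptions: SAMPLE is a hypothetical, trimmed-down stand-in for the document in the question, the helper name comment_ids_per_paragraph is made up for illustration, and the code uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs. lxml's find and findall accept the same namespaces argument. Element names are lowercase to match the sample above; real WordprocessingML is case-sensitive and spells the tag w:commentRangeStart.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "w" prefix in the sample document.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical, trimmed-down stand-in for the document in the question.
SAMPLE = (
    '<w:document xmlns:w="' + W + '"><w:body>'
    '<w:p><w:commentrangestart w:id="0"/><w:r><w:t>h</w:t></w:r>'
    '<w:commentrangeend w:id="0"/></w:p>'
    '<w:p><w:r><w:t>no comment here</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def comment_ids_per_paragraph(root):
    """Return one list of comment-start ids for each w:p element."""
    id_key = "{%s}id" % W  # attributes are keyed in Clark notation
    result = []
    for p in root.findall(".//w:p", NS):
        # Every step of the path carries the prefix, including the child.
        starts = p.findall("./w:commentrangestart", NS)
        result.append([s.get(id_key) for s in starts])
    return result


root = ET.fromstring(SAMPLE)
ids = comment_ids_per_paragraph(root)
print(ids)
```

With this sample, the first paragraph yields ['0'] and the second yields an empty list. Dropping the prefix from the inner path reproduces the empty result from the question.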

python xml python-2.7 xpath lxml
