python - Extract an attribute value with lxml
I have the following XML file:
'<?xml version="1.0" encoding="utf-8" standalone="yes"?>\r\n<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingcanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officedocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officedocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingdrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingdrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessinggroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingink" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingshape" mc:ignorable="w14 wp14"><w:body><w:p w:rsidr="00706a37" w:rsidrpr="004a1ce5" w:rsidrdefault="004a1ce5"><w:ppr><w:pstyle w:val="heading1"/><w:numpr><w:ilvl w:val="12"/><w:numid w:val="0"/></w:numpr><w:rpr><w:sz w:val="28"/><w:szcs w:val="28"/></w:rpr></w:ppr><w:commentrangestart w:id="0"/><w:r w:rsidrpr="004a1ce5"><w:rpr><w:sz w:val="28"/><w:szcs w:val="28"/></w:rpr><w:t>h</w:t></w:r><w:commentrangeend w:id="0"/><w:r w:rsidr="00a23794"><w:rpr><w:rstyle w:val="commentreference"/>
and I need to extract the value of the id attribute of the <w:commentrangestart> tag. I have looked at many questions on this and found approaches of the following type.
I tried iterating over every commentrangestart tag inside a p element and retrieving its attrib, but this returned nothing:
for p in lxml_tree.xpath('.//w:p/commentrangestart',namespaces = {'w':w}): print p.attrib
I also tried various combinations, such as 'commentrangestart[@id]' and 'commentrangestart/@id'.
None of them worked. I referred to many questions, one of them here. I would prefer a way to go over every p element and search for the comment tag inside it, like:
for p in lxml_tree.xpath('.//w:p',namespaces = {'w':w}): p.xpath(./w:commentrangestart/...)
and so on.
What is wrong with my expression?
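You need to qualify every step of the path with the namespace prefix, including commentrangestart. An unprefixed name in XPath only matches elements that are in no namespace: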
for p in root.xpath('.//w:p/w:commentrangestart', namespaces={'w':w}): print p.attrib
output:
{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id': '0'}
Alternatively, select the attribute values directly (the attribute name needs the prefix too):
for id_ in root.xpath('.//w:p/w:commentrangestart/@w:id', namespaces={'w': w}): print id_
output:
0
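For the per-paragraph loop the question asks about, something along these lines should work. This is only a sketch: SAMPLE is a trimmed-down stand-in for the document in the question, and NSMAP, comment_ids_per_paragraph and first_id are names made up for the example. It is written so that it also runs on Python 3.

```python
from lxml import etree

# Namespace URI copied from the xmlns:w declaration in the question's document.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NSMAP = {"w": W_NS}

# Assumption: a shortened stand-in for the question's XML, same element names.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body>'
    '<w:p><w:commentrangestart w:id="0"/><w:r><w:t>h</w:t></w:r>'
    '<w:commentrangeend w:id="0"/></w:p>'
    '<w:p><w:r><w:t>no comment here</w:t></w:r></w:p>'
    '</w:body></w:document>' % W_NS
)


def comment_ids_per_paragraph(root):
    """Return one list of w:id values per w:p element, in document order."""
    result = []
    for p in root.xpath(".//w:p", namespaces=NSMAP):
        # Relative path from this paragraph; every step carries the w: prefix.
        ids = p.xpath("./w:commentrangestart/@w:id", namespaces=NSMAP)
        result.append([str(i) for i in ids])
    return result


root = etree.fromstring(SAMPLE)
ids_by_paragraph = comment_ids_per_paragraph(root)
print(ids_by_paragraph)

# The same attribute read from an element, using the {uri}name form
# that p.attrib prints in the output above.
first = root.xpath(".//w:commentrangestart", namespaces=NSMAP)[0]
first_id = first.get("{%s}id" % W_NS)
print(first_id)
```

Paragraphs without a comment start marker simply yield an empty list, so the loop does not need a special case for them.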
python xml python-2.7 xpath lxml