Python regex: replace a keyword only where it is NOT inside a <pre> block
I have many HTML documents that contain <pre> python code </pre> blocks. For example, this HTML code:
<pre class="c1"> # regex usage import re re.findall(r'abc','abcde') </pre> python tutorial ...python regex<br> <pre class="c2"> # regex usage import re re.findall(r'abc','abcde') </pre>
I treat regex as a keyword and want to replace it with a link: <a href="link-to-regex">regex</a>, but I don't want to replace anything inside the <pre> tags.
Expected output:
<pre class="c1"> # regex usage import re re.findall(r'abc','abcde') </pre> python tutorial ...python <a href="link-to-regex">regex</a><br> <pre class="c2"> # regex usage import re re.findall(r'abc','abcde') </pre>
I currently use placeholders:
import re

# Collect every <pre ...>...</pre> block; (?s) lets . match newlines,
# and [^>]* allows attributes such as class="c1"
pre_list = re.findall(r'(?s)(<pre\b[^>]*>.+?</pre>)', html_code)
# Use placeholders to protect the code blocks
for index, code in enumerate(pre_list):
    html_code = html_code.replace(code, 'code_placeholder_{}'.format(index))
# Replace the keyword in the remaining HTML content
html_code = html_code.replace('regex', '<a href="link-to-regex">regex</a>')
# Put the protected code blocks back
for index, code in enumerate(pre_list):
    html_code = html_code.replace('code_placeholder_{}'.format(index), code)
Is there a better method than this?
Use a negative lookahead assertion to match the string regex only where it is not inside a <pre> tag. Also, don't forget to enable the DOTALL modifier.
>>> import re
>>> s = """<pre> # regex usage import re re.findall(r'abc','abcde') </pre> python tutorial ...python regex<br> <pre> # regex usage import re re.findall(r'abc','abcde') </pre>"""
>>> m = re.sub(r'(?s)regex(?!(?:(?!<\/?pre[^<>]*>).)*<\/pre>)', r'<a href="link-to-regex">regex</a>', s)
>>> print(m)
<pre> # regex usage import re re.findall(r'abc','abcde') </pre> python tutorial ...python <a href="link-to-regex">regex</a><br> <pre> # regex usage import re re.findall(r'abc','abcde') </pre>
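To see why the lookahead works, it may help to spell the same pattern out with re.VERBOSE. The following is only a sketch: the names OUTSIDE_PRE and link_keyword and the sample string are made up for illustration, and it assumes every <pre> in the input has a matching </pre>.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so that each
# part can carry a comment. OUTSIDE_PRE is a hypothetical name.
OUTSIDE_PRE = re.compile(r"""
    regex                      # the keyword to link
    (?!                        # reject this occurrence if, looking ahead,
        (?:
            (?!</?pre[^<>]*>)  # while not standing on a <pre ...> or </pre> tag,
            .                  # consume one character (DOTALL: newlines too)
        )*
        </pre>                 # the first pre tag reached is a closing one
    )
""", re.VERBOSE | re.DOTALL)


def link_keyword(html):
    """Wrap 'regex' in a link everywhere except inside <pre> blocks."""
    return OUTSIDE_PRE.sub('<a href="link-to-regex">regex</a>', html)


# Illustrative input, shortened from the question's example.
sample = ('<pre class="c1"> # regex usage </pre> python regex<br> '
          '<pre class="c2"> re.findall(r"abc", "abcde") </pre>')
result = link_keyword(sample)
print(result)
```

An occurrence inside a block sees </pre> as the next pre tag and is skipped. An occurrence outside sees either an opening <pre ...> or no pre tag at all, so the lookahead does not reject it and it gets linked.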