Python regex: replace only the text that is NOT inside a <pre> block
I have many HTML pages that contain Python code inside <pre> ... </pre> tags, like the following HTML:

<pre class="c1">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python regex<br>
<pre class="c2">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>

I want to replace the keyword regex with a link, <a href="link-to-regex">regex</a>, but I don't want to replace anything inside the <pre> tags.
Desired output:

<pre class="c1">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python <a href="link-to-regex">regex</a><br>
<pre class="c2">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>

Currently I use placeholders:
import re

# collect the <pre> blocks (with or without attributes); re.S lets . match newlines
pre_list = re.compile(r'(<pre\b[^>]*>.+?</pre>)', re.S).findall(html_code)
# use code placeholders to protect the code sources
for index, code in enumerate(pre_list):
    html_code = html_code.replace(code, 'code_placeholder_{}'.format(index))
# replace the HTML content here
html_code = html_code.replace('regex', '<a href="link-to-regex">regex</a>')
# put the code sources back
for index, code in enumerate(pre_list):
    html_code = html_code.replace('code_placeholder_{}'.format(index), code)

Is there a better method than this?
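One alternative to the placeholder round-trip (not from the original thread) is re.split with a capturing group: the captured <pre> blocks come back at the odd indices of the result list, so the keyword can be replaced only in the even-indexed pieces and the list joined back together. This is a minimal sketch, assuming <pre> blocks are never nested; the helper name link_outside_pre and the sample string are hypothetical.

```python
import re

# Hypothetical sample input modelled on the question's HTML.
html_code = """<pre class="c1">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python regex<br>
<pre class="c2">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>"""

# The capturing group makes re.split keep the <pre> blocks in its result.
# \b stops the pattern from matching other tags such as <prefix>.
# Assumption: <pre> blocks are not nested.
PRE_BLOCK = re.compile(r'(<pre\b[^>]*>.*?</pre>)', re.S)


def link_outside_pre(html, word, link):
    """Hypothetical helper: replace `word` with `link` only outside <pre> blocks."""
    parts = PRE_BLOCK.split(html)
    # Even indices hold text outside <pre>; odd indices hold the captured blocks.
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(word, link)
    return ''.join(parts)


result = link_outside_pre(html_code, 'regex', '<a href="link-to-regex">regex</a>')
print(result)
```

Because the <pre> blocks are never taken out of the string, there is nothing to restore afterwards and no risk of a placeholder clashing with real text.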
Use a negative lookahead assertion to match the string regex only when it is not inside a <pre> tag, and don't forget to enable the DOTALL modifier ((?s)) so that . also matches newlines.
>>> import re
>>> s = """<pre>
... # regex usage
... import re
... re.findall(r'abc','abcde')
... </pre>
... python tutorial ...python regex<br>
... <pre>
... # regex usage
... import re
... re.findall(r'abc','abcde')
... </pre>"""
>>> m = re.sub(r'(?s)regex(?!(?:(?!<\/?pre[^<>]*>).)*<\/pre>)', r'<a href="link-to-regex">regex</a>', s)
>>> print(m)
<pre>
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python <a href="link-to-regex">regex</a><br>
<pre>
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
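To see how the answer's pattern works, it can be spelled out with re.VERBOSE. The sketch below is the same expression with each part commented; the names KEYWORD_OUTSIDE_PRE and link_keyword and the sample input are hypothetical, and it assumes <pre> tags are not nested. It also runs against the question's markup, since [^<>]* accepts the class attributes inside the tag.

```python
import re

# The answer's pattern with re.VERBOSE (re.X) so each part can carry a comment.
# re.S (DOTALL) lets "." cross newlines inside multi-line <pre> blocks.
KEYWORD_OUTSIDE_PRE = re.compile(r"""
    regex                       # the keyword to link
    (?!                         # negative lookahead: fail if, from here,
        (?:                     #   zero or more characters,
            (?!</?pre[^<>]*>)   #     none of which starts an opening or closing pre tag,
            .
        )*
        </pre>                  #   lead up to a closing pre tag (we are inside a block)
    )
""", re.S | re.X)


def link_keyword(html):
    """Hypothetical helper: link every 'regex' that is not inside a <pre> block."""
    return KEYWORD_OUTSIDE_PRE.sub('<a href="link-to-regex">regex</a>', html)


# Hypothetical sample input taken from the question (note the class attributes).
sample = """<pre class="c1">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python regex<br>
<pre class="c2">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>"""

result = link_keyword(sample)
print(result)
```

In words: a keyword is left alone when the first pre-related tag after it is a closing </pre>, which only happens when the keyword sits inside a block. A keyword with no </pre> anywhere after it is always replaced.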