Saturday, 15 February 2014

Python regex: replace only the part that is NOT matched




I have many HTML documents that contain Python code inside <pre> ... </pre> tags. For example, this

HTML code:

<pre class="c1">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python regex<br>
<pre class="c2">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>

I treat regex as a keyword and want to replace it with a link: <a href="link-to-regex">regex</a>, but I don't want to replace anything inside a <pre> tag.

Desired output:

<pre class="c1">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>
python tutorial ...python <a href="link-to-regex">regex</a><br>
<pre class="c2">
# regex usage
import re
re.findall(r'abc','abcde')
</pre>

I currently use placeholders:

import re

# Collect every <pre> block. [^>]* allows attributes such as class="c1",
# and re.DOTALL lets "." match the newlines inside a block.
pre_list = re.compile(r'(<pre\b[^>]*>.+?</pre>)', re.DOTALL).findall(html_code)
# Use code placeholders to protect the code sources
for index, code in enumerate(pre_list):
    html_code = html_code.replace(code, 'code_placeholder_{}'.format(index))
# Replace the HTML content here
html_code = html_code.replace('regex', '<a href="link-to-regex">regex</a>')
# Put the protected code blocks back
for index, code in enumerate(pre_list):
    html_code = html_code.replace('code_placeholder_{}'.format(index), code)

Is there a better method than this?

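Use a negative lookahead assertion to match the string regex only when it is not inside a <pre> tag. Also, don't forget to enable the DOTALL modifier, written inline as (?s), so that "." can also match newlines inside a <pre> block.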
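Read piece by piece, the pattern used in the session that follows can be spelled out with re.VERBOSE. This is only a sketch of the same idea: the name LINK_OUTSIDE_PRE and the shortened sample HTML are made up for illustration, and the flags are passed as arguments instead of the inline (?s).

```python
import re

# Same lookahead idea as the one-line pattern, spread out for reading.
# LINK_OUTSIDE_PRE is a hypothetical name used only in this sketch.
LINK_OUTSIDE_PRE = re.compile(r"""
    regex                       # the keyword to link
    (?!                         # reject the match if what follows is:
        (?:
            (?!</?pre[^<>]*>)   #   a position that does not start a pre tag,
            .                   #   followed by any one character,
        )*                      #   repeated any number of times,
        </pre>                  #   and then a closing pre tag
    )
""", re.VERBOSE | re.DOTALL)

# Shortened sample input, assumed for illustration.
html = ('<pre class="c1">\n# regex usage\n</pre>\n'
        'python regex<br>\n'
        '<pre class="c2">\n# regex usage\n</pre>')

linked = LINK_OUTSIDE_PRE.sub('<a href="link-to-regex">regex</a>', html)
print(linked)
```

A keyword inside a <pre> block can reach a closing </pre> without crossing another pre tag, so the lookahead rejects it. A keyword outside a block runs into the next opening <pre> first (or the end of the text), so it is replaced.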

>>> import re
>>> s = """<pre> # regex usage import re re.findall(r'abc','abcde') </pre> python tutorial ...python regex<br> <pre> # regex usage import re re.findall(r'abc','abcde') </pre>"""
>>> m = re.sub(r'(?s)regex(?!(?:(?!<\/?pre[^<>]*>).)*<\/pre>)', r'<a href="link-to-regex">regex</a>', s)
>>> print(m)
<pre> # regex usage import re re.findall(r'abc','abcde') </pre> python tutorial ...python <a href="link-to-regex">regex</a><br> <pre> # regex usage import re re.findall(r'abc','abcde') </pre>
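A different technique, not the one in the answer above, is to match whole <pre> blocks and the keyword in a single alternation and let a callback decide what to return. It does the same job as the placeholders in one pass. The following is a minimal sketch under that assumption; the helper name link_keyword and its parameters are hypothetical.

```python
import re

# Hypothetical helper for this sketch; name and signature are not from the post.
def link_keyword(html_code, keyword, href):
    # First alternative: a whole <pre ...>...</pre> block, captured as group 1.
    # Second alternative: the bare keyword anywhere else.
    pattern = re.compile(
        r'(<pre\b[^>]*>.*?</pre>)|' + re.escape(keyword),
        re.DOTALL,
    )

    def replace(match):
        if match.group(1) is not None:
            # The match is a <pre> block: return it unchanged.
            return match.group(1)
        # The match is the keyword outside any <pre> block: wrap it in a link.
        return '<a href="{}">{}</a>'.format(href, keyword)

    return pattern.sub(replace, html_code)

# Shortened sample input, assumed for illustration.
html = ('<pre class="c1">\n# regex usage\n</pre>\n'
        'python regex<br>\n'
        '<pre class="c2">\n# regex usage\n</pre>')

result = link_keyword(html, 'regex', 'link-to-regex')
print(result)
```

Because the scan moves left to right and each <pre> block is consumed as a single match, the keyword alternative never gets a chance to match inside one. Like the other approaches here, this sketch would still rewrite the keyword inside other tags or attribute values, so an HTML parser may be the safer choice for arbitrary markup.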
