regex - What regular expression will match this (XML-ish) input pattern
I have a requirement to parse an XML fragment that looks like this:

    <tag name="books">books1</tag>
    <tag name="textbooks"> textbooks1</tag>
    <tag name="textbooks"> textbooks2</tag>
    <tag name="textbooks"> textbooks3</tag>
    <tag name="textbooks"> textbooks4</tag>
    <tag name="textbooks"> textbooks5</tag>
    <tag name="books">books2</tag>
    <tag name="textbooks"> textbooks1</tag>
    <tag name="textbooks"> textbooks2</tag>
    <tag name="books">books3</tag>
    <tag name="textbooks"> textbooks4</tag>
    <tag name="textbooks"> textbooks5</tag>

I need all the tags with name="textbooks", together with the <tag name="books"></tag> that precedes them, up to the last textbooks tag before the next <tag name="books"></tag>.
So the results should be as follows:

    <tag name="books">books1</tag>
    <tag name="textbooks"> textbooks1</tag>
    <tag name="textbooks"> textbooks2</tag>
    <tag name="textbooks"> textbooks3</tag>
    <tag name="textbooks"> textbooks4</tag>
    <tag name="textbooks"> textbooks5</tag>

    <tag name="books">books2</tag>
    <tag name="textbooks"> textbooks1</tag>
    <tag name="textbooks"> textbooks2</tag>

    <tag name="books">books3</tag>
    <tag name="textbooks"> textbooks4</tag>
    <tag name="textbooks"> textbooks5</tag>
If the question is nothing more than "which regular expression will match <tag name="books">", the answer is <tag name="books">.
Your example output looks like you want to insert an empty line before each occurrence except the first, so maybe you are looking for something like

    sed '1b;/<tag name="books">/i\ ' xml-fragment.txt

If you mean that you want to capture each group of name="textbooks" tags along with the preceding name="books" tag and their respective contents, you are looking for something like

    (<tag name="books">[^<>]*(?:</tag>\s*<tag name="textbooks">[^<>]*)*</tag>)

where \s matches whitespace (including newlines) in regex implementations which include the Perl extensions (so, not sed, but most modern programming languages, including PHP [insert here a witty remark about its suitability for ... things] and Python).
Note that many regex tools are line-oriented by default; applying the above multi-line regular expression to the input one line at a time will not work. Assuming you are doing something like

    lines = file.read()
    re.match(regex, lines)

then you should find what you want.
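As a minimal sketch of the whole-input approach above, the following Python applies the same pattern to the sample data from the question held in a single string. The names `fragment` and `GROUP_RE` are illustrative and not from the original post, and `re.findall` is used here instead of `re.match` on the assumption that every group is wanted, since `re.match` only tries to match at the start of the string.

```python
import re

# Sample input from the question, one tag per line, held as a single string.
fragment = """<tag name="books">books1</tag>
<tag name="textbooks"> textbooks1</tag>
<tag name="textbooks"> textbooks2</tag>
<tag name="textbooks"> textbooks3</tag>
<tag name="textbooks"> textbooks4</tag>
<tag name="textbooks"> textbooks5</tag>
<tag name="books">books2</tag>
<tag name="textbooks"> textbooks1</tag>
<tag name="textbooks"> textbooks2</tag>
<tag name="books">books3</tag>
<tag name="textbooks"> textbooks4</tag>
<tag name="textbooks"> textbooks5</tag>
"""

# The pattern from the answer: one "books" tag followed by any number of
# "textbooks" tags, with \s* absorbing the newlines between tags.
GROUP_RE = re.compile(
    r'(<tag name="books">[^<>]*'
    r'(?:</tag>\s*<tag name="textbooks">[^<>]*)*'
    r'</tag>)'
)

# With a single capturing group, findall returns a list of the captured strings.
groups = GROUP_RE.findall(fragment)

# Print each group followed by an empty line, as in the desired output.
for group in groups:
    print(group)
    print()
```

Because the pattern refuses `<` and `>` inside tag contents, it only copes with flat input like the sample; nested or attribute-rich markup would need a real parser.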
As indicated in the comments, you should use XML tools for XML input. If your input isn't proper XML, maybe you can preprocess it so that it is, and postprocess to remove whatever the preprocessor had to add in order to make it acceptable to an XML processing pipeline.
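One way this preprocessing idea might look in Python, as a sketch only: the fragment has no single root element, so it is wrapped in a hypothetical `<root>` element (an assumption for illustration, not part of the original input) before being handed to the standard library's `xml.etree.ElementTree`. The `grouped` structure is likewise just one possible way to hold the result.

```python
import xml.etree.ElementTree as ET

# Sample input from the question; as it stands it is not a well-formed
# document because it has several top-level elements.
fragment = """<tag name="books">books1</tag>
<tag name="textbooks"> textbooks1</tag>
<tag name="textbooks"> textbooks2</tag>
<tag name="textbooks"> textbooks3</tag>
<tag name="textbooks"> textbooks4</tag>
<tag name="textbooks"> textbooks5</tag>
<tag name="books">books2</tag>
<tag name="textbooks"> textbooks1</tag>
<tag name="textbooks"> textbooks2</tag>
<tag name="books">books3</tag>
<tag name="textbooks"> textbooks4</tag>
<tag name="textbooks"> textbooks5</tag>
"""

# Preprocess: wrap the fragment in a hypothetical <root> element so that
# the parser sees exactly one top-level element.
root = ET.fromstring("<root>" + fragment + "</root>")

# Walk the tags in document order; each "books" tag opens a new group and
# every following "textbooks" tag is attached to the most recent group.
grouped = []  # list of (books_text, [textbook_texts])
for elem in root.iter("tag"):
    text = (elem.text or "").strip()
    if elem.get("name") == "books":
        grouped.append((text, []))
    elif elem.get("name") == "textbooks" and grouped:
        grouped[-1][1].append(text)

for books, textbooks in grouped:
    print(books, textbooks)
```

No postprocessing is needed in this sketch because the wrapper element is never written back out; only the text of the inner tags is kept.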