regex - Fuzzy, but not too fuzzy string matching with agrep
I have a vector of strings like this:
text <- c("car", "ca-r", "my car", "i drive cars", "chars", "cancan")
I would like to match a pattern that occurs one time, allowing at most 1 substitution/insertion. The result should look like this:
> "car"
I tried the following to match the pattern one time with at most 1 substitution/insertion etc.:
> agrep("ca?", text, ignore.case = TRUE, max.distance = list(substitutions = 1, insertions = 1, deletions = 1, all = 1), value = TRUE)
[1] "car" "ca-r" "my car" "i drive cars" "cancan"
Is there a way to exclude strings that are n characters longer than the pattern?
An alternative that replaces agrep with adist:
text[which(adist("ca?", text, ignore.case = TRUE) <= 1)]
adist gives the number of insertions/deletions/substitutions required to convert one string into another, so keeping only the elements with an adist of 1 or less should give what you want, I think.
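To see why this works, it can help to look at the distances themselves. The snippet below is a small sketch using the vector from the question; the variable names d and keep are only for illustration. Because adist compares whole strings by default (unlike agrep, which looks for an approximate substring), longer strings pick up extra insertions and fall outside the threshold.

```r
# Vector from the question
text <- c("car", "ca-r", "my car", "i drive cars", "chars", "cancan")

# adist() returns a 1 x length(text) matrix here; flatten it to a plain vector
d <- as.vector(adist("ca?", text, ignore.case = TRUE))
names(d) <- text

# Edit distance from "ca?" to each whole string
print(d)

# Keep only the strings within one edit of the pattern
keep <- text[d <= 1]
print(keep)
```

Note that "?" is treated as a literal character here, since adist uses fixed = TRUE by default.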
This answer is probably less appropriate if you really want to exclude things "n characters longer" than the pattern (with n being a variable), rather than just match whole words (where n is 1 in your example).
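If the "n characters longer" rule is really what is wanted, one possible approach (a sketch, not part of the original answer) is to keep the agrep call from the question and then filter its hits by length with nchar. The helper name agrep_max_len and its arguments n and max_edits are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: fuzzy-match with agrep(), then drop every hit that is
# more than `n` characters longer than the pattern.
agrep_max_len <- function(pattern, x, n = 1, max_edits = 1) {
  hits <- agrep(
    pattern, x,
    ignore.case = TRUE,
    max.distance = list(substitutions = max_edits, insertions = max_edits,
                        deletions = max_edits, all = max_edits),
    value = TRUE
  )
  # Length filter: at most n characters longer than the pattern
  hits[nchar(hits) <= nchar(pattern) + n]
}

text <- c("car", "ca-r", "my car", "i drive cars", "chars", "cancan")

res0 <- agrep_max_len("ca?", text, n = 0)  # no longer than the pattern
res1 <- agrep_max_len("ca?", text, n = 1)  # up to one character longer
print(res0)
print(res1)
```

With n = 0 only "car" survives the length filter; larger values of n let progressively longer matches through, so n has to be chosen to suit the data.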
regex r