Tuesday, 15 January 2013

regex - Fuzzy, but not too fuzzy string matching with agrep




I have a character vector like this:

text <- c("car", "ca-r", "my car", "i drive cars", "chars", "cancan")

I would like to match a pattern so that it is matched only once, with at most one substitution/insertion. The result should look like this:

> "car"

I tried the following to match my pattern only once, with at most one substitution/insertion etc., and got this:

> agrep("ca?", text, ignore.case = TRUE, max = list(substitutions = 1, insertions = 1, deletions = 1, all = 1), value = TRUE)
[1] "car" "ca-r" "my car" "i drive cars" "cancan"

Is there a way to exclude strings that are n characters longer than my pattern?

An alternative that replaces agrep with adist:

text[which(adist("ca?", text, ignore.case = TRUE) <= 1)]

adist gives the number of insertions/deletions/substitutions required to convert one string into another, so keeping only the elements with an adist of one or less should give you what you want, I think.
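To see why the threshold of 1 keeps so little, it can help to look at the distances themselves. The following is a minimal sketch using the vector from the question; the variable names `d` and `matched` are only illustrative. Note that adist compares whole strings and, by default, treats "ca?" as a literal three-character string, so every extra character in an element counts as an insertion.

```r
# Vector from the question
text <- c("car", "ca-r", "my car", "i drive cars", "chars", "cancan")

# adist() returns a 1 x 6 matrix here; flatten it and label it for readability
d <- as.vector(adist("ca?", text, ignore.case = TRUE))
names(d) <- text

# Inspect the edit distance of every element from the pattern
print(d)

# Keep only elements within one edit of the pattern
matched <- text[d <= 1]
print(matched)
# "car"
```

"ca-r" is dropped because it needs two edits (one substitution for the "?" and one insertion), which is why this approach is stricter than the agrep call in the question.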

This answer is probably less appropriate if you want to exclude things that are "n characters longer" than the pattern (with n being variable), rather than just match whole words (where n is always 1 in your example).
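If the goal really is to drop candidates that are more than n characters longer than the pattern, one possible approach is to keep the agrep call and filter its result by length with nchar. This is only a sketch: `pattern`, `n`, `hits` and `kept` are hypothetical names, and the `max.distance` list (including the `all = 1` component) is an assumption based on the call in the question.

```r
text <- c("car", "ca-r", "my car", "i drive cars", "chars", "cancan")

pattern <- "ca?"
n <- 1  # allow matches at most n characters longer than the pattern

# Fuzzy matches, using limits like those in the question
hits <- agrep(pattern, text, ignore.case = TRUE, value = TRUE,
              max.distance = list(substitutions = 1, insertions = 1,
                                  deletions = 1, all = 1))

# Discard matches that exceed the allowed length
kept <- hits[nchar(hits) <= nchar(pattern) + n]
print(kept)
```

Because agrep matches approximately anywhere inside a string, the length filter is what removes long elements such as "my car" and "i drive cars"; changing `n` adjusts how much extra length is tolerated.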

regex r
