Wednesday, 15 May 2013

regex - Correct wrongly formatted dates


I have some wrongly formatted dates mixed in with correctly formatted ones. The data looks something like this:

  df <- data.frame(col = c("-1.1.11-01", "-1.11.12-1", "-1.1.13-01", "-1.1.14-01", "-1.10.10-01",
                           "-1.10.11-01", "---1.10.12-01", "2010-03-31", "2010-04-01", "2010-04-05"))

How can I clean up the wrongly formatted entries without breaking the correctly formatted dates?

I am able to delete the leading dash, but the trailing -01 or -1 also has to be removed. So the correct values are:

  desired <- c("1.1.11", "1.11.12", "1.1.13", "1.1.14", "1.10.10",
               "1.10.11", "1.10.12", "2010-03-31", "2010-04-01", "2010-04-05")

My problem is the -01 part: removing it also removes part of the correctly formatted dates (for example, 2010-04-01 also ends in -01).
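To see the problem concretely, here is a minimal sketch comparing a naive suffix strip with a pattern anchored to the malformed shape. It assumes the malformed entries always look like leading dashes, then m.d.yy, then a dash and digits; the variable names are only for illustration.

```r
x <- c("-1.1.11-01", "-1.11.12-1", "-1.1.13-01", "-1.1.14-01", "-1.10.10-01",
       "-1.10.11-01", "---1.10.12-01", "2010-03-31", "2010-04-01", "2010-04-05")

# Naive approach: drop leading dashes, then any trailing "-01" or "-1".
# This also clips the correctly formatted "2010-04-01".
naive <- sub("-0?1$", "", sub("^-+", "", x))
naive[9]
# "2010-04"

# Anchored approach: only strings of the form dashes + m.d.yy + "-digits" match;
# sub() returns every non-matching string (the ISO dates) unchanged.
fixed <- sub("^-+([0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2})-[0-9]+$", "\\1", x)
fixed
```

Anchoring with ^ and $ is what protects the good dates: the pattern cannot match anywhere inside them, so nothing is removed.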

Edit: the format is mm.dd.yy.
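Given the mm.dd.yy format, once the stray dashes are gone the strings can be turned into Date objects with as.Date() and a matching format string. A minimal sketch, assuming the values are already cleaned (cleaned is a hypothetical vector) and relying on R's rule that %y maps 00-68 to 20xx:

```r
# Hypothetical already-cleaned values in the two formats from the question
cleaned <- c("1.1.11", "1.10.12", "2010-03-31")

# Entries containing a dot are mm.dd.yy; the rest are ISO yyyy-mm-dd
dotted <- grepl(".", cleaned, fixed = TRUE)

# Parse everything as ISO first (dotted entries become NA),
# then overwrite the dotted entries using the mm.dd.yy format
parsed <- as.Date(cleaned, format = "%Y-%m-%d")
parsed[dotted] <- as.Date(cleaned[dotted], format = "%m.%d.%y")
parsed
# "2011-01-01" "2012-01-10" "2010-03-31"
```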

A simple regular expression handles this kind of problem well:

  df <- c("-1.1.11-01", "-1.11.12-1", "-1.1.13-01", "-1.1.14-01", "-1.10.10-01",
          "-1.10.11-01", "---1.10.12-01", "2010-03-31", "2010-04-01", "2010-04-05")
  # Keep only the captured date, whichever of the two formats matched
  df <- sub(".*([0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}).*", "\\1", df)
  df
  # "1.1.11" "1.11.12" "1.1.13" "1.1.14" "1.10.10" "1.10.11" "1.10.12"
  # "2010-03-31" "2010-04-01" "2010-04-05"

Note that I have used a character vector instead of a data frame.
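To apply the same substitution to the data frame column from the question, assign the result back to the column. A minimal sketch using a shortened copy of the data; stringsAsFactors = FALSE is set explicitly so the column is character, and pattern is just a helper name introduced here:

```r
df <- data.frame(col = c("-1.1.11-01", "---1.10.12-01", "2010-04-01"),
                 stringsAsFactors = FALSE)

# Same pattern as the answer: match either format inside a capture group
pattern <- ".*([0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}).*"
df$col <- sub(pattern, "\\1", df$col)
df$col
# "1.1.11" "1.10.12" "2010-04-01"
```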

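The solution is to match one pattern or the other (yyyy-mm-dd or m.d.yy) inside a capturing group, and then replace the whole string with that captured subexpression (\\1), which discards everything around the date.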
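A different technique, swapped in here rather than taken from the answer, is to extract the matched text directly with regexpr() and regmatches(), which avoids the surrounding .* and the capture group. A minimal sketch, assuming every element contains one of the two date shapes (elements with no match are silently dropped, so the result could be shorter than the input):

```r
x <- c("-1.1.11-01", "-1.11.12-1", "---1.10.12-01", "2010-03-31")

# The two alternatives on their own, with no leading/trailing .*
date_re <- "[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{1,2}"

# regexpr() locates the first match in each string; regmatches() returns its text
out <- regmatches(x, regexpr(date_re, x))
out
# "1.1.11" "1.11.12" "1.10.12" "2010-03-31"
```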

