Thursday, 15 March 2012

regex - Cleanup file of phone numbers that are not properly formatted




I have a file with 10,000 phone numbers in it, many of which are not in the proper format (e.g. 123-456-7890). Although I've cleaned up most of them, I still have one pattern I'm not sure how to handle. I used sed to clean up most of the file, and I don't mind using either sed or awk, although I use sed more than awk. One of the last groups (2306 lines) is still not formatted properly.

Example: 123 4567890 (3 digits, a tab, 7 digits) needs to become 123-456-7890 (3 dash 3 dash 4).

I know I can find the pattern and replace the tab with a dash using:

sed "/^[0-9][0-9][0-9]\t[0-9][0-9][0-9][0-9][0-9][0-9][0-9]/s/\t/-/" infile.txt > outfile.txt

However, if I could extend the command to also split the 7 digits that are grouped together at the same time, it would make it easier for me to clean up what's left after this round. I've done a fair amount of searching but couldn't find an answer, including in the list of similar questions that came up when I typed in the subject before posting this question.

Use extended regular expressions and capturing groups:

sed -E 's/^([0-9]{3})\t([0-9]{3})([0-9]{4})$/\1-\2-\3/' infile.txt > outfile.txt
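Since the question is tagged osx, one caveat may apply: some sed implementations (the BSD sed shipped with OS X is the usual example) may not interpret \t as a tab inside a pattern. A minimal sketch that sidesteps this by passing a literal tab through a shell variable is shown below. The function name format_phones is hypothetical and only used for illustration, and the sketch assumes each line holds exactly one number.

```shell
#!/bin/sh
# Sketch: reformat "NNN<TAB>NNNNNNN" lines as NNN-NNN-NNNN.
# format_phones is a hypothetical helper name, not from the original post.
format_phones() {
    # Store a literal tab so the pattern does not depend on the sed
    # implementation understanding the \t escape.
    tab=$(printf '\t')
    # -E enables extended regular expressions, so ( ) and { } need no backslashes.
    # Three capture groups: area code, first 3 digits, last 4 digits.
    sed -E "s/^([0-9]{3})${tab}([0-9]{3})([0-9]{4})\$/\1-\2-\3/"
}

# Usage: lines that do not match the pattern pass through unchanged.
printf '123\t4567890\n555-123-4567\n' | format_phones
```

To process the file from the question, the same function can read it on standard input: format_phones < infile.txt > outfile.txt. Anchoring the pattern with ^ and $ means only lines that consist of exactly 3 digits, a tab, and 7 digits are rewritten, so numbers that were already cleaned are left alone.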

regex osx bash awk sed
