Thursday, 15 April 2010

python - How to extract only numbers from input file. Numbers can be float/int



I want to extract numbers (integers and floats) from a file, excluding special symbols and letters. The numbers can appear at any position.

    import re

    file = open('input_file.txt', 'r')
    file = file.readlines()
    for line in file:
        # '\d+' is tried first, so a float such as 18.5 comes back as '18' and '5'
        line = re.findall(r'\d+|\d+.\d+', line)
        print(line)

Maybe this will help.

The string here can be your line. I have set it to some dummy text.

    import re

    string = "he 100, 18.5 , 0.67. maybe should 100, 200, , 200b 200, 67.88"
    s = re.findall(r"[-+]?\d*\.\d+|\d+", string)
    print(s)

This prints the following when executed:

['100', '18.5', '0.67', '100', '200', '200', '200', '67.88']
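Note that re.findall returns the matches as strings. If numeric values are needed, a conversion step can follow. Here is a minimal sketch; to_number is a hypothetical helper name (not from the answer above), and the rule "a dot means float, otherwise int" is an assumption that fits this pattern:

```python
import re

# Same pattern as in the answer above
NUMBER_RE = re.compile(r"[-+]?\d*\.\d+|\d+")

def to_number(token):
    """Hypothetical helper: float if the token has a decimal point, else int."""
    return float(token) if "." in token else int(token)

string = "he 100, 18.5 , 0.67. maybe should 100, 200, , 200b 200, 67.88"
values = [to_number(t) for t in NUMBER_RE.findall(string)]
print(values)  # [100, 18.5, 0.67, 100, 200, 200, 200, 67.88]
```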

Experiment

I performed a little experiment on part of the corpus of Frankenstein.

Note that I use .read() to read the entire file instead of processing it line by line.

    import re

    file = open('frank.txt', 'r')
    file = file.read()
    numbers = re.findall(r"[-+]?\d*\.\d+|\d+", file)
    print(numbers)

This is the result:

['17', '2008', '84', '1', '11', '17', '2', '28', '17', '3', '7', '17', '4', '5', '17', '31', '13', '17', '19', '17', '1', '2', '3', '4', '5', '6', '18', '17', '7', '7', '12', '17', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '27', '20', '21', '22', '18', '17', '23', '24', '26', '17', '2', '5', '7', '12', '9', '11', '84', '84', '8', '84', '1', '1', '1', '.8', '1', '1', '1', '1', '1', '1', '1', '.1', '1', '.2', '1', '.1', '1', '.7', '1', '.8', '1', '.9', '1', '.3', '1', '.1', '1', '.7', '1', '.4', '1', '.5', '1', '.1', '1', '.6', '1', '.1', '1', '.7', '1', '.8', '1', '.9', '1', '.8', '20', '60', '4', '30', '1', '.3', '90', '1', '.9', '3', '1', '1', '.1', '1', '.2', '1', '.3', '3', '1', '.3', '90', '1', '.4', '1', '.3', '1', '.5', '1', '.6', '2', '2001', '3', '4', '3', '501', '3', '64', '6221541', '501', '3', '4557', '99712', '809', '1500', '84116', '801', '596', '1887', '4', '1', '5', '000', '50', '5']
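If it matters where each number came from, line-by-line processing may be the better fit than .read(). The following is only a sketch: numbers_by_line is a hypothetical name, and the sample text is made up for illustration, with io.StringIO standing in for an open file:

```python
import io
import re

NUMBER_RE = re.compile(r"[-+]?\d*\.\d+|\d+")

def numbers_by_line(lines):
    """Yield (line_number, matches) for each line that contains a number."""
    for lineno, line in enumerate(lines, start=1):
        found = NUMBER_RE.findall(line)
        if found:
            yield lineno, found

# Made-up sample; with a real file, use `with open('frank.txt') as f:`
# and pass f to numbers_by_line instead.
sample = io.StringIO("chapter 3\nno digits here\nprice 4.50 and 12\n")
result = list(numbers_by_line(sample))
print(result)  # [(1, ['3']), (3, ['4.50', '12'])]
```

Iterating over the file object reads one line at a time, so the whole file never has to sit in memory at once.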

Unit testing

I wrote a lighter version that works on a supplied string.

    import unittest
    import re

    # Extract numbers (improved version)
    def extract_numbers_improved(x):
        numbers = re.findall(r"[-+]?\d*\.\d+|\d+", x)
        return numbers

    # Unit test
    class Test(unittest.TestCase):
        def testCase(self):
            teststr = "12asdasdsa 33asdsad 44 aidsasdd 2231%#@ qqq55 2222ww ww qq 1asdasd 33##$11 42.09 12$"
            self.assertEqual(extract_numbers_improved(teststr),
                             ['12', '33', '44', '2231', '55', '2222', '1', '33', '11', '42.09', '12'])

    unittest.main()

When the test passes, it gives a green signal, as shown below:

    Ran 1 test in 0.000s

    OK
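One more case may be worth testing. Because the alternation in the pattern is not grouped, the optional sign belongs only to the decimal branch, so a signed integer such as -5 comes back as '5'. A sketch of a grouped variant follows; extract_signed_numbers is a hypothetical name, and treating every leading - or + as a sign is an assumption (it would also turn the 4 in "3-4" into '-4'):

```python
import re

# Pattern used above: the sign is attached only to the decimal branch
ORIGINAL = re.compile(r"[-+]?\d*\.\d+|\d+")
# Grouped variant: the sign applies to both decimals and integers
GROUPED = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")

def extract_signed_numbers(text):
    """Hypothetical variant that keeps the sign on integers as well."""
    return GROUPED.findall(text)

text = "-5 +3.2 7 -.5"
original_result = ORIGINAL.findall(text)
grouped_result = extract_signed_numbers(text)
print(original_result)  # ['5', '+3.2', '7', '-.5']
print(grouped_result)   # ['-5', '+3.2', '7', '-.5']
```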

python regex
