Thursday, 15 January 2015

python - Number of Specific Characters Per Every One Million




First off, I am using Python.

I am attempting to find the number of specific characters (base pairs) per every 1,000,000 characters within a chromosome.

For instance:

I have the number of times A, G, T, and C, and a, g, t, and c (uppercase and lowercase counted separately) appear within the imported file.

So far, I am able to count the number of these characters in the entire file using `Counter`, but I am not familiar with how to break the counts down per every 1 million characters.

Thanks in advance!
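Since the whole-file counting is already done with `collections.Counter`, one way to break it down is to slice the sequence into consecutive 1,000,000-character windows and build one `Counter` per window. This is a minimal sketch, assuming the whole sequence fits in memory as a single string with no header or line breaks; `counts_per_window` is a hypothetical helper name, not part of any library.

```python
from collections import Counter


def counts_per_window(sequence, step=1_000_000):
    """Return one Counter per consecutive block of `step` characters.

    The last block may be shorter than `step` when the sequence length
    is not an exact multiple of `step`.
    """
    return [Counter(sequence[start:start + step])
            for start in range(0, len(sequence), step)]


# Usage with a tiny window so the result is easy to check by hand.
blocks = counts_per_window("AGTCagtcAAAA", step=4)
for i, block in enumerate(blocks):
    # A Counter returns 0 for characters that never appeared in the block.
    print(i, block['A'], block['a'])
```

Because `Counter` tallies every character in one pass over each slice, this avoids scanning each block once per base.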

If your imported file looks like a sequence of characters:

agtcagtcagtcagtcagtcagtcagtcagtc...

then you can apply this approach:

file = 'c:\\test\\chromosome.txt'

# One list per character; uppercase and lowercase bases are counted separately.
Acount = []
Gcount = []
Tcount = []
Ccount = []
acount = []
gcount = []
tcount = []
ccount = []

step = 1000000
start = 0
end = step

with open(file, 'r') as chromosome:
    data = chromosome.read()

# Walk through the sequence one block of `step` characters at a time.
# Looping while `start < len(data)` keeps the final, possibly shorter block.
while start < len(data):
    # str.count(sub, start, end) counts only within data[start:end]
    Acount.append(data.count('A', start, end))
    Gcount.append(data.count('G', start, end))
    Tcount.append(data.count('T', start, end))
    Ccount.append(data.count('C', start, end))
    acount.append(data.count('a', start, end))
    gcount.append(data.count('g', start, end))
    tcount.append(data.count('t', start, end))
    ccount.append(data.count('c', start, end))
    start = end
    end += step

At the end you will have 8 lists. Each list contains the counts of occurrences of one specific character per million characters, with one entry per block.
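For a whole chromosome, reading the entire file into memory may be costly. A minimal sketch of a streaming variant reads one block at a time with the file object's `read(size)` method, which in text mode returns at most `size` characters and an empty string at end of file. It assumes, as above, that the file is one unbroken run of bases with no header or line breaks (any newline characters would be counted toward the block size); `count_blocks` and `BASES` are hypothetical names.

```python
import io
from collections import Counter

BASES = "AGTCagtc"  # the 8 characters counted separately in the answer above


def count_blocks(handle, step=1_000_000, bases=BASES):
    """Yield a dict of base counts for each `step`-character block of `handle`.

    Only one block is held in memory at a time. Assumes no header and no
    line breaks; newlines would be counted toward the block size.
    """
    while True:
        block = handle.read(step)
        if not block:  # an empty string means end of file
            break
        counts = Counter(block)
        yield {b: counts[b] for b in bases}


# Usage: io.StringIO stands in for open('chromosome.txt') here.
fake_file = io.StringIO("AGTCAGTCagtcaa")
results = list(count_blocks(fake_file, step=8))
print(results[0]['A'], results[1]['a'])  # prints: 2 3
```

To get the same per-character lists as in the answer, collect them afterwards, e.g. `Acount = [r['A'] for r in results]`.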

