python - Number of Specific Characters Per Every One Million
First off, I am using Python.
I am attempting to find the number of specific characters (base pairs) per every 1,000,000 characters within a chromosome.
For instance:
I have the number of times A, G, T, and C, and a, g, t, and c appear within the imported file.
I have been able (so far) to count the number of these characters for the entire file using "Counter", but I am not familiar with how to break this down per every 1 million characters.
Thanks in advance!
If your imported file looks like a sequence of characters:
agtcagtcagtcagtcagtcagtcagtcagtc...
then you can apply this approach:
    file = 'c:\\test\\chromosome.txt'

    # One list per character; each entry is the count within one window.
    Acount = []
    Gcount = []
    Tcount = []
    Ccount = []
    acount = []
    gcount = []
    tcount = []
    ccount = []

    step = 1000000
    start = 0
    end = step

    with open(file, 'r') as chromosome:
        data = chromosome.read()

    # Test start (not end) so the final, possibly shorter, window is counted too.
    while start < len(data):
        Acount.append(data.count('A', start, end))
        Gcount.append(data.count('G', start, end))
        Tcount.append(data.count('T', start, end))
        Ccount.append(data.count('C', start, end))
        acount.append(data.count('a', start, end))
        gcount.append(data.count('g', start, end))
        tcount.append(data.count('t', start, end))
        ccount.append(data.count('c', start, end))
        start = end
        end += step
At the end you have 8 lists. Each list contains the counts of occurrences of one specific character per million characters.
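Since the question mentions "Counter", the same per-window idea can also be sketched with collections.Counter. This is only an illustration, not the answerer's code: the function name count_per_window and its window and alphabet parameters are hypothetical, and it assumes the whole sequence fits in memory as one string.

```python
from collections import Counter


def count_per_window(sequence, window=1_000_000, alphabet="AGTCagtc"):
    """Return one dict of character counts per consecutive window.

    `window` and `alphabet` are illustrative parameters (assumptions),
    not something taken from the original post.
    """
    counts = []
    for start in range(0, len(sequence), window):
        # Slicing past the end of the string is safe, so the last
        # window is simply shorter when the length is not a multiple.
        tally = Counter(sequence[start:start + window])
        # A Counter returns 0 for characters it never saw.
        counts.append({base: tally[base] for base in alphabet})
    return counts


# Tiny demonstration with a window of 4 instead of 1,000,000.
for index, window_counts in enumerate(count_per_window("AGTCagtcAA", window=4)):
    print(index, window_counts)
```

Note that if the file contains line breaks, they count toward each window's length in both versions; removing them first (for example with data.replace('\n', '')) may be what you want, depending on the file layout.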
python split character