python – Algorithm optimization for finding the possible amino acid sequences with total mass m
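The problem is as follows: I need to work out how many possible amino acid (aa) sequences exist with a total mass m.

My initial solution was to create all possible combinations of aa and compare the total mass of each combination with the mass m. This works for small values of m, but the number of combinations becomes very large once m reaches the hundreds.

I made some minor optimizations and got it to run reasonably fast for m < 500, which is good enough for this problem, but I would like to know how to make it work for higher masses.

This is what I have so far:

totalmass = m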

def pepList():
    tempList = ['']
    temp2List = []
    length = 0
    total = 0
    aminoList = 'GASPVTCINDKEMHFRYW'  # these are all the amino acids

    while length < maxLength:
        for i in tempList:
            for j in aminoList:
                # find the mass of this peptide
                pepMass = peptideMass(i+j, massTable)
                if pepMass == totalmass:
                    total += 1
                elif pepMass <= totalmass:
                    temp2List.append(i+j)

        tempList = []
        for i in temp2List:
            tempList.append(i)
        temp2List = []
        length = length + 1

    print(total)

pepList()

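I can get the solution for m = 300 in about a second, but m = 500 takes about 40 seconds.

I tried an alternative using itertools, but it was not any faster:

import itertools

total = 0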
pepList = []

for i in range(maxLength+1):
    # order matters for the total number of peptides,
    # but not for calculating the total mass
    for p in itertools.combinations_with_replacement(aminoList, i):
        amino = ''.join(p)
        if peptideMass(amino, massTable) == mass:
            pepList.append(amino)

print(len(pepList))

newpepList = []

for i in pepList:
    # permutations are used here to get the total number, because order matters
    for p in itertools.permutations(i, r=len(i)):
        if p not in newpepList:
            newpepList.append(p)
            total += 1

print(total)

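Sample input:

Solution

The order in which the amino acids occur does not change the mass: AAC weighs the same as ACA and CAA.

Therefore this can be reduced to a linear programming problem: find the values of the coefficients such that

M = a*A + b*C + c*D + d*E + e*G + … + r*W

Once you have a solution, you can generate all possible permutations of the given set of amino acids, or, if you only need the number of permutations, you can calculate it directly.

Edit:

As @Hooked points out, this is not linear programming, for two reasons: first, we require integer coefficients, and second, we are looking for all combinations rather than a single optimal solution.

I have worked out a recursive generator as follows:

from math import floor, ceil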
import profile

amino_weight = {
    'A':  71.038, 'C': 103.009, 'D': 115.027, 'E': 129.043,
    'F': 147.068, 'G':  57.021, 'H': 137.059, 'I': 113.084,
    'K': 128.095, 'L': 113.084,  # you omitted leucine?
    'M': 131.040, 'N': 114.043, 'P':  97.053, 'Q': 128.059,  # you omitted glutamine?
    'R': 156.101, 'S':  87.032, 'T': 101.048, 'V':  99.068,
    'W': 186.079, 'Y': 163.063
}

def get_float(prompt):
    # keep asking until the user enters a valid number
    while True:
        try:
            return float(input(prompt))
        except ValueError:
            pass

# This is where the fun happens!
def get_mass_combos(aminos, pos, lo, hi, cutoff):
    this = aminos[pos]  # use a pointer into the string, to avoid copying 8 million partial strings around
    wt = amino_weight[this]
    kmax = int(floor(hi / wt))
    npos = pos - 1
    if npos:  # more aminos to consider recursively
        for k in range(0, kmax + 1):
            mass = k * wt
            nlo = lo - mass
            nhi = hi - mass
            ncutoff = cutoff - mass
            if nlo <= 0. and nhi >= 0.:
                # we found a winner!
                yield {this: k}
            elif ncutoff < 0.:
                # no further solution is possible
                break
            else:
                # recurse
                for cc in get_mass_combos(aminos, npos, nlo, nhi, ncutoff):
                    if k > 0:
                        cc[this] = k
                    yield cc
    else:  # last amino - it's this or nothing
        kmin = int(ceil(lo / wt))
        for k in range(kmin, kmax + 1):
            yield {this: k}

def to_string(combo):
    keys = sorted(combo)
    return ''.join(k * combo[k] for k in keys)

def total_mass(combo):
    return sum(amino_weight[a] * n for a, n in combo.items())

def fact(n):
    num = 1
    for i in range(2, n + 1):
        num *= i
    return num

def permutations(combo):
    # number of distinct orderings: (sum of counts)! / product of (count!)
    num = 0
    div = 1
    for v in combo.values():
        num += v
        div *= fact(v)
    return fact(num) // div

def find_combos(lo, hi):
    total = 0
    bases = []
    aminos = ''.join(sorted(amino_weight, key=lambda x: amino_weight[x]))
    # get_mass_combos takes five arguments: aminos, pos, lo, hi, cutoff
    for combo in get_mass_combos(aminos, len(aminos) - 1, lo, hi, hi - amino_weight[aminos[0]]):
        base = to_string(combo)
        bases.append(base)
        mass = total_mass(combo)
        cc = permutations(combo)
        total += cc
        print("{} (mass {}, {} permutations)".format(base, mass, cc))
    print('Total: {} bases, {} permutations'.format(len(bases), total))

def main():
    lo = get_float('Bottom of target mass range? ')
    hi = get_float('Top of target mass range? ')

    prof = profile.Profile()
    prof.run('find_combos({},{})'.format(lo, hi))
    prof.print_stats()

if __name__ == "__main__":
    main()

It also uses floating-point amino acid masses and searches for a range of masses. On my machine (i5-870), searching for masses between 748.0 and 752.0 returns 7,505 bases totalling 9,400,528 permutations in 3.82 seconds.
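To make the two-step idea concrete (first find the integer coefficients, then count the orderings directly instead of generating them), here is a minimal Python 3 sketch. It is not the generator from the answer: it assumes a toy table of three integer masses (`TOY_MASSES`, rounded for illustration) and an exact target rather than a floating-point range, and the names `compositions`, `orderings` and `count_sequences` are hypothetical helpers introduced only for this example.

```python
from math import factorial

# Assumption for this sketch: a tiny table of rounded, integer masses.
TOY_MASSES = {"G": 57, "A": 71, "S": 87}


def compositions(symbols, masses, target):
    """Yield {symbol: count} dicts whose masses add up exactly to target."""
    if not symbols:
        # No symbols left: this branch is a solution only if nothing remains.
        if target == 0:
            yield {}
        return
    head, rest = symbols[0], symbols[1:]
    weight = masses[head]
    # Try every possible count of the first symbol, then recurse on the rest.
    for count in range(target // weight + 1):
        for combo in compositions(rest, masses, target - count * weight):
            if count:
                combo = {**combo, head: count}
            yield combo


def orderings(combo):
    """Distinct sequences for one composition: n! / (k1! * k2! * ...)."""
    result = factorial(sum(combo.values()))
    for count in combo.values():
        result //= factorial(count)
    return result


def count_sequences(masses, target):
    """Total number of ordered sequences whose mass equals target."""
    symbols = sorted(masses)
    return sum(orderings(c) for c in compositions(symbols, masses, target))
```

With these toy masses, `count_sequences(TOY_MASSES, 199)` finds the single composition of two A and one G (71 + 71 + 57) and counts its three orderings, AAG, AGA and GAA, without ever building the strings. Likewise `orderings({"A": 2, "C": 1})` gives 3, matching the AAC, ACA, CAA example above.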
