Package Bio :: Package SeqUtils :: Module MeltingTemp
[hide private]
[frames] | no frames]

Source Code for Module Bio.SeqUtils.MeltingTemp

   1  # Copyright 2004-2008 by Sebastian Bassi. 
   2  # Copyright 2013 by Markus Piotrowski. 
   3  # All rights reserved. 
   4  # This code is part of the Biopython distribution and governed by its 
   5  # license.  Please see the LICENSE file that should have been included 
   6  # as part of this package. 
   7   
   8  """Calculate the melting temperature of nucleotide sequences. 
   9   
  10  This module contains three different methods to calculate the melting 
  11  temperature of oligonucleotides: 
  12   
  13  1. Tm_Wallace: 'Rule of thumb' 
  14  2. Tm_GC: Empirical formulas based on GC content. Salt and mismatch corrections 
  15     can be included. 
  16  3. Tm_NN: Calculation based on nearest neighbor thermodynamics. Several tables 
  17     for DNA/DNA, DNA/RNA and RNA/RNA hybridizations are included. 
  18     Correction for mismatches, dangling ends, salt concentration and other 
  19     additives are available. 
  20   
  21  Tm_staluc is the 'old' NN calculation and is kept for compatibility. It is, 
  22  however, recommended to use Tm_NN instead, since Tm_staluc may be depreceated 
  23  in the future. Also, Tm_NN has much more options. Using Tm_staluc and Tm_NN 
  24  with default parameters gives (essentially) the same results. 
  25   
  26  General parameters for most Tm methods: 
  27   - seq -- A Biopython sequence object or a string. 
  28   - check -- Checks if the sequence is valid for the given method (default= 
  29     True). In general, whitespaces and non-base characters are removed and 
  30     characters are converted to uppercase. RNA will be backtranscribed. 
  31   - strict -- Do not allow base characters or neighbor duplex keys (e.g. 
  32     'AT/NA') that could not or not unambigiously be evaluated for the respective 
  33     method (default=True). Note that W (= A or T) and S (= C or G) are not 
  34     ambiguous for Tm_Wallace and Tm_GC. If 'False', average values (if 
  35     applicable) will be used. 
  36   
  37  This module is not able to detect self-complementary and it will not use 
  38  alignment tools to align an oligonucleotide sequence to its target sequence. 
  39  Thus it can not detect dangling-ends and mismatches by itself (don't even think 
  40  about bulbs and loops). These parameters have to be handed over to the 
  41  respective method. 
  42   
  43  Other public methods of this module: 
  44   - make_table     : To create a table with thermodynamic data. 
  45   - salt_correction: To adjust Tm to a given salt concentration by different 
  46     formulas. This method is called from Tm_GC and Tm_NN but may 
  47     also be accessed 'manually'. It returns a correction term, not 
  48     a corrected Tm! 
  49   - chem_correction: To adjust Tm regarding the chemical additives DMSO and 
  50     formaldehyde. The method returns a corrected Tm. Chemical 
  51     correction is not an integral part of the Tm methods and must 
  52     be called additionally. 
  53   
  54  Examples: 
  55      >>> from Bio.SeqUtils import MeltingTemp as mt 
  56      >>> from Bio.Seq import Seq 
  57      >>> mystring = 'CGTTCCAAAGATGTGGGCATGAGCTTAC' 
  58      >>> myseq = Seq(mystring) 
  59      >>> print('%0.2f' % mt.Tm_Wallace(mystring)) 
  60      84.00 
  61      >>> print('%0.2f' % mt.Tm_Wallace(myseq)) 
  62      84.00 
  63      >>> print('%0.2f' % mt.Tm_GC(myseq)) 
  64      58.73 
  65      >>> print('%0.2f' % mt.Tm_NN(myseq)) 
  66      60.32 
  67   
  68  Tm_NN with default values gives same result as 'old' Tm_staluc. However, values 
  69  differ for RNA, since Tm_staluc had some errors for RNA calculation. These 
  70  errors have been fixed in this version. 
  71   
  72  New Tm_NN can do slightly more: 
  73  Using different thermodynamic tables, e.g. from Breslauer '86 or Sugimoto '96: 
  74   
  75      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN1))  # Breslauer '86 
  76      72.19 
  77      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN2))  # Sugimoto '96 
  78      65.47 
  79   
  80  Several types of salc correction (for Tm_NN and Tm_GC): 
  81   
  82      >>> for i in range(1, 8): 
  83      ...     print("Type: %d, Tm: %0.2f" % (i, Tm_NN(myseq, saltcorr=i))) 
  84      ... 
  85      Type: 1, Tm: 54.27 
  86      Type: 2, Tm: 54.02 
  87      Type: 3, Tm: 59.60 
  88      Type: 4, Tm: 60.64 
  89      Type: 5, Tm: 60.32 
  90      Type: 6, Tm: 59.78 
  91      Type: 7, Tm: 59.78 
  92   
  93  Correction for other monovalent cations (K+, Tris), Mg2+ and dNTPs according 
  94  to von Ahsen et al. (2001) or Owczarzy et al. (2008) (for Tm_NN and Tm_GC): 
  95   
  96      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10)) 
  97      60.79 
  98      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5)) 
  99      67.39 
 100      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, saltcorr=7)) 
 101      66.81 
 102      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, dNTPs=0.6, 
 103      ...                          saltcorr=7)) 
 104      66.04 
 105   
 106  Dangling ends and mismatches, e.g.:: 
 107   
 108      Oligo:     CGTTCCaAAGATGTGGGCATGAGCTTAC       CGTTCCaAAGATGTGGGCATGAGCTTAC 
 109                 ::::::X:::::::::::::::::::::  or   ::::::X::::::::::::::::::::: 
 110      Template:  GCAAGGcTTCTACACCCGTACTCGAATG      TGCAAGGcTTCTACACCCGTACTCGAATGC 
 111   
 112  Here: 
 113   
 114      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 115      60.32 
 116      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 117      ...                    c_seq='GCAAGGcTTCTACACCCGTACTCGAATG')) 
 118      55.39 
 119      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', shift=1, 
 120      ...                   c_seq='TGCAAGGcTTCTACACCCGTACTCGAATGC')) 
 121      55.69 
 122   
 123  The same for RNA: 
 124   
 125      >>> print('%0.2f' % mt.Tm_NN('CGUUCCAAAGAUGUGGGCAUGAGCUUAC', 
 126      ...                   c_seq='UGCAAGGcUUCUACACCCGUACUCGAAUGC', 
 127      ...                   shift=1, nn_table=mt.RNA_NN3, 
 128      ...                   de_table=mt.RNA_DE1)) 
 129      73.00 
 130   
 131  Note, that thermodynamic data are not available for all kind of mismatches, 
 132  e.g. most double mismatches or terminal mismaches combined with danglind ends: 
 133   
 134      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 135      ...                   c_seq='TtCAAGGcTTCTACACCCGTACTCGAATGC', 
 136      ...                   shift=1)) 
 137      Traceback (most recent call last): 
 138      ValueError: no thermodynamic data for neighbors '.C/TT' available 
 139   
 140  Make your own tables, or update/extend existing tables. E.g., add values for 
 141  locked nucleotides. Here, 'locked A' (and its complement) should be represented 
 142  by '1': 
 143   
 144      >>> mytable = mt.make_table(oldtable=mt.DNA_NN3, 
 145      ...                         values={'A1/T1':(-6.608, -17.235), 
 146      ...                         '1A/1T':(-6.893, -15.923)}) 
 147      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 148      60.32 
 149      >>> print('%0.2f' % mt.Tm_NN('CGTTCCA1AGATGTGGGCATGAGCTTAC', 
 150      ...                           nn_table=mytable, check=False)) 
 151      ... # 'check' must be False, otherwise '1' would be discarded 
 152      62.53 
 153   
 154  """ 
 155   
 156  from __future__ import print_function 
 157   
 158  import math 
 159  import warnings 
 160   
 161  from Bio import SeqUtils, Seq 
 162  from Bio import BiopythonWarning 
 163   
 164   
 165  # Thermodynamic lookup tables (dictionaries): 
 166  # Enthalpy (dH) and entropy (dS) values for nearest neighbors and initiation 
 167  # process. Calculation of duplex initiation is quite different in several 
 168  # papers; to allow for a general calculation, all different initiation 
 169  # parameters are included in all tables and non-applicable parameters are set 
 170  # to zero. 
 171  # The key is either an initiation type (e.g., 'init_A/T') or a nearest neighbor 
 172  # duplex sequence (e.g., GT/CA, to read 5'GT3'-3'CA5'). The values are tuples 
 173  # of dH (kcal/mol), dS (cal/mol K). 
 174   
 175  # DNA/DNA 
 176  # Breslauer et al. (1986), Proc Natl Acad Sci USA 83: 3746-3750 
 177  DNA_NN1 = { 
 178      'init': (0, 0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 179      'init_oneG/C': (0, -16.8), 'init_allA/T': (0, -20.1), 'init_5T/A': (0, 0), 
 180      'sym': (0, -1.3), 
 181      'AA/TT': (-9.1, -24.0), 'AT/TA': (-8.6, -23.9), 'TA/AT': (-6.0, -16.9), 
 182      'CA/GT': (-5.8, -12.9), 'GT/CA': (-6.5, -17.3), 'CT/GA': (-7.8, -20.8), 
 183      'GA/CT': (-5.6, -13.5), 'CG/GC': (-11.9, -27.8), 'GC/CG': (-11.1, -26.7), 
 184      'GG/CC': (-11.0, -26.6)} 
 185   
 186  # Sugimoto et al. (1996), Nuc Acids Res 24 : 4501-4505 
 187  DNA_NN2 = { 
 188      'init': (0.6, -9.0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 189      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 190      'sym': (0, -1.4), 
 191      'AA/TT': (-8.0, -21.9), 'AT/TA': (-5.6, -15.2), 'TA/AT': (-6.6, -18.4), 
 192      'CA/GT': (-8.2, -21.0), 'GT/CA': (-9.4, -25.5), 'CT/GA': (-6.6, -16.4), 
 193      'GA/CT': (-8.8, -23.5), 'CG/GC': (-11.8, -29.0), 'GC/CG': (-10.5, -26.4), 
 194      'GG/CC': (-10.9, -28.4)} 
 195   
 196  # Allawi and SantaLucia (1997), Biochemistry 36: 10581-10594 
 197  DNA_NN3 = { 
 198      'init': (0, 0), 'init_A/T': (2.3, 4.1), 'init_G/C': (0.1, -2.8), 
 199      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 200      'sym': (0, -1.4), 
 201      'AA/TT': (-7.9, -22.2), 'AT/TA': (-7.2, -20.4), 'TA/AT': (-7.2, -21.3), 
 202      'CA/GT': (-8.5, -22.7), 'GT/CA': (-8.4, -22.4), 'CT/GA': (-7.8, -21.0), 
 203      'GA/CT': (-8.2, -22.2), 'CG/GC': (-10.6, -27.2), 'GC/CG': (-9.8, -24.4), 
 204      'GG/CC': (-8.0, -19.9)} 
 205   
 206  # SantaLucia & Hicks (2004), Annu. Rev. Biophys. Biomol. Struct 33: 415-440 
 207  DNA_NN4 = { 
 208      'init': (0.2, -5.7), 'init_A/T': (2.2, 6.9), 'init_G/C': (0, 0), 
 209      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 210      'sym': (0, -1.4), 
 211      'AA/TT': (-7.6, -21.3), 'AT/TA': (-7.2, -20.4), 'TA/AT': (-7.2, -20.4), 
 212      'CA/GT': (-8.5, -22.7), 'GT/CA': (-8.4, -22.4), 'CT/GA': (-7.8, -21.0), 
 213      'GA/CT': (-8.2, -22.2), 'CG/GC': (-10.6, -27.2), 'GC/CG': (-9.8, -24.4), 
 214      'GG/CC': (-8.0, -19.0)} 
 215   
 216  # RNA/RNA 
 217  # Freier et al. (1986), Proc Natl Acad Sci USA 83: 9373-9377 
 218  RNA_NN1 = { 
 219      'init': (0, -10.8), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 220      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 221      'sym': (0, -1.4), 
 222      'AA/TT': (-6.6, -18.4), 'AT/TA': (-5.7, -15.5), 'TA/AT': (-8.1, -22.6), 
 223      'CA/GT': (-10.5, -27.8), 'GT/CA': (-10.2, -26.2), 'CT/GA': (-7.6, -19.2), 
 224      'GA/CT': (-13.3, -35.5), 'CG/GC': (-8.0, -19.4), 'GC/CG': (-14.2, -34.9), 
 225      'GG/CC': (-12.2, -29.7)} 
 226   
 227  # Xia et al (1998), Biochemistry 37: 14719-14735 
 228  RNA_NN2 = { 
 229      'init': (3.61, -1.5), 'init_A/T': (3.72, 10.5), 'init_G/C': (0, 0), 
 230      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 231      'sym': (0, -1.4), 
 232      'AA/TT': (-6.82, -19.0), 'AT/TA': (-9.38, -26.7), 'TA/AT': (-7.69, -20.5), 
 233      'CA/GT': (-10.44, -26.9), 'GT/CA': (-11.40, -29.5), 
 234      'CT/GA': (-10.48, -27.1), 'GA/CT': (-12.44, -32.5), 
 235      'CG/GC': (-10.64, -26.7), 'GC/CG': (-14.88, -36.9), 
 236      'GG/CC': (-13.39, -32.7)} 
 237   
 238  # Chen et al. (2012), Biochemistry 51: 3508-3522 
 239  RNA_NN3 = { 
 240      'init': (6.40, 6.99), 'init_A/T': (3.85, 11.04), 'init_G/C': (0, 0), 
 241      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 242      'sym': (0, -1.4), 
 243      'AA/TT': (-7.09, -19.8), 'AT/TA': (-9.11, -25.8), 'TA/AT': (-8.50, -22.9), 
 244      'CA/GT': (-11.03, -28.8), 'GT/CA': (-11.98, -31.3), 
 245      'CT/GA': (-10.90, -28.5), 'GA/CT': (-13.21, -34.9), 
 246      'CG/GC': (-10.88, -27.4), 'GC/CG': (-16.04, -40.6), 
 247      'GG/CC': (-14.18, -35.0), 'GT/TG': (-13.83, -46.9), 
 248      'GG/TT': (-17.82, -56.7), 'AG/TT': (-3.96, -11.6), 
 249      'TG/AT': (-0.96, -1.8), 'TT/AG': (-10.38, -31.8), 'TG/GT': (-12.64, -38.9), 
 250      'AT/TG': (-7.39, -21.0), 'CG/GT': (-5.56, -13.9), 'CT/GG': (-9.44, -24.7), 
 251      'GG/CT': (-7.03, -16.8), 'GT/CG': (-11.09, -28.8)} 
 252   
 253  # RNA/DNA 
 254  # Sugimoto et al. (1995), Biochemistry 34: 11211-11216 
 255  R_DNA_NN1 = { 
 256      'init': (1.9, -3.9), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 257      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 258      'sym': (0, 0), 
 259      'AA': (-11.5, -36.4), 'AC': (-7.8, -21.6), 'AG': (-7.0, -19.7), 
 260      'AT': (-8.3, -23.9), 'CA': (-10.4, -28.4), 'CC': (-12.8, -31.9), 
 261      'CG': (-16.3, -47.1), 'CT': (-9.1, -23.5), 'GA': (-8.6, -22.9), 
 262      'GC': (-8.0, -17.1), 'GG': (-9.3, -23.2), 'GT': (-5.9, -12.3), 
 263      'TA': (-7.8, -23.2), 'TC': (-5.5, -13.5), 'TG': (-9.0, -26.1), 
 264      'TT': (-7.8, -21.9)} 
 265   
 266  # Internal mismatch and inosine table (DNA) 
 267  # Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 
 268  # Allawi & SantaLucia (1998), Biochemistry 37: 9435-9444 
 269  # Allawi & SantaLucia (1998), Biochemistry 37: 2170-2179 
 270  # Allawi & SantaLucia (1998), Nucl Acids Res 26: 2694-2701 
 271  # Peyret et al. (1999), Biochemistry 38: 3468-3477 
 272  # Watkins & SantaLucia (2005), Nucl Acids Res 33: 6258-6267 
 273  DNA_IMM1 = { 
 274      'AG/TT': (1.0, 0.9), 'AT/TG': (-2.5, -8.3), 'CG/GT': (-4.1, -11.7), 
 275      'CT/GG': (-2.8, -8.0), 'GG/CT': (3.3, 10.4), 'GG/TT': (5.8, 16.3), 
 276      'GT/CG': (-4.4, -12.3), 'GT/TG': (4.1, 9.5), 'TG/AT': (-0.1, -1.7), 
 277      'TG/GT': (-1.4, -6.2), 'TT/AG': (-1.3, -5.3), 'AA/TG': (-0.6, -2.3), 
 278      'AG/TA': (-0.7, -2.3), 'CA/GG': (-0.7, -2.3), 'CG/GA': (-4.0, -13.2), 
 279      'GA/CG': (-0.6, -1.0), 'GG/CA': (0.5, 3.2), 'TA/AG': (0.7, 0.7), 
 280      'TG/AA': (3.0, 7.4), 
 281      'AC/TT': (0.7, 0.2), 'AT/TC': (-1.2, -6.2), 'CC/GT': (-0.8, -4.5), 
 282      'CT/GC': (-1.5, -6.1), 'GC/CT': (2.3, 5.4), 'GT/CC': (5.2, 13.5), 
 283      'TC/AT': (1.2, 0.7), 'TT/AC': (1.0, 0.7), 
 284      'AA/TC': (2.3, 4.6), 'AC/TA': (5.3, 14.6), 'CA/GC': (1.9, 3.7), 
 285      'CC/GA': (0.6, -0.6), 'GA/CC': (5.2, 14.2), 'GC/CA': (-0.7, -3.8), 
 286      'TA/AC': (3.4, 8.0), 'TC/AA': (7.6, 20.2), 
 287      'AA/TA': (1.2, 1.7), 'CA/GA': (-0.9, -4.2), 'GA/CA': (-2.9, -9.8), 
 288      'TA/AA': (4.7, 12.9), 'AC/TC': (0.0, -4.4), 'CC/GC': (-1.5, -7.2), 
 289      'GC/CC': (3.6, 8.9), 'TC/AC': (6.1, 16.4), 'AG/TG': (-3.1, -9.5), 
 290      'CG/GG': (-4.9, -15.3), 'GG/CG': (-6.0, -15.8), 'TG/AG': (1.6, 3.6), 
 291      'AT/TT': (-2.7, -10.8), 'CT/GT': (-5.0, -15.8), 'GT/CT': (-2.2, -8.4), 
 292      'TT/AT': (0.2, -1.5), 
 293      'AI/TC': (-8.9, -25.5), 'TI/AC': (-5.9, -17.4), 'AC/TI': (-8.8, -25.4), 
 294      'TC/AI': (-4.9, -13.9), 'CI/GC': (-5.4, -13.7), 'GI/CC': (-6.8, -19.1), 
 295      'CC/GI': (-8.3, -23.8), 'GC/CI': (-5.0, -12.6), 
 296      'AI/TA': (-8.3, -25.0), 'TI/AA': (-3.4, -11.2), 'AA/TI': (-0.7, -2.6), 
 297      'TA/AI': (-1.3, -4.6), 'CI/GA': (2.6, 8.9), 'GI/CA': (-7.8, -21.1), 
 298      'CA/GI': (-7.0, -20.0), 'GA/CI': (-7.6, -20.2), 
 299      'AI/TT': (0.49, -0.7), 'TI/AT': (-6.5, -22.0), 'AT/TI': (-5.6, -18.7), 
 300      'TT/AI': (-0.8, -4.3), 'CI/GT': (-1.0, -2.4), 'GI/CT': (-3.5, -10.6), 
 301      'CT/GI': (0.1, -1.0), 'GT/CI': (-4.3, -12.1), 
 302      'AI/TG': (-4.9, -15.8), 'TI/AG': (-1.9, -8.5), 'AG/TI': (0.1, -1.8), 
 303      'TG/AI': (1.0, 1.0), 'CI/GG': (7.1, 21.3), 'GI/CG': (-1.1, -3.2), 
 304      'CG/GI': (5.8, 16.9), 'GG/CI': (-7.6, -22.0), 
 305      'AI/TI': (-3.3, -11.9), 'TI/AI': (0.1, -2.3), 'CI/GI': (1.3, 3.0), 
 306      'GI/CI': (-0.5, -1.3)} 
 307   
 308  # Terminal mismatch table (DNA) 
 309  # SantaLucia & Peyret (2001) Patent Application WO 01/94611 
 310  DNA_TMM1 = { 
 311      'AA/TA': (-3.1, -7.8), 'TA/AA': (-2.5, -6.3), 'CA/GA': (-4.3, -10.7), 
 312      'GA/CA': (-8.0, -22.5), 
 313      'AC/TC': (-0.1, 0.5), 'TC/AC': (-0.7, -1.3), ' CC/GC': (-2.1, -5.1), 
 314      'GC/CC': (-3.9, -10.6), 
 315      'AG/TG': (-1.1, -2.1), 'TG/AG': (-1.1, -2.7), 'CG/GG': (-3.8, -9.5), 
 316      'GG/CG': (-0.7, -19.2), 
 317      'AT/TT': (-2.4, -6.5), 'TT/AT': (-3.2, -8.9), 'CT/GT': (-6.1, -16.9), 
 318      'GT/CT': (-7.4, -21.2), 
 319      'AA/TC': (-1.6, -4.0), 'AC/TA': (-1.8, -3.8), 'CA/GC': (-2.6, -5.9), 
 320      'CC/GA': (-2.7, -6.0), 'GA/CC': (-5.0, -13.8), 'GC/CA': (-3.2, -7.1), 
 321      'TA/AC': (-2.3, -5.9), 'TC/AA': (-2.7, -7.0), 
 322      'AC/TT': (-0.9, -1.7), 'AT/TC': (-2.3, -6.3), 'CC/GT': (-3.2, -8.0), 
 323      'CT/GC': (-3.9, -10.6), 'GC/CT': (-4.9, -13.5), 'GT/CC': (-3.0, -7.8), 
 324      'TC/AT': (-2.5, -6.3), 'TT/AC': (-0.7, -1.2), 
 325      'AA/TG': (-1.9, -4.4), 'AG/TA': (-2.5, -5.9), 'CA/GG': (-3.9, -9.6), 
 326      'CG/GA': (-6.0, -15.5), 'GA/CG': (-4.3, -11.1), ' GG/CA': (-4.6, -11.4), 
 327      'TA/AG': (-2.0, -4.7), 'TG/AA': (-2.4, -5.8), 
 328      'AG/TT': (-3.2, -8.7), 'AT/TG': (-3.5, -9.4), 'CG/GT': (-3.8, -9.0), 
 329      'CT/GG': (-6.6, -18.7), 'GG/CT': (-5.7, -15.9), 'GT/CG': (-5.9, -16.1), 
 330      'TG/AT': (-3.9, -10.5), 'TT/AG': (-3.6, -9.8)} 
 331   
 332  # Dangling ends table (DNA) 
 333  # Bommarito et al. (2000), Nucl Acids Res 28: 1929-1934 
 334  DNA_DE1 = { 
 335      'AA/.T': (0.2, 2.3), 'AC/.G': (-6.3, -17.1), 'AG/.C': (-3.7, -10.0), 
 336      'AT/.A': (-2.9, -7.6), 'CA/.T': (0.6, 3.3), 'CC/.G': (-4.4, -12.6), 
 337      'CG/.C': (-4.0, -11.9), 'CT/.A': (-4.1, -13.0), 'GA/.T': (-1.1, -1.6), 
 338      'GC/.G': (-5.1, -14.0), 'GG/.C': (-3.9, -10.9), 'GT/.A': (-4.2, -15.0), 
 339      'TA/.T': (-6.9, -20.0), 'TC/.G': (-4.0, -10.9), 'TG/.C': (-4.9, -13.8), 
 340      'TT/.A': (-0.2, -0.5), 
 341      '.A/AT': (-0.7, -0.8), '.C/AG': (-2.1, -3.9), '.G/AC': (-5.9, -16.5), 
 342      '.T/AA': (-0.5, -1.1), '.A/CT': (4.4, 14.9), '.C/CG': (-0.2, -0.1), 
 343      '.G/CC': (-2.6, -7.4), '.T/CA': (4.7, 14.2), '.A/GT': (-1.6, -3.6), 
 344      '.C/GG': (-3.9, -11.2), '.G/GC': (-3.2, -10.4), '.T/GA': (-4.1, -13.1), 
 345      '.A/TT': (2.9, 10.4), '.C/TG': (-4.4, -13.1), '.G/TC': (-5.2, -15.0), 
 346      '.T/TA': (-3.8, -12.6)} 
 347   
 348  # Dangling ends table (RNA) 
 349  # Turner & Mathews (2010), Nucl Acids Res 38: D280-D282 
 350  RNA_DE1 = { 
 351      '.T/AA': (-4.9, -13.2), '.T/CA': (-0.9, -1.3), '.T/GA': (-5.5, -15.1), 
 352      '.T/TA': (-2.3, -5.5), 
 353      '.G/AC': (-9.0, -23.5), '.G/CC': (-4.1, -10.6), '.G/GC': (-8.6, -22.2), 
 354      '.G/TC': (-7.5, -20.31), 
 355      '.C/AG': (-7.4, -20.3), '.C/CG': (-2.8, -7.7), '.C/GG': (-6.4, -16.4), 
 356      '.C/TG': (-3.6, -9.7), 
 357      '.T/AG': (-4.9, -13.2), '.T/CG': (-0.9, -1.3), '.T/GG': (-5.5, -15.1), 
 358      '.T/TG': (-2.3, -5.5), 
 359      '.A/AT': (-5.7, -16.1), '.A/CT': (-0.7, -1.9), '.A/GT': (-5.8, -16.4), 
 360      '.A/TT': (-2.2, -6.8), 
 361      '.G/AT': (-5.7, -16.1), '.G/CT': (-0.7, -1.9), '.G/GT': (-5.8, -16.4), 
 362      '.G/TT': (-2.2, -6.8), 
 363      'AT/.A': (-0.5, -0.6), 'CT/.A': (6.9, 22.6), 'GT/.A': (0.6, 2.6), 
 364      'TT/.A': (0.6, 2.6), 
 365      'AG/.C': (-1.6, -4.5), 'CG/.C': (0.7, 3.2), 'GG/.C': (-4.6, -14.8), 
 366      'TG/.C': (-0.4, -1.3), 
 367      'AC/.G': (-2.4, -6.1), 'CC/.G': (3.3, 11.6), 'GC/.G': (0.8, 3.2), 
 368      'TC/.G': (-1.4, -4.2), 
 369      'AT/.G': (-0.5, -0.6), 'CT/.G': (6.9, 22.6), 'GT/.G': (0.6, 2.6), 
 370      'TT/.G': (0.6, 2.6), 
 371      'AA/.T': (1.6, 6.1), 'CA/.T': (2.2, 8.1), 'GA/.T': (0.7, 3.5), 
 372      'TA/.T': (3.1, 10.6), 
 373      'AG/.T': (1.6, 6.1), 'CG/.T': (2.2, 8.1), 'GG/.T': (0.7, 3.5), 
 374      'TG/.T': (3.1, 10.6)} 
 375   
 376   
377 -def make_table(oldtable=None, values=None):
378 """Return a table with thermodynamic parameters (as dictionary). 379 380 Parameters: 381 oldtable: An existing dictionary with thermodynamic parameters. 382 values: A dictionary with new or updated values. 383 384 E.g., to replace the initiation parameters in the Sugimoto '96 dataset with 385 the initiation parameters from Allawi & SantaLucia '97: 386 387 >>> from Bio.SeqUtils.MeltingTemp import make_table, DNA_NN2 388 >>> table = DNA_NN2 # Sugimoto '96 389 >>> table['init_A/T'] 390 (0, 0) 391 >>> newtable = make_table(oldtable=DNA_NN2, values={'init': (0, 0), 392 ... 'init_A/T': (2.3, 4.1), 393 ... 'init_G/C': (0.1, -2.8)}) 394 >>> print("%0.1f, %0.1f" % newtable['init_A/T']) 395 2.3, 4.1 396 397 """ 398 if oldtable is None: 399 table = {'init': (0, 0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 400 'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 401 'init_5T/A': (0, 0), 'sym': (0, 0), 'AA/TT': (0, 0), 402 'AT/TA': (0, 0), 'TA/AT': (0, 0), 'CA/GT': (0, 0), 403 'GT/CA': (0, 0), 'CT/GA': (0, 0), 'GA/CT': (0, 0), 404 'CG/GC': (0, 0), 'GC/CG': (0, 0), 'GG/CC': (0, 0)} 405 else: 406 table = oldtable.copy() 407 if values: 408 table.update(values) 409 return table
410 411
412 -def _check(seq, method):
413 """Return a sequence which fullfils the requirements of the given method. 414 415 All Tm methods in this package require the sequence in uppercase format. 416 Most methods make use of the length of the sequence (directly or 417 indirectly), which can only be expressed as len(seq) if the sequence does 418 not contain whitespaces and other non-base characters. RNA sequences are 419 backtranscribed to DNA. This method is PRIVATE. 420 421 Arguments: 422 seq: The sequence as given by the user (passed as string). 423 method: Tm_Wallace, Tm_GC or Tm_NN. 424 425 >>> from Bio.SeqUtils import MeltingTemp as mt 426 >>> mt._check('10 ACGTTGCAAG tccatggtac', 'Tm_NN') 427 'ACGTTGCAAGTCCATGGTAC' 428 429 """ 430 seq = ''.join(seq.split()).upper() 431 seq = str(Seq.Seq(seq).back_transcribe()) 432 if method == 'Tm_Wallace': 433 return seq 434 if method == 'Tm_GC': 435 baseset = ('A', 'B', 'C', 'D', 'G', 'H', 'I', 'K', 'M', 'N', 'R', 'S', 436 'T', 'V', 'W', 'X', 'Y') 437 if method == 'Tm_NN': 438 baseset = ('A', 'C', 'G', 'T', 'I') 439 seq = ''.join([base for base in seq if base in baseset]) 440 return seq
441 442
443 -def salt_correction(Na=0, K=0, Tris=0, Mg=0, dNTPs=0, method=1, seq=None):
444 """Calculate a term to correct Tm for salt ions. 445 446 Depending on the Tm calculation, the term will correct Tm or entropy. To 447 calculate corrected Tm values, different operations need to be applied: 448 449 - methods 1-4: Tm(new) = Tm(old) + corr 450 - method 5: deltaH(new) = deltaH(old) + corr 451 - methods 6+7: Tm(new) = 1/(1/Tm(old) + corr) 452 453 Parameters: 454 - Na, K, Tris, Mg, dNTPS: Millimolar concentration of respective ion. To 455 have a simple 'salt correction', just pass Na. If any of K, Tris, Mg and 456 dNTPS is non-zero, a 'sodium-equivalent' concentration is calculated 457 according to von Ahsen et al. (2001, Clin Chem 47: 1956-1961): 458 [Na_eq] = [Na+] + [K+] + [Tris]/2 + 120*([Mg2+] - [dNTPs])^0.5 459 If [dNTPs] >= [Mg2+]: [Na_eq] = [Na+] + [K+] + [Tris]/2 460 - method: Which method to be applied. Methods 1-4 correct Tm, method 5 461 corrects deltaS, methods 6 and 7 correct 1/Tm. The methods are: 462 463 1. 16.6 x log[Na+] 464 (Schildkraut & Lifson (1965), Biopolymers 3: 195-208) 465 2. 16.6 x log([Na+]/(1.0 + 0.7*[Na+])) 466 (Wetmur (1991), Crit Rev Biochem Mol Biol 126: 227-259) 467 3. 12.5 x log(Na+] 468 (SantaLucia et al. (1996), Biochemistry 35: 3555-3562 469 4. 11.7 x log[Na+] 470 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465 471 5. Correction for deltaS: 0.368 x (N-1) x ln[Na+] 472 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465) 473 6. (4.29(%GC)-3.95)x1e-5 x ln[Na+] + 9.40e-6 x ln[Na+]^2 474 (Owczarzy et al. (2004), Biochemistry 43: 3537-3554) 475 7. Complex formula with decision tree and 7 empirical constants. 476 Mg2+ is corrected for dNTPs binding (if present) 477 (Owczarzy et al. (2008), Biochemistry 47: 5336-5353) 478 479 Examples: 480 >>> from Bio.SeqUtils import MeltingTemp as mt 481 >>> print('%0.2f' % mt.salt_correction(Na=50, method=1)) 482 -21.60 483 >>> print('%0.2f' % mt.salt_correction(Na=50, method=2)) 484 -21.85 485 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, method=2)) 486 -16.45 487 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, Mg=1.5, 488 ... method=2)) 489 -10.99 490 491 """ 492 if method in (5, 6, 7) and not seq: 493 raise ValueError('sequence is missing (is needed to calculate ' + 494 'GC content or sequence length).') 495 if seq: 496 seq = str(seq) 497 corr = 0 498 if not method: 499 return corr 500 Mon = Na + K + Tris / 2.0 # Note: all these values are millimolar 501 mg = Mg * 1e-3 # Lowercase ions (mg, mon, dntps) are molar 502 # Na equivalent according to von Ahsen et al. (2001): 503 if sum((K, Mg, Tris, dNTPs)) > 0 and not method == 7 and dNTPs < Mg: 504 # dNTPs bind Mg2+ strongly. If [dNTPs] is larger or equal than 505 # [Mg2+], free Mg2+ is considered not to be relevant. 506 Mon += 120 * math.sqrt(Mg - dNTPs) 507 mon = Mon * 1e-3 508 # Note: math.log = ln(), math.log10 = log() 509 if method in range(1, 7) and not mon: 510 raise ValueError('Total ion concentration of zero is not allowed in ' + 511 'this method.') 512 if method == 1: 513 corr = 16.6 * math.log10(mon) 514 if method == 2: 515 corr = 16.6 * math.log10((mon) / (1.0 + 0.7 * (mon))) 516 if method == 3: 517 corr = 12.5 * math.log10(mon) 518 if method == 4: 519 corr = 11.7 * math.log10(mon) 520 if method == 5: 521 corr = 0.368 * (len(seq) - 1) * math.log(mon) 522 if method == 6: 523 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * 1e-5 * math.log(mon) +\ 524 9.40e-6 * math.log(mon) ** 2 525 if method == 7: 526 a, b, c, d = 3.92, -0.911, 6.26, 1.42 527 e, f, g = -48.2, 52.5, 8.31 528 if dNTPs > 0: 529 dntps = dNTPs * 1e-3 530 ka = 3e4 # Dissociation constant for Mg:dNTP 531 # Free Mg2+ calculation: 532 mg = (-(ka * dntps - ka * mg + 1.0) + 533 math.sqrt((ka * dntps - ka * mg + 1.0) ** 2 + 534 4.0 * ka * mg)) / (2.0 * ka) 535 if Mon > 0: 536 R = math.sqrt(mg) / mon 537 if R < 0.22: 538 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * \ 539 1e-5 * math.log(mon) + 9.40e-6 * math.log(mon) ** 2 540 return corr 541 elif R < 6.0: 542 a = 3.92 * (0.843 - 0.352 * math.sqrt(mon) * math.log(mon)) 543 d = 1.42 * (1.279 - 4.03e-3 * math.log(mon) - 544 8.03e-3 * math.log(mon) ** 2) 545 g = 8.31 * (0.486 - 0.258 * math.log(mon) + 546 5.25e-3 * math.log(mon) ** 3) 547 corr = (a + b * math.log(mg) + (SeqUtils.GC(seq) / 100) * 548 (c + d * math.log(mg)) + (1 / (2.0 * (len(seq) - 1))) * 549 (e + f * math.log(mg) + g * math.log(mg) ** 2)) * 1e-5 550 if method > 7: 551 raise ValueError('Allowed values for parameter \'method\' are 1-7.') 552 return corr
553 554
555 -def chem_correction(melting_temp, DMSO=0, fmd=0, DMSOfactor=0.75, 556 fmdfactor=0.65, fmdmethod=1, GC=None):
557 """Correct a given Tm for DMSO and formamide. 558 559 Please note that these corrections are +/- rough approximations. 560 561 Arguments: 562 - melting_temp: Melting temperature. 563 - DMSO: Percent DMSO. 564 - fmd: Formamide concentration in %(fmdmethod=1) or molar (fmdmethod=2). 565 - DMSOfactor: How much should Tm decreases per percent DMSO. Default=0.65 566 (von Ahsen et al. 2001). Other published values are 0.5, 0.6 and 0.675. 567 - fmdfactor: How much should Tm decrease per percent formamide. 568 Default=0.65. Several papers report factors between 0.6 and 0.72. 569 - fmdmethod: 570 571 1. Tm = Tm - factor(%formamide) (Default) 572 2. Tm = Tm + (0.453(f(GC)) - 2.88) x [formamide] 573 574 Here f(GC) is fraction of GC. 575 Note (again) that in fmdmethod=1 formamide concentration is given in %, 576 while in fmdmethod=2 it is given in molar. 577 - GC: GC content in percent. 578 579 Examples: 580 >>> from Bio.SeqUtils import MeltingTemp as mt 581 >>> mt.chem_correction(70) 582 70 583 >>> print('%0.2f' % mt.chem_correction(70, DMSO=3)) 584 67.75 585 >>> print('%0.2f' % mt.chem_correction(70, fmd=5)) 586 66.75 587 >>> print('%0.2f' % mt.chem_correction(70, fmdmethod=2, fmd=1.25, 588 ... GC=50)) 589 66.68 590 591 """ 592 if DMSO: 593 melting_temp -= DMSOfactor * DMSO 594 if fmd: 595 # McConaughy et al. (1969), Biochemistry 8: 3289-3295 596 if fmdmethod == 1: 597 # Note: Here fmd is given in percent 598 melting_temp -= fmdfactor * fmd 599 # Blake & Delcourt (1996), Nucl Acids Res 11: 2095-2103 600 if fmdmethod == 2: 601 if GC is None or GC < 0: 602 raise ValueError('\'GC\' is missing or negative') 603 # Note: Here fmd is given in molar 604 melting_temp += (0.453 * (GC / 100.0) - 2.88) * fmd 605 if fmdmethod not in (1, 2): 606 raise ValueError('\'fmdmethod\' must be 1 or 2') 607 return melting_temp
608 609
610 -def Tm_Wallace(seq, check=True, strict=True):
611 """Calculate and return the Tm using the 'Wallace rule'. 612 613 Tm = 4 degC * (G + C) + 2 degC * (A+T) 614 615 The Wallace rule (Thein & Wallace 1986, in Human genetic diseases: a 616 practical approach, 33-50) is often used as rule of thumb for approximate 617 Tm calculations for primers of 14 to 20 nt length. 618 619 Non-DNA characters (e.g., E, F, J, !, 1, etc) are ignored by this method. 620 621 Examples: 622 >>> from Bio.SeqUtils import MeltingTemp as mt 623 >>> mt.Tm_Wallace('ACGTTGCAATGCCGTA') 624 48.0 625 >>> mt.Tm_Wallace('ACGT TGCA ATGC CGTA') 626 48.0 627 >>> mt.Tm_Wallace('1ACGT2TGCA3ATGC4CGTA') 628 48.0 629 630 """ 631 seq = str(seq) 632 if check: 633 seq = _check(seq, 'Tm_Wallace') 634 635 melting_temp = 2 * (sum(map(seq.count, ('A', 'T', 'W')))) + \ 636 4 * (sum(map(seq.count, ('C', 'G', 'S')))) 637 638 # Intermediate values for ambiguous positions: 639 tmp = 3 * (sum(map(seq.count, ('K', 'M', 'N', 'R', 'Y')))) + \ 640 10 / 3.0 * (sum(map(seq.count, ('B', 'V')))) + \ 641 8 / 3.0 * (sum(map(seq.count, ('D', 'H')))) 642 if strict and tmp: 643 raise ValueError('ambiguous bases B, D, H, K, M, N, R, V, Y not ' + 644 'allowed when strict=True') 645 else: 646 melting_temp += tmp 647 return melting_temp
648 649
650 -def Tm_GC(seq, check=True, strict=True, valueset=7, userset=None, Na=50, K=0, 651 Tris=0, Mg=0, dNTPs=0, saltcorr=0, mismatch=True):
652 """Return the Tm using empirical formulas based on GC content. 653 654 General format: Tm = A + B(%GC) - C/N + salt correction - D(%mismatch) 655 656 A, B, C, D: empirical constants, N: primer length 657 D (amount of decrease in Tm per % mismatch) is often 1, but sometimes other 658 values have been used (0.6-1.5). Use 'X' to indicate the mismatch position 659 in the sequence. Note that this mismatch correction is a rough estimate. 660 661 >>> from Bio.SeqUtils import MeltingTemp as mt 662 >>> print("%0.2f" % mt.Tm_GC('CTGCTGATXGCACGAGGTTATGG', valueset=2)) 663 69.20 664 665 Arguments: 666 - valueset: A few often cited variants are included: 667 668 1. Tm = 69.3 + 0.41(%GC) - 650/N 669 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Chester & Marshak 1993), 670 Anal Biochem 209: 284-290) 671 2. Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch 672 'QuikChange' formula. Recommended (by the manufacturer) for the 673 design of primers for QuikChange mutagenesis. 674 3. Tm = 81.5 + 0.41(%GC) - 675/N + 16.6 x log[Na+] 675 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Schildkraut & Lifson 676 1965, Biopolymers 3: 195-208) 677 4. Tm = 81.5 + 0.41(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x 678 [Na+])) - %mismatch 679 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). This is the 680 standard formula in approximative mode of MELTING 4.3. 681 5. Tm = 78 + 0.7(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) 682 - %mismatch 683 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA. 684 6. Tm = 67 + 0.8(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) 685 - %mismatch 686 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA/DNA 687 hybrids. 688 7. Tm = 81.5 + 0.41(%GC) - 600/N + 16.6 x log[Na+] 689 Used by Primer3Plus to calculate the product Tm. Default set. 690 8. Tm = 77.1 + 0.41(%GC) - 528/N + 11.7 x log[Na+] 691 (von Ahsen et al. 2001, Clin Chem 47: 1956-1961). Recommended 'as a 692 tradeoff between accuracy and ease of use'. 693 694 - userset: Tuple of four values for A, B, C, and D. Usersets override 695 valuesets. 696 - Na, K, Tris, Mg, dNTPs: Concentration of the respective ions [mM]. If 697 any of K, Tris, Mg and dNTPS is non-zero, a 'sodium-equivalent' 698 concentration is calculated and used for salt correction (von Ahsen et 699 al., 2001). 700 - saltcorr: Type of salt correction (see method salt_correction). 701 Default=5. 0 or None means no salt correction. 702 - mismatch: If 'True' (default) every 'X' in the sequence is counted as 703 mismatch. 704 705 """ 706 if saltcorr == 5: 707 raise ValueError('salt-correction method 5 not applicable' + 708 'to Tm_GC') 709 seq = str(seq) 710 if check: 711 seq = _check(seq, 'Tm_GC') 712 percent_gc = SeqUtils.GC(seq) 713 # Ambiguous bases: add 0.5, 0.67 or 0.33% depending on G+C probability: 714 tmp = sum(map(seq.count, ('K', 'M', 'N', 'R', 'Y'))) * 50.0 / len(seq) + \ 715 sum(map(seq.count, ('B', 'V'))) * 66.67 / len(seq) + \ 716 sum(map(seq.count, ('D', 'H'))) * 33.33 / len(seq) 717 if strict and tmp: 718 raise ValueError('ambiguous bases B, D, H, K, M, N, R, V, Y not ' + 719 'allowed when strict=True') 720 else: 721 percent_gc += tmp 722 if userset: 723 A, B, C, D = userset 724 else: 725 if valueset == 1: 726 A, B, C, D = (69.3, 0.41, 650, 1) 727 saltcorr = 0 728 if valueset == 2: 729 A, B, C, D = (81.5, 0.41, 675, 1) 730 saltcorr = 0 731 if valueset == 3: 732 A, B, C, D = (81.5, 0.41, 675, 1) 733 saltcorr = 2 734 if valueset == 4: 735 A, B, C, D = (81.5, 0.41, 500, 1) 736 saltcorr = 3 737 if valueset == 5: 738 A, B, C, D = (78.0, 0.7, 500, 1) 739 saltcorr = 3 740 if valueset == 6: 741 A, B, C, D = (67.0, 0.8, 500, 1) 742 saltcorr = 3 743 if valueset == 7: 744 A, B, C, D = (81.5, 0.41, 600, 1) 745 saltcorr = 2 746 if valueset == 8: 747 A, B, C, D = (77.1, 0.41, 528, 1) 748 saltcorr = 4 749 if valueset > 8: 750 raise ValueError('allowed values for parameter \'valueset\' are 0-8.') 751 752 melting_temp = A + B * percent_gc - C / (len(seq) * 1.0) 753 if saltcorr: 754 melting_temp += salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, 755 dNTPs=dNTPs, seq=seq, method=saltcorr) 756 if mismatch: 757 melting_temp -= D * (seq.count('X') * 100.0 / len(seq)) 758 return melting_temp
759 760
761 -def _key_error(neighbors, strict):
762 """Throw an error or a warning if there is no data for the neighbors.""" 763 # We haven't found the key in the tables 764 if strict: 765 raise ValueError('no thermodynamic data for neighbors \'' + neighbors + 766 '\' available') 767 else: 768 warnings.warn('no themodynamic data for neighbors \'' + neighbors + 769 '\' available. Calculation will be wrong', 770 BiopythonWarning) 771 return
772 773
774 -def Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=DNA_NN3, 775 tmm_table=DNA_TMM1, imm_table=DNA_IMM1, de_table=DNA_DE1, 776 dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, 777 dNTPs=0, saltcorr=5):
778 """Return the Tm using nearest neighbor thermodynamics. 779 780 Arguments: 781 - seq: The primer/probe sequence as string or Biopython sequence object. 782 For RNA/DNA hybridizations seq must be the RNA sequence. 783 - c_seq: Complementary sequence. The sequence of the template/target in 784 3'->5' direction. c_seq is necessary for mismatch correction and 785 dangling-ends correction. Both corrections will automatically be 786 applied if mismatches or dangling ends are present. Default=None. 787 - shift: Shift of the primer/probe sequence on the template/target 788 sequence, e.g.:: 789 790 shift=0 shift=1 shift= -1 791 Primer (seq): 5' ATGC... 5' ATGC... 5' ATGC... 792 Template (c_seq): 3' TACG... 3' CTACG... 3' ACG... 793 794 The shift parameter is necessary to align seq and c_seq if they have 795 different lengths or if they should have dangling ends. Default=0 796 - table: Thermodynamic NN values, eight tables are implemented: 797 For DNA/DNA hybridizations: 798 799 - DNA_NN1: values from Breslauer et al. (1986) 800 - DNA_NN2: values from Sugimoto et al. (1996) 801 - DNA_NN3: values from Allawi & SantaLucia (1997) (default) 802 - DNA_NN4: values from SantaLucia & Hicks (2004) 803 804 For RNA/RNA hybridizations: 805 806 - RNA_NN1: values from Freier et al. (1986) 807 - RNA_NN2: values from Xia et al. (1998) 808 - RNA_NN3: valuse from Chen et al. (2012) 809 810 For RNA/DNA hybridizations: 811 812 - R_DNA_NN1: values from Sugimoto et al. (1995) 813 814 Use the module's maketable method to make a new table or to update one 815 one of the implemented tables. 816 - tmm_table: Thermodynamic values for terminal mismatches. 817 Default: DNA_TMM1 (SantaLucia & Peyret, 2001) 818 - imm_table: Thermodynamic values for internal mismatches, may include 819 insosine mismatches. Default: DNA_IMM1 (Allawi & SantaLucia, 1997-1998; 820 Peyret et al., 1999; Watkins & SantaLucia, 2005) 821 - de_table: Thermodynamic values for dangling ends: 822 823 - DNA_DE1: for DNA. Values from Bommarito et al. (2000). Default 824 - RNA_DE1: for RNA. Values from Turner & Mathews (2010) 825 826 - dnac1: Concentration of the higher concentrated strand [nM]. Typically 827 this will be the primer (for PCR) or the probe. Default=25. 828 - dnac2: Concentration of the lower concentrated strand [nM]. In PCR this 829 is the template strand which concentration is typically very low and may 830 be ignored (dnac2=0). In oligo/oligo hybridization experiments, dnac1 831 equals dnac1. Default=25. 832 MELTING and Primer3Plus use k = [Oligo(Total)]/4 by default. To mimic 833 this behaviour, you have to divide [Oligo(Total)] by 2 and assign this 834 concentration to dnac1 and dnac2. E.g., Total oligo concentration of 835 50 nM in Primer3Plus means dnac1=25, dnac2=25. 836 - selfcomp: Is the sequence self-complementary? Default=False. If 'True' 837 the primer is thought binding to itself, thus dnac2 is not considered. 838 - Na, K, Tris, Mg, dNTPs: See method 'Tm_GC' for details. Defaults: Na=50, 839 K=0, Tris=0, Mg=0, dNTPs=0. 840 - saltcorr: See method 'Tm_GC'. Default=5. 0 means no salt correction. 841 842 """ 843 seq = str(seq) 844 if not c_seq: 845 # c_seq must be provided by user if dangling ends or mismatches should 846 # be taken into account. Otherwise take perfect complement. 847 c_seq = Seq.Seq(seq).complement() 848 c_seq = str(c_seq) 849 if check: 850 seq = _check(seq, 'Tm_NN') 851 c_seq = _check(c_seq, 'Tm_NN') 852 tmp_seq = seq 853 tmp_cseq = c_seq 854 delta_h = 0 855 delta_s = 0 856 d_h = 0 # Names for indexes 857 d_s = 1 # 0 and 1 858 859 # Dangling ends? 860 if shift or len(seq) != len(c_seq): 861 # Align both sequences using the shift parameter 862 if shift > 0: 863 tmp_seq = '.' * shift + seq 864 if shift < 0: 865 tmp_cseq = '.' * abs(shift) + c_seq 866 if len(tmp_cseq) > len(tmp_seq): 867 tmp_seq += (len(tmp_cseq) - len(tmp_seq)) * '.' 868 if len(tmp_cseq) < len(tmp_seq): 869 tmp_cseq += (len(tmp_seq) - len(tmp_cseq)) * '.' 870 # Remove 'over-dangling' ends 871 while tmp_seq.startswith('..') or tmp_cseq.startswith('..'): 872 tmp_seq = tmp_seq[1:] 873 tmp_cseq = tmp_cseq[1:] 874 while tmp_seq.endswith('..') or tmp_cseq.endswith('..'): 875 tmp_seq = tmp_seq[:-1] 876 tmp_cseq = tmp_cseq[:-1] 877 # Now for the dangling ends 878 if tmp_seq.startswith('.') or tmp_cseq.startswith('.'): 879 left_de = tmp_seq[:2] + '/' + tmp_cseq[:2] 880 try: 881 delta_h += de_table[left_de][d_h] 882 delta_s += de_table[left_de][d_s] 883 except KeyError: 884 _key_error(left_de, strict) 885 tmp_seq = tmp_seq[1:] 886 tmp_cseq = tmp_cseq[1:] 887 if tmp_seq.endswith('.') or tmp_cseq.endswith('.'): 888 right_de = tmp_cseq[-2:][::-1] + '/' + tmp_seq[-2:][::-1] 889 try: 890 delta_h += de_table[right_de][d_h] 891 delta_s += de_table[right_de][d_s] 892 except KeyError: 893 _key_error(right_de, strict) 894 tmp_seq = tmp_seq[:-1] 895 tmp_cseq = tmp_cseq[:-1] 896 897 # Now for terminal mismatches 898 left_tmm = tmp_cseq[:2][::-1] + '/' + tmp_seq[:2][::-1] 899 if left_tmm in tmm_table: 900 delta_h += tmm_table[left_tmm][d_h] 901 delta_s += tmm_table[left_tmm][d_s] 902 tmp_seq = tmp_seq[1:] 903 tmp_cseq = tmp_cseq[1:] 904 right_tmm = tmp_seq[-2:] + '/' + tmp_cseq[-2:] 905 if right_tmm in tmm_table: 906 delta_h += tmm_table[right_tmm][d_h] 907 delta_s += tmm_table[right_tmm][d_s] 908 tmp_seq = tmp_seq[:-1] 909 tmp_cseq = tmp_cseq[:-1] 910 911 # Now everything 'unusual' at the ends is handled and removed and we can 912 # look at the initiation. 913 # One or several of the following initiation types may apply: 914 915 # Type: General initiation value 916 delta_h += nn_table['init'][d_h] 917 delta_s += nn_table['init'][d_s] 918 919 # Type: Duplex with no (allA/T) or at least one (oneG/C) GC pair 920 if SeqUtils.GC(seq) == 0: 921 delta_h += nn_table['init_allA/T'][d_h] 922 delta_s += nn_table['init_allA/T'][d_s] 923 else: 924 delta_h += nn_table['init_oneG/C'][d_h] 925 delta_s += nn_table['init_oneG/C'][d_s] 926 927 # Type: Penalty if 5' end is T 928 if seq.startswith('T'): 929 delta_h += nn_table['init_5T/A'][d_h] 930 delta_s += nn_table['init_5T/A'][d_s] 931 if seq.endswith('A'): 932 delta_h += nn_table['init_5T/A'][d_h] 933 delta_s += nn_table['init_5T/A'][d_s] 934 935 # Type: Different values for G/C or A/T terminal basepairs 936 ends = seq[0] + seq[-1] 937 AT = ends.count('A') + ends.count('T') 938 GC = ends.count('G') + ends.count('C') 939 delta_h += nn_table['init_A/T'][d_h] * AT 940 delta_s += nn_table['init_A/T'][d_s] * AT 941 delta_h += nn_table['init_G/C'][d_h] * GC 942 delta_s += nn_table['init_G/C'][d_s] * GC 943 944 # Finally, the 'zipping' 945 for basenumber in range(len(tmp_seq) - 1): 946 neighbors = tmp_seq[basenumber:basenumber + 2] + '/' + \ 947 tmp_cseq[basenumber:basenumber + 2] 948 if neighbors in imm_table: 949 delta_h += imm_table[neighbors][d_h] 950 delta_s += imm_table[neighbors][d_s] 951 elif neighbors[::-1] in imm_table: 952 delta_h += imm_table[neighbors[::-1]][d_h] 953 delta_s += imm_table[neighbors[::-1]][d_s] 954 elif neighbors in nn_table: 955 delta_h += nn_table[neighbors][d_h] 956 delta_s += nn_table[neighbors][d_s] 957 elif neighbors[::-1] in nn_table: 958 delta_h += nn_table[neighbors[::-1]][d_h] 959 delta_s += nn_table[neighbors[::-1]][d_s] 960 else: 961 # We haven't found the key... 962 _key_error(neighbors, strict) 963 964 k = (dnac1 - (dnac2 / 2.0)) * 1e-9 965 if selfcomp: 966 k = dnac1 * 1e-9 967 delta_h += nn_table['sym'][d_h] 968 delta_s += nn_table['sym'][d_s] 969 R = 1.987 # universal gas constant in Cal/degrees C*Mol 970 if saltcorr: 971 corr = salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, dNTPs=dNTPs, 972 method=saltcorr, seq=seq) 973 if saltcorr == 5: 974 delta_s += corr 975 melting_temp = (1000 * delta_h) / (delta_s + (R * (math.log(k)))) - 273.15 976 if saltcorr in (1, 2, 3, 4): 977 melting_temp += corr 978 if saltcorr in (6, 7): 979 # Tm = 1/(1/Tm + corr) 980 melting_temp = (1 / (1 / (melting_temp + 273.15) + corr) - 273.15) 981 982 return melting_temp
983 984
985 -def Tm_staluc(s, dnac=50, saltc=50, rna=0):
986 """Returns DNA/DNA Tm using nearest neighbor thermodynamics (OBSOLETE). 987 988 This method may be depreceated in the future. Use Tm_NN instead. Tm_NN 989 with default values gives the same result as Tm_staluc. 990 991 s is the sequence as string or Seq object 992 dnac is DNA concentration [nM] 993 saltc is salt concentration [mM]. 994 rna=0 is for DNA/DNA (default), use 1 for RNA/RNA hybridisation. 995 996 For DNA/DNA, see Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 997 For RNA/RNA, see Xia et al (1998), Biochemistry 37: 14719-14735 998 999 Example: 1000 1001 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA')) 1002 59.87 1003 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA', rna=True)) 1004 77.90 1005 1006 You can also use a Seq object instead of a string, 1007 1008 >>> from Bio.Seq import Seq 1009 >>> from Bio.Alphabet import generic_nucleotide 1010 >>> s = Seq('CAGTCAGTACGTACGTGTACTGCCGTA', generic_nucleotide) 1011 >>> print("%0.2f" % Tm_staluc(s)) 1012 59.87 1013 >>> print("%0.2f" % Tm_staluc(s, rna=True)) 1014 77.90 1015 1016 """ 1017 # Original method was by Sebastian Bassi <sbassi@genesdigitales.com>. It is 1018 # now superseded by Tm_NN. 1019 1020 warnings.warn('Tm_staluc may be depreciated in the future. Use Tm_NN ' + 1021 'instead.', PendingDeprecationWarning) 1022 if not rna: 1023 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc) 1024 elif rna == 1: 1025 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc, 1026 nn_table=RNA_NN2) 1027 else: 1028 raise ValueError("rna={0} not supported".format(rna))
1029 1030 1031 if __name__ == "__main__": 1032 from Bio._utils import run_doctest 1033 run_doctest() 1034