Package Bio :: Package SeqUtils :: Module MeltingTemp
[hide private]
[frames] | no frames]

Source Code for Module Bio.SeqUtils.MeltingTemp

   1  # Copyright 2004-2008 by Sebastian Bassi. 
   2  # Copyright 2013-2018 by Markus Piotrowski. 
   3  # All rights reserved. 
   4  # This file is part of the Biopython distribution and governed by your 
   5  # choice of the "Biopython License Agreement" or the "BSD 3-Clause License". 
   6  # Please see the LICENSE file that should have been included as part of this 
   7  # package. 
   8  """Calculate the melting temperature of nucleotide sequences. 
   9   
  10  This module contains three different methods to calculate the melting 
  11  temperature of oligonucleotides: 
  12   
  13  1. Tm_Wallace: 'Rule of thumb' 
  14  2. Tm_GC: Empirical formulas based on GC content. Salt and mismatch corrections 
  15     can be included. 
  16  3. Tm_NN: Calculation based on nearest neighbor thermodynamics. Several tables 
  17     for DNA/DNA, DNA/RNA and RNA/RNA hybridizations are included. 
  18     Correction for mismatches, dangling ends, salt concentration and other 
  19     additives are available. 
  20   
  21  Tm_staluc is the 'old' NN calculation and is kept for compatibility. It is, 
  22  however, recommended to use Tm_NN instead, since Tm_staluc may be depreceated 
  23  in the future. Also, Tm_NN has much more options. Using Tm_staluc and Tm_NN 
  24  with default parameters gives (essentially) the same results. 
  25   
  26  General parameters for most Tm methods: 
  27   - seq -- A Biopython sequence object or a string. 
  28   - check -- Checks if the sequence is valid for the given method (default= 
  29     True). In general, whitespaces and non-base characters are removed and 
  30     characters are converted to uppercase. RNA will be backtranscribed. 
  31   - strict -- Do not allow base characters or neighbor duplex keys (e.g. 
  32     'AT/NA') that could not or not unambigiously be evaluated for the respective 
  33     method (default=True). Note that W (= A or T) and S (= C or G) are not 
  34     ambiguous for Tm_Wallace and Tm_GC. If 'False', average values (if 
  35     applicable) will be used. 
  36   
  37  This module is not able to detect self-complementary and it will not use 
  38  alignment tools to align an oligonucleotide sequence to its target sequence. 
  39  Thus it can not detect dangling-ends and mismatches by itself (don't even think 
  40  about bulbs and loops). These parameters have to be handed over to the 
  41  respective method. 
  42   
  43  Other public methods of this module: 
  44   - make_table     : To create a table with thermodynamic data. 
  45   - salt_correction: To adjust Tm to a given salt concentration by different 
  46     formulas. This method is called from Tm_GC and Tm_NN but may 
  47     also be accessed 'manually'. It returns a correction term, not 
  48     a corrected Tm! 
  49   - chem_correction: To adjust Tm regarding the chemical additives DMSO and 
  50     formaldehyde. The method returns a corrected Tm. Chemical 
  51     correction is not an integral part of the Tm methods and must 
  52     be called additionally. 
  53   
  54  For example: 
  55   
  56      >>> from Bio.SeqUtils import MeltingTemp as mt 
  57      >>> from Bio.Seq import Seq 
  58      >>> mystring = 'CGTTCCAAAGATGTGGGCATGAGCTTAC' 
  59      >>> myseq = Seq(mystring) 
  60      >>> print('%0.2f' % mt.Tm_Wallace(mystring)) 
  61      84.00 
  62      >>> print('%0.2f' % mt.Tm_Wallace(myseq)) 
  63      84.00 
  64      >>> print('%0.2f' % mt.Tm_GC(myseq)) 
  65      58.73 
  66      >>> print('%0.2f' % mt.Tm_NN(myseq)) 
  67      60.32 
  68   
  69  Tm_NN with default values gives same result as 'old' Tm_staluc. However, values 
  70  differ for RNA, since Tm_staluc had some errors for RNA calculation. These 
  71  errors have been fixed in this version. 
  72   
  73  New Tm_NN can do slightly more: 
  74  Using different thermodynamic tables, e.g. from Breslauer '86 or Sugimoto '96: 
  75   
  76      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN1))  # Breslauer '86 
  77      72.19 
  78      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN2))  # Sugimoto '96 
  79      65.47 
  80   
  81  Tables for RNA and RNA/DNA hybrids are included: 
  82   
  83      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.RNA_NN1))  # Freier '86 
  84      73.35 
  85      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.R_DNA_NN1))  # Sugimoto '95 
  86      57.17 
  87   
  88  Several types of salc correction (for Tm_NN and Tm_GC): 
  89   
  90      >>> for i in range(1, 8): 
  91      ...     print("Type: %d, Tm: %0.2f" % (i, Tm_NN(myseq, saltcorr=i))) 
  92      ... 
  93      Type: 1, Tm: 54.27 
  94      Type: 2, Tm: 54.02 
  95      Type: 3, Tm: 59.60 
  96      Type: 4, Tm: 60.64 
  97      Type: 5, Tm: 60.32 
  98      Type: 6, Tm: 59.78 
  99      Type: 7, Tm: 59.78 
 100   
 101  Correction for other monovalent cations (K+, Tris), Mg2+ and dNTPs according 
 102  to von Ahsen et al. (2001) or Owczarzy et al. (2008) (for Tm_NN and Tm_GC): 
 103   
 104      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10)) 
 105      60.79 
 106      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5)) 
 107      67.39 
 108      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, saltcorr=7)) 
 109      66.81 
 110      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, dNTPs=0.6, 
 111      ...                          saltcorr=7)) 
 112      66.04 
 113   
 114  Dangling ends and mismatches, e.g.:: 
 115   
 116      Oligo:     CGTTCCaAAGATGTGGGCATGAGCTTAC       CGTTCCaAAGATGTGGGCATGAGCTTAC 
 117                 ::::::X:::::::::::::::::::::  or   ::::::X::::::::::::::::::::: 
 118      Template:  GCAAGGcTTCTACACCCGTACTCGAATG      TGCAAGGcTTCTACACCCGTACTCGAATGC 
 119   
 120  Here: 
 121   
 122      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 123      60.32 
 124      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 125      ...                    c_seq='GCAAGGcTTCTACACCCGTACTCGAATG')) 
 126      55.39 
 127      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', shift=1, 
 128      ...                   c_seq='TGCAAGGcTTCTACACCCGTACTCGAATGC')) 
 129      55.69 
 130   
 131  The same for RNA: 
 132   
 133      >>> print('%0.2f' % mt.Tm_NN('CGUUCCAAAGAUGUGGGCAUGAGCUUAC', 
 134      ...                   c_seq='UGCAAGGcUUCUACACCCGUACUCGAAUGC', 
 135      ...                   shift=1, nn_table=mt.RNA_NN3, 
 136      ...                   de_table=mt.RNA_DE1)) 
 137      73.00 
 138   
 139  Note, that thermodynamic data are not available for all kind of mismatches, 
 140  e.g. most double mismatches or terminal mismaches combined with danglind ends: 
 141   
 142      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 143      ...                   c_seq='TtCAAGGcTTCTACACCCGTACTCGAATGC', 
 144      ...                   shift=1)) 
 145      Traceback (most recent call last): 
 146      ValueError: no thermodynamic data for neighbors '.C/TT' available 
 147   
 148  Make your own tables, or update/extend existing tables. E.g., add values for 
 149  locked nucleotides. Here, 'locked A' (and its complement) should be represented 
 150  by '1': 
 151   
 152      >>> mytable = mt.make_table(oldtable=mt.DNA_NN3, 
 153      ...                         values={'A1/T1':(-6.608, -17.235), 
 154      ...                         '1A/1T':(-6.893, -15.923)}) 
 155      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 156      60.32 
 157      >>> print('%0.2f' % mt.Tm_NN('CGTTCCA1AGATGTGGGCATGAGCTTAC', 
 158      ...                           nn_table=mytable, check=False)) 
 159      ... # 'check' must be False, otherwise '1' would be discarded 
 160      62.53 
 161   
 162  """ 
 163   
 164  from __future__ import print_function 
 165   
 166  import math 
 167  import warnings 
 168   
 169  from Bio import SeqUtils, Seq 
 170  from Bio import BiopythonWarning 
 171   
 172   
 173  # Thermodynamic lookup tables (dictionaries): 
 174  # Enthalpy (dH) and entropy (dS) values for nearest neighbors and initiation 
 175  # process. Calculation of duplex initiation is quite different in several 
 176  # papers; to allow for a general calculation, all different initiation 
 177  # parameters are included in all tables and non-applicable parameters are set 
 178  # to zero. 
 179  # The key is either an initiation type (e.g., 'init_A/T') or a nearest neighbor 
 180  # duplex sequence (e.g., GT/CA, to read 5'GT3'-3'CA5'). The values are tuples 
 181  # of dH (kcal/mol), dS (cal/mol K). 
 182   
 183  # DNA/DNA 
 184  # Breslauer et al. (1986), Proc Natl Acad Sci USA 83: 3746-3750 
 185  DNA_NN1 = { 
 186      "init": (0, 0), "init_A/T": (0, 0), "init_G/C": (0, 0), 
 187      "init_oneG/C": (0, -16.8), "init_allA/T": (0, -20.1), "init_5T/A": (0, 0), 
 188      "sym": (0, -1.3), 
 189      "AA/TT": (-9.1, -24.0), "AT/TA": (-8.6, -23.9), "TA/AT": (-6.0, -16.9), 
 190      "CA/GT": (-5.8, -12.9), "GT/CA": (-6.5, -17.3), "CT/GA": (-7.8, -20.8), 
 191      "GA/CT": (-5.6, -13.5), "CG/GC": (-11.9, -27.8), "GC/CG": (-11.1, -26.7), 
 192      "GG/CC": (-11.0, -26.6)} 
 193   
 194  # Sugimoto et al. (1996), Nuc Acids Res 24 : 4501-4505 
 195  DNA_NN2 = { 
 196      "init": (0.6, -9.0), "init_A/T": (0, 0), "init_G/C": (0, 0), 
 197      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 198      "sym": (0, -1.4), 
 199      "AA/TT": (-8.0, -21.9), "AT/TA": (-5.6, -15.2), "TA/AT": (-6.6, -18.4), 
 200      "CA/GT": (-8.2, -21.0), "GT/CA": (-9.4, -25.5), "CT/GA": (-6.6, -16.4), 
 201      "GA/CT": (-8.8, -23.5), "CG/GC": (-11.8, -29.0), "GC/CG": (-10.5, -26.4), 
 202      "GG/CC": (-10.9, -28.4)} 
 203   
 204  # Allawi and SantaLucia (1997), Biochemistry 36: 10581-10594 
 205  DNA_NN3 = { 
 206      "init": (0, 0), "init_A/T": (2.3, 4.1), "init_G/C": (0.1, -2.8), 
 207      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 208      "sym": (0, -1.4), 
 209      "AA/TT": (-7.9, -22.2), "AT/TA": (-7.2, -20.4), "TA/AT": (-7.2, -21.3), 
 210      "CA/GT": (-8.5, -22.7), "GT/CA": (-8.4, -22.4), "CT/GA": (-7.8, -21.0), 
 211      "GA/CT": (-8.2, -22.2), "CG/GC": (-10.6, -27.2), "GC/CG": (-9.8, -24.4), 
 212      "GG/CC": (-8.0, -19.9)} 
 213   
 214  # SantaLucia & Hicks (2004), Annu. Rev. Biophys. Biomol. Struct 33: 415-440 
 215  DNA_NN4 = { 
 216      "init": (0.2, -5.7), "init_A/T": (2.2, 6.9), "init_G/C": (0, 0), 
 217      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 218      "sym": (0, -1.4), 
 219      "AA/TT": (-7.6, -21.3), "AT/TA": (-7.2, -20.4), "TA/AT": (-7.2, -20.4), 
 220      "CA/GT": (-8.5, -22.7), "GT/CA": (-8.4, -22.4), "CT/GA": (-7.8, -21.0), 
 221      "GA/CT": (-8.2, -22.2), "CG/GC": (-10.6, -27.2), "GC/CG": (-9.8, -24.4), 
 222      "GG/CC": (-8.0, -19.0)} 
 223   
 224  # RNA/RNA 
 225  # Freier et al. (1986), Proc Natl Acad Sci USA 83: 9373-9377 
 226  RNA_NN1 = { 
 227      "init": (0, -10.8), "init_A/T": (0, 0), "init_G/C": (0, 0), 
 228      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 229      "sym": (0, -1.4), 
 230      "AA/TT": (-6.6, -18.4), "AT/TA": (-5.7, -15.5), "TA/AT": (-8.1, -22.6), 
 231      "CA/GT": (-10.5, -27.8), "GT/CA": (-10.2, -26.2), "CT/GA": (-7.6, -19.2), 
 232      "GA/CT": (-13.3, -35.5), "CG/GC": (-8.0, -19.4), "GC/CG": (-14.2, -34.9), 
 233      "GG/CC": (-12.2, -29.7)} 
 234   
 235  # Xia et al (1998), Biochemistry 37: 14719-14735 
 236  RNA_NN2 = { 
 237      "init": (3.61, -1.5), "init_A/T": (3.72, 10.5), "init_G/C": (0, 0), 
 238      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 239      "sym": (0, -1.4), 
 240      "AA/TT": (-6.82, -19.0), "AT/TA": (-9.38, -26.7), "TA/AT": (-7.69, -20.5), 
 241      "CA/GT": (-10.44, -26.9), "GT/CA": (-11.40, -29.5), 
 242      "CT/GA": (-10.48, -27.1), "GA/CT": (-12.44, -32.5), 
 243      "CG/GC": (-10.64, -26.7), "GC/CG": (-14.88, -36.9), 
 244      "GG/CC": (-13.39, -32.7)} 
 245   
 246  # Chen et al. (2012), Biochemistry 51: 3508-3522 
 247  RNA_NN3 = { 
 248      "init": (6.40, 6.99), "init_A/T": (3.85, 11.04), "init_G/C": (0, 0), 
 249      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 250      "sym": (0, -1.4), 
 251      "AA/TT": (-7.09, -19.8), "AT/TA": (-9.11, -25.8), "TA/AT": (-8.50, -22.9), 
 252      "CA/GT": (-11.03, -28.8), "GT/CA": (-11.98, -31.3), 
 253      "CT/GA": (-10.90, -28.5), "GA/CT": (-13.21, -34.9), 
 254      "CG/GC": (-10.88, -27.4), "GC/CG": (-16.04, -40.6), 
 255      "GG/CC": (-14.18, -35.0), "GT/TG": (-13.83, -46.9), 
 256      "GG/TT": (-17.82, -56.7), "AG/TT": (-3.96, -11.6), 
 257      "TG/AT": (-0.96, -1.8), "TT/AG": (-10.38, -31.8), "TG/GT": (-12.64, -38.9), 
 258      "AT/TG": (-7.39, -21.0), "CG/GT": (-5.56, -13.9), "CT/GG": (-9.44, -24.7), 
 259      "GG/CT": (-7.03, -16.8), "GT/CG": (-11.09, -28.8)} 
 260   
 261  # RNA/DNA 
 262  # Sugimoto et al. (1995), Biochemistry 34: 11211-11216 
 263  R_DNA_NN1 = { 
 264      "init": (1.9, -3.9), "init_A/T": (0, 0), "init_G/C": (0, 0), 
 265      "init_oneG/C": (0, 0), "init_allA/T": (0, 0), "init_5T/A": (0, 0), 
 266      "sym": (0, 0), 
 267      "AA/TT": (-11.5, -36.4), "AC/TG": (-7.8, -21.6), "AG/TC": (-7.0, -19.7), 
 268      "AT/TA": (-8.3, -23.9), "CA/GT": (-10.4, -28.4), "CC/GG": (-12.8, -31.9), 
 269      "CG/GC": (-16.3, -47.1), "CT/GA": (-9.1, -23.5), "GA/CT": (-8.6, -22.9), 
 270      "GC/CG": (-8.0, -17.1), "GG/CC": (-9.3, -23.2), "GT/CA": (-5.9, -12.3), 
 271      "TA/AT": (-7.8, -23.2), "TC/AG": (-5.5, -13.5), "TG/AC": (-9.0, -26.1), 
 272      "TT/AA": (-7.8, -21.9)} 
 273   
 274  # Internal mismatch and inosine table (DNA) 
 275  # Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 
 276  # Allawi & SantaLucia (1998), Biochemistry 37: 9435-9444 
 277  # Allawi & SantaLucia (1998), Biochemistry 37: 2170-2179 
 278  # Allawi & SantaLucia (1998), Nucl Acids Res 26: 2694-2701 
 279  # Peyret et al. (1999), Biochemistry 38: 3468-3477 
 280  # Watkins & SantaLucia (2005), Nucl Acids Res 33: 6258-6267 
 281  DNA_IMM1 = { 
 282      "AG/TT": (1.0, 0.9), "AT/TG": (-2.5, -8.3), "CG/GT": (-4.1, -11.7), 
 283      "CT/GG": (-2.8, -8.0), "GG/CT": (3.3, 10.4), "GG/TT": (5.8, 16.3), 
 284      "GT/CG": (-4.4, -12.3), "GT/TG": (4.1, 9.5), "TG/AT": (-0.1, -1.7), 
 285      "TG/GT": (-1.4, -6.2), "TT/AG": (-1.3, -5.3), "AA/TG": (-0.6, -2.3), 
 286      "AG/TA": (-0.7, -2.3), "CA/GG": (-0.7, -2.3), "CG/GA": (-4.0, -13.2), 
 287      "GA/CG": (-0.6, -1.0), "GG/CA": (0.5, 3.2), "TA/AG": (0.7, 0.7), 
 288      "TG/AA": (3.0, 7.4), 
 289      "AC/TT": (0.7, 0.2), "AT/TC": (-1.2, -6.2), "CC/GT": (-0.8, -4.5), 
 290      "CT/GC": (-1.5, -6.1), "GC/CT": (2.3, 5.4), "GT/CC": (5.2, 13.5), 
 291      "TC/AT": (1.2, 0.7), "TT/AC": (1.0, 0.7), 
 292      "AA/TC": (2.3, 4.6), "AC/TA": (5.3, 14.6), "CA/GC": (1.9, 3.7), 
 293      "CC/GA": (0.6, -0.6), "GA/CC": (5.2, 14.2), "GC/CA": (-0.7, -3.8), 
 294      "TA/AC": (3.4, 8.0), "TC/AA": (7.6, 20.2), 
 295      "AA/TA": (1.2, 1.7), "CA/GA": (-0.9, -4.2), "GA/CA": (-2.9, -9.8), 
 296      "TA/AA": (4.7, 12.9), "AC/TC": (0.0, -4.4), "CC/GC": (-1.5, -7.2), 
 297      "GC/CC": (3.6, 8.9), "TC/AC": (6.1, 16.4), "AG/TG": (-3.1, -9.5), 
 298      "CG/GG": (-4.9, -15.3), "GG/CG": (-6.0, -15.8), "TG/AG": (1.6, 3.6), 
 299      "AT/TT": (-2.7, -10.8), "CT/GT": (-5.0, -15.8), "GT/CT": (-2.2, -8.4), 
 300      "TT/AT": (0.2, -1.5), 
 301      "AI/TC": (-8.9, -25.5), "TI/AC": (-5.9, -17.4), "AC/TI": (-8.8, -25.4), 
 302      "TC/AI": (-4.9, -13.9), "CI/GC": (-5.4, -13.7), "GI/CC": (-6.8, -19.1), 
 303      "CC/GI": (-8.3, -23.8), "GC/CI": (-5.0, -12.6), 
 304      "AI/TA": (-8.3, -25.0), "TI/AA": (-3.4, -11.2), "AA/TI": (-0.7, -2.6), 
 305      "TA/AI": (-1.3, -4.6), "CI/GA": (2.6, 8.9), "GI/CA": (-7.8, -21.1), 
 306      "CA/GI": (-7.0, -20.0), "GA/CI": (-7.6, -20.2), 
 307      "AI/TT": (0.49, -0.7), "TI/AT": (-6.5, -22.0), "AT/TI": (-5.6, -18.7), 
 308      "TT/AI": (-0.8, -4.3), "CI/GT": (-1.0, -2.4), "GI/CT": (-3.5, -10.6), 
 309      "CT/GI": (0.1, -1.0), "GT/CI": (-4.3, -12.1), 
 310      "AI/TG": (-4.9, -15.8), "TI/AG": (-1.9, -8.5), "AG/TI": (0.1, -1.8), 
 311      "TG/AI": (1.0, 1.0), "CI/GG": (7.1, 21.3), "GI/CG": (-1.1, -3.2), 
 312      "CG/GI": (5.8, 16.9), "GG/CI": (-7.6, -22.0), 
 313      "AI/TI": (-3.3, -11.9), "TI/AI": (0.1, -2.3), "CI/GI": (1.3, 3.0), 
 314      "GI/CI": (-0.5, -1.3)} 
 315   
 316  # Terminal mismatch table (DNA) 
 317  # SantaLucia & Peyret (2001) Patent Application WO 01/94611 
 318  DNA_TMM1 = { 
 319      "AA/TA": (-3.1, -7.8), "TA/AA": (-2.5, -6.3), "CA/GA": (-4.3, -10.7), 
 320      "GA/CA": (-8.0, -22.5), 
 321      "AC/TC": (-0.1, 0.5), "TC/AC": (-0.7, -1.3), "CC/GC": (-2.1, -5.1), 
 322      "GC/CC": (-3.9, -10.6), 
 323      "AG/TG": (-1.1, -2.1), "TG/AG": (-1.1, -2.7), "CG/GG": (-3.8, -9.5), 
 324      "GG/CG": (-0.7, -19.2), 
 325      "AT/TT": (-2.4, -6.5), "TT/AT": (-3.2, -8.9), "CT/GT": (-6.1, -16.9), 
 326      "GT/CT": (-7.4, -21.2), 
 327      "AA/TC": (-1.6, -4.0), "AC/TA": (-1.8, -3.8), "CA/GC": (-2.6, -5.9), 
 328      "CC/GA": (-2.7, -6.0), "GA/CC": (-5.0, -13.8), "GC/CA": (-3.2, -7.1), 
 329      "TA/AC": (-2.3, -5.9), "TC/AA": (-2.7, -7.0), 
 330      "AC/TT": (-0.9, -1.7), "AT/TC": (-2.3, -6.3), "CC/GT": (-3.2, -8.0), 
 331      "CT/GC": (-3.9, -10.6), "GC/CT": (-4.9, -13.5), "GT/CC": (-3.0, -7.8), 
 332      "TC/AT": (-2.5, -6.3), "TT/AC": (-0.7, -1.2), 
 333      "AA/TG": (-1.9, -4.4), "AG/TA": (-2.5, -5.9), "CA/GG": (-3.9, -9.6), 
 334      "CG/GA": (-6.0, -15.5), "GA/CG": (-4.3, -11.1), "GG/CA": (-4.6, -11.4), 
 335      "TA/AG": (-2.0, -4.7), "TG/AA": (-2.4, -5.8), 
 336      "AG/TT": (-3.2, -8.7), "AT/TG": (-3.5, -9.4), "CG/GT": (-3.8, -9.0), 
 337      "CT/GG": (-6.6, -18.7), "GG/CT": (-5.7, -15.9), "GT/CG": (-5.9, -16.1), 
 338      "TG/AT": (-3.9, -10.5), "TT/AG": (-3.6, -9.8)} 
 339   
 340  # Dangling ends table (DNA) 
 341  # Bommarito et al. (2000), Nucl Acids Res 28: 1929-1934 
 342  DNA_DE1 = { 
 343      "AA/.T": (0.2, 2.3), "AC/.G": (-6.3, -17.1), "AG/.C": (-3.7, -10.0), 
 344      "AT/.A": (-2.9, -7.6), "CA/.T": (0.6, 3.3), "CC/.G": (-4.4, -12.6), 
 345      "CG/.C": (-4.0, -11.9), "CT/.A": (-4.1, -13.0), "GA/.T": (-1.1, -1.6), 
 346      "GC/.G": (-5.1, -14.0), "GG/.C": (-3.9, -10.9), "GT/.A": (-4.2, -15.0), 
 347      "TA/.T": (-6.9, -20.0), "TC/.G": (-4.0, -10.9), "TG/.C": (-4.9, -13.8), 
 348      "TT/.A": (-0.2, -0.5), 
 349      ".A/AT": (-0.7, -0.8), ".C/AG": (-2.1, -3.9), ".G/AC": (-5.9, -16.5), 
 350      ".T/AA": (-0.5, -1.1), ".A/CT": (4.4, 14.9), ".C/CG": (-0.2, -0.1), 
 351      ".G/CC": (-2.6, -7.4), ".T/CA": (4.7, 14.2), ".A/GT": (-1.6, -3.6), 
 352      ".C/GG": (-3.9, -11.2), ".G/GC": (-3.2, -10.4), ".T/GA": (-4.1, -13.1), 
 353      ".A/TT": (2.9, 10.4), ".C/TG": (-4.4, -13.1), ".G/TC": (-5.2, -15.0), 
 354      ".T/TA": (-3.8, -12.6)} 
 355   
 356  # Dangling ends table (RNA) 
 357  # Turner & Mathews (2010), Nucl Acids Res 38: D280-D282 
 358  RNA_DE1 = { 
 359      ".T/AA": (-4.9, -13.2), ".T/CA": (-0.9, -1.3), ".T/GA": (-5.5, -15.1), 
 360      ".T/TA": (-2.3, -5.5), 
 361      ".G/AC": (-9.0, -23.5), ".G/CC": (-4.1, -10.6), ".G/GC": (-8.6, -22.2), 
 362      ".G/TC": (-7.5, -20.31), 
 363      ".C/AG": (-7.4, -20.3), ".C/CG": (-2.8, -7.7), ".C/GG": (-6.4, -16.4), 
 364      ".C/TG": (-3.6, -9.7), 
 365      ".T/AG": (-4.9, -13.2), ".T/CG": (-0.9, -1.3), ".T/GG": (-5.5, -15.1), 
 366      ".T/TG": (-2.3, -5.5), 
 367      ".A/AT": (-5.7, -16.1), ".A/CT": (-0.7, -1.9), ".A/GT": (-5.8, -16.4), 
 368      ".A/TT": (-2.2, -6.8), 
 369      ".G/AT": (-5.7, -16.1), ".G/CT": (-0.7, -1.9), ".G/GT": (-5.8, -16.4), 
 370      ".G/TT": (-2.2, -6.8), 
 371      "AT/.A": (-0.5, -0.6), "CT/.A": (6.9, 22.6), "GT/.A": (0.6, 2.6), 
 372      "TT/.A": (0.6, 2.6), 
 373      "AG/.C": (-1.6, -4.5), "CG/.C": (0.7, 3.2), "GG/.C": (-4.6, -14.8), 
 374      "TG/.C": (-0.4, -1.3), 
 375      "AC/.G": (-2.4, -6.1), "CC/.G": (3.3, 11.6), "GC/.G": (0.8, 3.2), 
 376      "TC/.G": (-1.4, -4.2), 
 377      "AT/.G": (-0.5, -0.6), "CT/.G": (6.9, 22.6), "GT/.G": (0.6, 2.6), 
 378      "TT/.G": (0.6, 2.6), 
 379      "AA/.T": (1.6, 6.1), "CA/.T": (2.2, 8.1), "GA/.T": (0.7, 3.5), 
 380      "TA/.T": (3.1, 10.6), 
 381      "AG/.T": (1.6, 6.1), "CG/.T": (2.2, 8.1), "GG/.T": (0.7, 3.5), 
 382      "TG/.T": (3.1, 10.6)} 
 383   
 384   
385 -def make_table(oldtable=None, values=None):
386 """Return a table with thermodynamic parameters (as dictionary). 387 388 Parameters: 389 oldtable: An existing dictionary with thermodynamic parameters. 390 values: A dictionary with new or updated values. 391 392 E.g., to replace the initiation parameters in the Sugimoto '96 dataset with 393 the initiation parameters from Allawi & SantaLucia '97: 394 395 >>> from Bio.SeqUtils.MeltingTemp import make_table, DNA_NN2 396 >>> table = DNA_NN2 # Sugimoto '96 397 >>> table['init_A/T'] 398 (0, 0) 399 >>> newtable = make_table(oldtable=DNA_NN2, values={'init': (0, 0), 400 ... 'init_A/T': (2.3, 4.1), 401 ... 'init_G/C': (0.1, -2.8)}) 402 >>> print("%0.1f, %0.1f" % newtable['init_A/T']) 403 2.3, 4.1 404 405 """ 406 if oldtable is None: 407 table = {"init": (0, 0), "init_A/T": (0, 0), "init_G/C": (0, 0), 408 "init_oneG/C": (0, 0), "init_allA/T": (0, 0), 409 "init_5T/A": (0, 0), "sym": (0, 0), "AA/TT": (0, 0), 410 "AT/TA": (0, 0), "TA/AT": (0, 0), "CA/GT": (0, 0), 411 "GT/CA": (0, 0), "CT/GA": (0, 0), "GA/CT": (0, 0), 412 "CG/GC": (0, 0), "GC/CG": (0, 0), "GG/CC": (0, 0)} 413 else: 414 table = oldtable.copy() 415 if values: 416 table.update(values) 417 return table
418 419
420 -def _check(seq, method):
421 """Return a sequence which fullfils the requirements of the given method (PRIVATE). 422 423 All Tm methods in this package require the sequence in uppercase format. 424 Most methods make use of the length of the sequence (directly or 425 indirectly), which can only be expressed as len(seq) if the sequence does 426 not contain whitespaces and other non-base characters. RNA sequences are 427 backtranscribed to DNA. This method is PRIVATE. 428 429 Arguments: 430 seq: The sequence as given by the user (passed as string). 431 method: Tm_Wallace, Tm_GC or Tm_NN. 432 433 >>> from Bio.SeqUtils import MeltingTemp as mt 434 >>> mt._check('10 ACGTTGCAAG tccatggtac', 'Tm_NN') 435 'ACGTTGCAAGTCCATGGTAC' 436 437 """ 438 seq = "".join(seq.split()).upper() 439 seq = str(Seq.Seq(seq).back_transcribe()) 440 if method == "Tm_Wallace": 441 return seq 442 if method == "Tm_GC": 443 baseset = ("A", "B", "C", "D", "G", "H", "I", "K", "M", "N", "R", "S", 444 "T", "V", "W", "X", "Y") 445 if method == "Tm_NN": 446 baseset = ("A", "C", "G", "T", "I") 447 seq = "".join([base for base in seq if base in baseset]) 448 return seq
449 450
451 -def salt_correction(Na=0, K=0, Tris=0, Mg=0, dNTPs=0, method=1, seq=None):
452 """Calculate a term to correct Tm for salt ions. 453 454 Depending on the Tm calculation, the term will correct Tm or entropy. To 455 calculate corrected Tm values, different operations need to be applied: 456 457 - methods 1-4: Tm(new) = Tm(old) + corr 458 - method 5: deltaS(new) = deltaS(old) + corr 459 - methods 6+7: Tm(new) = 1/(1/Tm(old) + corr) 460 461 Parameters: 462 - Na, K, Tris, Mg, dNTPS: Millimolar concentration of respective ion. To 463 have a simple 'salt correction', just pass Na. If any of K, Tris, Mg and 464 dNTPS is non-zero, a 'sodium-equivalent' concentration is calculated 465 according to von Ahsen et al. (2001, Clin Chem 47: 1956-1961): 466 [Na_eq] = [Na+] + [K+] + [Tris]/2 + 120*([Mg2+] - [dNTPs])^0.5 467 If [dNTPs] >= [Mg2+]: [Na_eq] = [Na+] + [K+] + [Tris]/2 468 - method: Which method to be applied. Methods 1-4 correct Tm, method 5 469 corrects deltaS, methods 6 and 7 correct 1/Tm. The methods are: 470 471 1. 16.6 x log[Na+] 472 (Schildkraut & Lifson (1965), Biopolymers 3: 195-208) 473 2. 16.6 x log([Na+]/(1.0 + 0.7*[Na+])) 474 (Wetmur (1991), Crit Rev Biochem Mol Biol 126: 227-259) 475 3. 12.5 x log(Na+] 476 (SantaLucia et al. (1996), Biochemistry 35: 3555-3562 477 4. 11.7 x log[Na+] 478 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465 479 5. Correction for deltaS: 0.368 x (N-1) x ln[Na+] 480 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465) 481 6. (4.29(%GC)-3.95)x1e-5 x ln[Na+] + 9.40e-6 x ln[Na+]^2 482 (Owczarzy et al. (2004), Biochemistry 43: 3537-3554) 483 7. Complex formula with decision tree and 7 empirical constants. 484 Mg2+ is corrected for dNTPs binding (if present) 485 (Owczarzy et al. (2008), Biochemistry 47: 5336-5353) 486 487 Examples 488 -------- 489 >>> from Bio.SeqUtils import MeltingTemp as mt 490 >>> print('%0.2f' % mt.salt_correction(Na=50, method=1)) 491 -21.60 492 >>> print('%0.2f' % mt.salt_correction(Na=50, method=2)) 493 -21.85 494 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, method=2)) 495 -16.45 496 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, Mg=1.5, method=2)) 497 -10.99 498 499 """ 500 if method in (5, 6, 7) and not seq: 501 raise ValueError("sequence is missing (is needed to calculate " + 502 "GC content or sequence length).") 503 if seq: 504 seq = str(seq) 505 corr = 0 506 if not method: 507 return corr 508 Mon = Na + K + Tris / 2.0 # Note: all these values are millimolar 509 mg = Mg * 1e-3 # Lowercase ions (mg, mon, dntps) are molar 510 # Na equivalent according to von Ahsen et al. (2001): 511 if sum((K, Mg, Tris, dNTPs)) > 0 and not method == 7 and dNTPs < Mg: 512 # dNTPs bind Mg2+ strongly. If [dNTPs] is larger or equal than 513 # [Mg2+], free Mg2+ is considered not to be relevant. 514 Mon += 120 * math.sqrt(Mg - dNTPs) 515 mon = Mon * 1e-3 516 # Note: math.log = ln(), math.log10 = log() 517 if method in range(1, 7) and not mon: 518 raise ValueError("Total ion concentration of zero is not allowed in " + 519 "this method.") 520 if method == 1: 521 corr = 16.6 * math.log10(mon) 522 if method == 2: 523 corr = 16.6 * math.log10((mon) / (1.0 + 0.7 * (mon))) 524 if method == 3: 525 corr = 12.5 * math.log10(mon) 526 if method == 4: 527 corr = 11.7 * math.log10(mon) 528 if method == 5: 529 corr = 0.368 * (len(seq) - 1) * math.log(mon) 530 if method == 6: 531 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * 1e-5 * math.log(mon) +\ 532 9.40e-6 * math.log(mon) ** 2 533 if method == 7: 534 a, b, c, d = 3.92, -0.911, 6.26, 1.42 535 e, f, g = -48.2, 52.5, 8.31 536 if dNTPs > 0: 537 dntps = dNTPs * 1e-3 538 ka = 3e4 # Dissociation constant for Mg:dNTP 539 # Free Mg2+ calculation: 540 mg = (-(ka * dntps - ka * mg + 1.0) + 541 math.sqrt((ka * dntps - ka * mg + 1.0) ** 2 + 542 4.0 * ka * mg)) / (2.0 * ka) 543 if Mon > 0: 544 R = math.sqrt(mg) / mon 545 if R < 0.22: 546 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * \ 547 1e-5 * math.log(mon) + 9.40e-6 * math.log(mon) ** 2 548 return corr 549 elif R < 6.0: 550 a = 3.92 * (0.843 - 0.352 * math.sqrt(mon) * math.log(mon)) 551 d = 1.42 * (1.279 - 4.03e-3 * math.log(mon) - 552 8.03e-3 * math.log(mon) ** 2) 553 g = 8.31 * (0.486 - 0.258 * math.log(mon) + 554 5.25e-3 * math.log(mon) ** 3) 555 corr = (a + b * math.log(mg) + (SeqUtils.GC(seq) / 100) * 556 (c + d * math.log(mg)) + (1 / (2.0 * (len(seq) - 1))) * 557 (e + f * math.log(mg) + g * math.log(mg) ** 2)) * 1e-5 558 if method > 7: 559 raise ValueError("Allowed values for parameter 'method' are 1-7.") 560 return corr
561 562
563 -def chem_correction(melting_temp, DMSO=0, fmd=0, DMSOfactor=0.75, 564 fmdfactor=0.65, fmdmethod=1, GC=None):
565 """Correct a given Tm for DMSO and formamide. 566 567 Please note that these corrections are +/- rough approximations. 568 569 Arguments: 570 - melting_temp: Melting temperature. 571 - DMSO: Percent DMSO. 572 - fmd: Formamide concentration in %(fmdmethod=1) or molar (fmdmethod=2). 573 - DMSOfactor: How much should Tm decreases per percent DMSO. Default=0.65 574 (von Ahsen et al. 2001). Other published values are 0.5, 0.6 and 0.675. 575 - fmdfactor: How much should Tm decrease per percent formamide. 576 Default=0.65. Several papers report factors between 0.6 and 0.72. 577 - fmdmethod: 578 579 1. Tm = Tm - factor(%formamide) (Default) 580 2. Tm = Tm + (0.453(f(GC)) - 2.88) x [formamide] 581 582 Here f(GC) is fraction of GC. 583 Note (again) that in fmdmethod=1 formamide concentration is given in %, 584 while in fmdmethod=2 it is given in molar. 585 - GC: GC content in percent. 586 587 Examples: 588 >>> from Bio.SeqUtils import MeltingTemp as mt 589 >>> mt.chem_correction(70) 590 70 591 >>> print('%0.2f' % mt.chem_correction(70, DMSO=3)) 592 67.75 593 >>> print('%0.2f' % mt.chem_correction(70, fmd=5)) 594 66.75 595 >>> print('%0.2f' % mt.chem_correction(70, fmdmethod=2, fmd=1.25, 596 ... GC=50)) 597 66.68 598 599 """ 600 if DMSO: 601 melting_temp -= DMSOfactor * DMSO 602 if fmd: 603 # McConaughy et al. (1969), Biochemistry 8: 3289-3295 604 if fmdmethod == 1: 605 # Note: Here fmd is given in percent 606 melting_temp -= fmdfactor * fmd 607 # Blake & Delcourt (1996), Nucl Acids Res 11: 2095-2103 608 if fmdmethod == 2: 609 if GC is None or GC < 0: 610 raise ValueError("'GC' is missing or negative") 611 # Note: Here fmd is given in molar 612 melting_temp += (0.453 * (GC / 100.0) - 2.88) * fmd 613 if fmdmethod not in (1, 2): 614 raise ValueError("'fmdmethod' must be 1 or 2") 615 return melting_temp
616 617
618 -def Tm_Wallace(seq, check=True, strict=True):
619 """Calculate and return the Tm using the 'Wallace rule'. 620 621 Tm = 4 degC * (G + C) + 2 degC * (A+T) 622 623 The Wallace rule (Thein & Wallace 1986, in Human genetic diseases: a 624 practical approach, 33-50) is often used as rule of thumb for approximate 625 Tm calculations for primers of 14 to 20 nt length. 626 627 Non-DNA characters (e.g., E, F, J, !, 1, etc) are ignored by this method. 628 629 Examples: 630 >>> from Bio.SeqUtils import MeltingTemp as mt 631 >>> mt.Tm_Wallace('ACGTTGCAATGCCGTA') 632 48.0 633 >>> mt.Tm_Wallace('ACGT TGCA ATGC CGTA') 634 48.0 635 >>> mt.Tm_Wallace('1ACGT2TGCA3ATGC4CGTA') 636 48.0 637 638 """ 639 seq = str(seq) 640 if check: 641 seq = _check(seq, "Tm_Wallace") 642 643 melting_temp = 2 * (sum(map(seq.count, ("A", "T", "W")))) + \ 644 4 * (sum(map(seq.count, ("C", "G", "S")))) 645 646 # Intermediate values for ambiguous positions: 647 tmp = 3 * (sum(map(seq.count, ("K", "M", "N", "R", "Y")))) + \ 648 10 / 3.0 * (sum(map(seq.count, ("B", "V")))) + \ 649 8 / 3.0 * (sum(map(seq.count, ("D", "H")))) 650 if strict and tmp: 651 raise ValueError("ambiguous bases B, D, H, K, M, N, R, V, Y not " + 652 "allowed when strict=True") 653 else: 654 melting_temp += tmp 655 return melting_temp
656 657
658 -def Tm_GC(seq, check=True, strict=True, valueset=7, userset=None, Na=50, K=0, 659 Tris=0, Mg=0, dNTPs=0, saltcorr=0, mismatch=True):
660 """Return the Tm using empirical formulas based on GC content. 661 662 General format: Tm = A + B(%GC) - C/N + salt correction - D(%mismatch) 663 664 A, B, C, D: empirical constants, N: primer length 665 D (amount of decrease in Tm per % mismatch) is often 1, but sometimes other 666 values have been used (0.6-1.5). Use 'X' to indicate the mismatch position 667 in the sequence. Note that this mismatch correction is a rough estimate. 668 669 >>> from Bio.SeqUtils import MeltingTemp as mt 670 >>> print("%0.2f" % mt.Tm_GC('CTGCTGATXGCACGAGGTTATGG', valueset=2)) 671 69.20 672 673 Arguments: 674 - valueset: A few often cited variants are included: 675 676 1. Tm = 69.3 + 0.41(%GC) - 650/N 677 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Chester & Marshak 1993), 678 Anal Biochem 209: 284-290) 679 2. Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch 680 'QuikChange' formula. Recommended (by the manufacturer) for the 681 design of primers for QuikChange mutagenesis. 682 3. Tm = 81.5 + 0.41(%GC) - 675/N + 16.6 x log[Na+] 683 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Schildkraut & Lifson 684 1965, Biopolymers 3: 195-208) 685 4. Tm = 81.5 + 0.41(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x 686 [Na+])) - %mismatch 687 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). This is the 688 standard formula in approximative mode of MELTING 4.3. 689 5. Tm = 78 + 0.7(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) 690 - %mismatch 691 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA. 692 6. Tm = 67 + 0.8(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) 693 - %mismatch 694 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA/DNA 695 hybrids. 696 7. Tm = 81.5 + 0.41(%GC) - 600/N + 16.6 x log[Na+] 697 Used by Primer3Plus to calculate the product Tm. Default set. 698 8. Tm = 77.1 + 0.41(%GC) - 528/N + 11.7 x log[Na+] 699 (von Ahsen et al. 2001, Clin Chem 47: 1956-1961). Recommended 'as a 700 tradeoff between accuracy and ease of use'. 701 702 - userset: Tuple of four values for A, B, C, and D. Usersets override 703 valuesets. 704 - Na, K, Tris, Mg, dNTPs: Concentration of the respective ions [mM]. If 705 any of K, Tris, Mg and dNTPS is non-zero, a 'sodium-equivalent' 706 concentration is calculated and used for salt correction (von Ahsen et 707 al., 2001). 708 - saltcorr: Type of salt correction (see method salt_correction). 709 Default=5. 0 or None means no salt correction. 710 - mismatch: If 'True' (default) every 'X' in the sequence is counted as 711 mismatch. 712 713 """ 714 if saltcorr == 5: 715 raise ValueError("salt-correction method 5 not applicable " 716 "to Tm_GC") 717 seq = str(seq) 718 if check: 719 seq = _check(seq, "Tm_GC") 720 percent_gc = SeqUtils.GC(seq) 721 # Ambiguous bases: add 0.5, 0.67 or 0.33% depending on G+C probability: 722 tmp = sum(map(seq.count, ("K", "M", "N", "R", "Y"))) * 50.0 / len(seq) + \ 723 sum(map(seq.count, ("B", "V"))) * 66.67 / len(seq) + \ 724 sum(map(seq.count, ("D", "H"))) * 33.33 / len(seq) 725 if strict and tmp: 726 raise ValueError("ambiguous bases B, D, H, K, M, N, R, V, Y not " 727 "allowed when 'strict=True'") 728 else: 729 percent_gc += tmp 730 if userset: 731 A, B, C, D = userset 732 else: 733 if valueset == 1: 734 A, B, C, D = (69.3, 0.41, 650, 1) 735 saltcorr = 0 736 if valueset == 2: 737 A, B, C, D = (81.5, 0.41, 675, 1) 738 saltcorr = 0 739 if valueset == 3: 740 A, B, C, D = (81.5, 0.41, 675, 1) 741 saltcorr = 2 742 if valueset == 4: 743 A, B, C, D = (81.5, 0.41, 500, 1) 744 saltcorr = 3 745 if valueset == 5: 746 A, B, C, D = (78.0, 0.7, 500, 1) 747 saltcorr = 3 748 if valueset == 6: 749 A, B, C, D = (67.0, 0.8, 500, 1) 750 saltcorr = 3 751 if valueset == 7: 752 A, B, C, D = (81.5, 0.41, 600, 1) 753 saltcorr = 2 754 if valueset == 8: 755 A, B, C, D = (77.1, 0.41, 528, 1) 756 saltcorr = 4 757 if valueset > 8: 758 raise ValueError("allowed values for parameter 'valueset' are 0-8.") 759 760 melting_temp = A + B * percent_gc - C / (len(seq) * 1.0) 761 if saltcorr: 762 melting_temp += salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, 763 dNTPs=dNTPs, seq=seq, method=saltcorr) 764 if mismatch: 765 melting_temp -= D * (seq.count("X") * 100.0 / len(seq)) 766 return melting_temp
767 768
769 -def _key_error(neighbors, strict):
770 """Throw an error or a warning if there is no data for the neighbors (PRIVATE).""" 771 # We haven't found the key in the tables 772 if strict: 773 raise ValueError("no thermodynamic data for neighbors %r available" 774 % neighbors) 775 else: 776 warnings.warn("no themodynamic data for neighbors %r available. " 777 "Calculation will be wrong" % neighbors, 778 BiopythonWarning)
779 780
781 -def Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, 782 nn_table=None, tmm_table=None, imm_table=None, de_table=None, 783 dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, 784 dNTPs=0, saltcorr=5):
785 """Return the Tm using nearest neighbor thermodynamics. 786 787 Arguments: 788 - seq: The primer/probe sequence as string or Biopython sequence object. 789 For RNA/DNA hybridizations seq must be the RNA sequence. 790 - c_seq: Complementary sequence. The sequence of the template/target in 791 3'->5' direction. c_seq is necessary for mismatch correction and 792 dangling-ends correction. Both corrections will automatically be 793 applied if mismatches or dangling ends are present. Default=None. 794 - shift: Shift of the primer/probe sequence on the template/target 795 sequence, e.g.:: 796 797 shift=0 shift=1 shift= -1 798 Primer (seq): 5' ATGC... 5' ATGC... 5' ATGC... 799 Template (c_seq): 3' TACG... 3' CTACG... 3' ACG... 800 801 The shift parameter is necessary to align seq and c_seq if they have 802 different lengths or if they should have dangling ends. Default=0 803 - table: Thermodynamic NN values, eight tables are implemented: 804 For DNA/DNA hybridizations: 805 806 - DNA_NN1: values from Breslauer et al. (1986) 807 - DNA_NN2: values from Sugimoto et al. (1996) 808 - DNA_NN3: values from Allawi & SantaLucia (1997) (default) 809 - DNA_NN4: values from SantaLucia & Hicks (2004) 810 811 For RNA/RNA hybridizations: 812 813 - RNA_NN1: values from Freier et al. (1986) 814 - RNA_NN2: values from Xia et al. (1998) 815 - RNA_NN3: valuse from Chen et al. (2012) 816 817 For RNA/DNA hybridizations: 818 819 - R_DNA_NN1: values from Sugimoto et al. (1995) 820 Note that ``seq`` must be the RNA sequence. 821 822 Use the module's maketable method to make a new table or to update one 823 one of the implemented tables. 824 - tmm_table: Thermodynamic values for terminal mismatches. 825 Default: DNA_TMM1 (SantaLucia & Peyret, 2001) 826 - imm_table: Thermodynamic values for internal mismatches, may include 827 insosine mismatches. Default: DNA_IMM1 (Allawi & SantaLucia, 1997-1998; 828 Peyret et al., 1999; Watkins & SantaLucia, 2005) 829 - de_table: Thermodynamic values for dangling ends: 830 831 - DNA_DE1: for DNA. Values from Bommarito et al. (2000) (default) 832 - RNA_DE1: for RNA. Values from Turner & Mathews (2010) 833 834 - dnac1: Concentration of the higher concentrated strand [nM]. Typically 835 this will be the primer (for PCR) or the probe. Default=25. 836 - dnac2: Concentration of the lower concentrated strand [nM]. In PCR this 837 is the template strand which concentration is typically very low and may 838 be ignored (dnac2=0). In oligo/oligo hybridization experiments, dnac1 839 equals dnac1. Default=25. 840 MELTING and Primer3Plus use k = [Oligo(Total)]/4 by default. To mimic 841 this behaviour, you have to divide [Oligo(Total)] by 2 and assign this 842 concentration to dnac1 and dnac2. E.g., Total oligo concentration of 843 50 nM in Primer3Plus means dnac1=25, dnac2=25. 844 - selfcomp: Is the sequence self-complementary? Default=False. If 'True' 845 the primer is thought binding to itself, thus dnac2 is not considered. 846 - Na, K, Tris, Mg, dNTPs: See method 'Tm_GC' for details. Defaults: Na=50, 847 K=0, Tris=0, Mg=0, dNTPs=0. 848 - saltcorr: See method 'Tm_GC'. Default=5. 0 means no salt correction. 849 850 """ 851 # Set defaults 852 if not nn_table: 853 nn_table = DNA_NN3 854 if not tmm_table: 855 tmm_table = DNA_TMM1 856 if not imm_table: 857 imm_table = DNA_IMM1 858 if not de_table: 859 de_table = DNA_DE1 860 861 seq = str(seq) 862 if not c_seq: 863 # c_seq must be provided by user if dangling ends or mismatches should 864 # be taken into account. Otherwise take perfect complement. 865 c_seq = Seq.Seq(seq).complement() 866 c_seq = str(c_seq) 867 if check: 868 seq = _check(seq, "Tm_NN") 869 c_seq = _check(c_seq, "Tm_NN") 870 tmp_seq = seq 871 tmp_cseq = c_seq 872 delta_h = 0 873 delta_s = 0 874 d_h = 0 # Names for indexes 875 d_s = 1 # 0 and 1 876 877 # Dangling ends? 878 if shift or len(seq) != len(c_seq): 879 # Align both sequences using the shift parameter 880 if shift > 0: 881 tmp_seq = "." * shift + seq 882 if shift < 0: 883 tmp_cseq = "." * abs(shift) + c_seq 884 if len(tmp_cseq) > len(tmp_seq): 885 tmp_seq += (len(tmp_cseq) - len(tmp_seq)) * "." 886 if len(tmp_cseq) < len(tmp_seq): 887 tmp_cseq += (len(tmp_seq) - len(tmp_cseq)) * "." 888 # Remove 'over-dangling' ends 889 while tmp_seq.startswith("..") or tmp_cseq.startswith(".."): 890 tmp_seq = tmp_seq[1:] 891 tmp_cseq = tmp_cseq[1:] 892 while tmp_seq.endswith("..") or tmp_cseq.endswith(".."): 893 tmp_seq = tmp_seq[:-1] 894 tmp_cseq = tmp_cseq[:-1] 895 # Now for the dangling ends 896 if tmp_seq.startswith(".") or tmp_cseq.startswith("."): 897 left_de = tmp_seq[:2] + "/" + tmp_cseq[:2] 898 try: 899 delta_h += de_table[left_de][d_h] 900 delta_s += de_table[left_de][d_s] 901 except KeyError: 902 _key_error(left_de, strict) 903 tmp_seq = tmp_seq[1:] 904 tmp_cseq = tmp_cseq[1:] 905 if tmp_seq.endswith(".") or tmp_cseq.endswith("."): 906 right_de = tmp_cseq[-2:][::-1] + "/" + tmp_seq[-2:][::-1] 907 try: 908 delta_h += de_table[right_de][d_h] 909 delta_s += de_table[right_de][d_s] 910 except KeyError: 911 _key_error(right_de, strict) 912 tmp_seq = tmp_seq[:-1] 913 tmp_cseq = tmp_cseq[:-1] 914 915 # Now for terminal mismatches 916 left_tmm = tmp_cseq[:2][::-1] + "/" + tmp_seq[:2][::-1] 917 if left_tmm in tmm_table: 918 delta_h += tmm_table[left_tmm][d_h] 919 delta_s += tmm_table[left_tmm][d_s] 920 tmp_seq = tmp_seq[1:] 921 tmp_cseq = tmp_cseq[1:] 922 right_tmm = tmp_seq[-2:] + "/" + tmp_cseq[-2:] 923 if right_tmm in tmm_table: 924 delta_h += tmm_table[right_tmm][d_h] 925 delta_s += tmm_table[right_tmm][d_s] 926 tmp_seq = tmp_seq[:-1] 927 tmp_cseq = tmp_cseq[:-1] 928 929 # Now everything 'unusual' at the ends is handled and removed and we can 930 # look at the initiation. 931 # One or several of the following initiation types may apply: 932 933 # Type: General initiation value 934 delta_h += nn_table["init"][d_h] 935 delta_s += nn_table["init"][d_s] 936 937 # Type: Duplex with no (allA/T) or at least one (oneG/C) GC pair 938 if SeqUtils.GC(seq) == 0: 939 delta_h += nn_table["init_allA/T"][d_h] 940 delta_s += nn_table["init_allA/T"][d_s] 941 else: 942 delta_h += nn_table["init_oneG/C"][d_h] 943 delta_s += nn_table["init_oneG/C"][d_s] 944 945 # Type: Penalty if 5' end is T 946 if seq.startswith("T"): 947 delta_h += nn_table["init_5T/A"][d_h] 948 delta_s += nn_table["init_5T/A"][d_s] 949 if seq.endswith("A"): 950 delta_h += nn_table["init_5T/A"][d_h] 951 delta_s += nn_table["init_5T/A"][d_s] 952 953 # Type: Different values for G/C or A/T terminal basepairs 954 ends = seq[0] + seq[-1] 955 AT = ends.count("A") + ends.count("T") 956 GC = ends.count("G") + ends.count("C") 957 delta_h += nn_table["init_A/T"][d_h] * AT 958 delta_s += nn_table["init_A/T"][d_s] * AT 959 delta_h += nn_table["init_G/C"][d_h] * GC 960 delta_s += nn_table["init_G/C"][d_s] * GC 961 962 # Finally, the 'zipping' 963 for basenumber in range(len(tmp_seq) - 1): 964 neighbors = tmp_seq[basenumber:basenumber + 2] + "/" + \ 965 tmp_cseq[basenumber:basenumber + 2] 966 if neighbors in imm_table: 967 delta_h += imm_table[neighbors][d_h] 968 delta_s += imm_table[neighbors][d_s] 969 elif neighbors[::-1] in imm_table: 970 delta_h += imm_table[neighbors[::-1]][d_h] 971 delta_s += imm_table[neighbors[::-1]][d_s] 972 elif neighbors in nn_table: 973 delta_h += nn_table[neighbors][d_h] 974 delta_s += nn_table[neighbors][d_s] 975 elif neighbors[::-1] in nn_table: 976 delta_h += nn_table[neighbors[::-1]][d_h] 977 delta_s += nn_table[neighbors[::-1]][d_s] 978 else: 979 # We haven't found the key... 980 _key_error(neighbors, strict) 981 982 k = (dnac1 - (dnac2 / 2.0)) * 1e-9 983 if selfcomp: 984 k = dnac1 * 1e-9 985 delta_h += nn_table["sym"][d_h] 986 delta_s += nn_table["sym"][d_s] 987 R = 1.987 # universal gas constant in Cal/degrees C*Mol 988 if saltcorr: 989 corr = salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, dNTPs=dNTPs, 990 method=saltcorr, seq=seq) 991 if saltcorr == 5: 992 delta_s += corr 993 melting_temp = (1000 * delta_h) / (delta_s + (R * (math.log(k)))) - 273.15 994 if saltcorr in (1, 2, 3, 4): 995 melting_temp += corr 996 if saltcorr in (6, 7): 997 # Tm = 1/(1/Tm + corr) 998 melting_temp = (1 / (1 / (melting_temp + 273.15) + corr) - 273.15) 999 1000 return melting_temp
1001 1002
1003 -def Tm_staluc(s, dnac=50, saltc=50, rna=0):
1004 """Return DNA/DNA Tm using nearest neighbor thermodynamics (OBSOLETE). 1005 1006 This method may be depreceated in the future. Use Tm_NN instead. Tm_NN 1007 with default values gives the same result as Tm_staluc. 1008 1009 s is the sequence as string or Seq object 1010 dnac is DNA concentration [nM] 1011 saltc is salt concentration [mM]. 1012 rna=0 is for DNA/DNA (default), use 1 for RNA/RNA hybridisation. 1013 1014 For DNA/DNA, see Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 1015 For RNA/RNA, see Xia et al (1998), Biochemistry 37: 14719-14735 1016 1017 Examples 1018 -------- 1019 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA')) 1020 59.87 1021 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA', rna=True)) 1022 77.90 1023 1024 You can also use a Seq object instead of a string, 1025 1026 >>> from Bio.Seq import Seq 1027 >>> from Bio.Alphabet import generic_nucleotide 1028 >>> s = Seq('CAGTCAGTACGTACGTGTACTGCCGTA', generic_nucleotide) 1029 >>> print("%0.2f" % Tm_staluc(s)) 1030 59.87 1031 >>> print("%0.2f" % Tm_staluc(s, rna=True)) 1032 77.90 1033 1034 """ 1035 # Original method was by Sebastian Bassi <sbassi@genesdigitales.com>. It is 1036 # now superseded by Tm_NN. 1037 1038 warnings.warn("Tm_staluc may be depreciated in the future. Use Tm_NN " + 1039 "instead.", PendingDeprecationWarning) 1040 if not rna: 1041 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc) 1042 elif rna == 1: 1043 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc, 1044 nn_table=RNA_NN2) 1045 else: 1046 raise ValueError("rna={0} not supported".format(rna))
1047 1048 1049 if __name__ == "__main__": 1050 from Bio._utils import run_doctest 1051 run_doctest() 1052