Package Bio :: Package SeqUtils :: Module MeltingTemp
[hide private]
[frames] | no frames]

Source Code for Module Bio.SeqUtils.MeltingTemp

   1  # Copyright 2004-2008 by Sebastian Bassi. 
   2  # Copyright 2013 by Markus Piotrowski. 
   3  # All rights reserved. 
   4  # This code is part of the Biopython distribution and governed by its 
   5  # license.  Please see the LICENSE file that should have been included 
   6  # as part of this package. 
   7   
   8  """Calculate the melting temperature of nucleotide sequences. 
   9   
  10  This module contains three different methods to calculate the melting 
  11  temperature of oligonucleotides: 
  12   
  13  1. Tm_Wallace: 'Rule of thumb' 
  14  2. Tm_GC: Empirical formulas based on GC content. Salt and mismatch corrections 
  15     can be included. 
  16  3. Tm_NN: Calculation based on nearest neighbor thermodynamics. Several tables 
  17     for DNA/DNA, DNA/RNA and RNA/RNA hybridizations are included. 
  18     Correction for mismatches, dangling ends, salt concentration and other 
  19     additives are available. 
  20   
  21  Tm_staluc is the 'old' NN calculation and is kept for compatibility. It is, 
  22  however, recommended to use Tm_NN instead, since Tm_staluc may be depreceated 
  23  in the future. Also, Tm_NN has much more options. Using Tm_staluc and Tm_NN 
  24  with default parameters gives (essentially) the same results. 
  25   
  26  General parameters for most Tm methods: 
  27   - seq -- A Biopython sequence object or a string. 
  28   - check -- Checks if the sequence is valid for the given method (default= 
  29     True). In general, whitespaces and non-base characters are removed and 
  30     characters are converted to uppercase. RNA will be backtranscribed. 
  31   - strict -- Do not allow base characters or neighbor duplex keys (e.g. 
  32     'AT/NA') that could not or not unambigiously be evaluated for the respective 
  33     method (default=True). Note that W (= A or T) and S (= C or G) are not 
  34     ambiguous for Tm_Wallace and Tm_GC. If 'False', average values (if 
  35     applicable) will be used. 
  36   
  37  This module is not able to detect self-complementary and it will not use 
  38  alignment tools to align an oligonucleotide sequence to its target sequence. 
  39  Thus it can not detect dangling-ends and mismatches by itself (don't even think 
  40  about bulbs and loops). These parameters have to be handed over to the 
  41  respective method. 
  42   
  43  Other public methods of this module: 
  44   - make_table     : To create a table with thermodynamic data. 
  45   - salt_correction: To adjust Tm to a given salt concentration by different 
  46     formulas. This method is called from Tm_GC and Tm_NN but may 
  47     also be accessed 'manually'. It returns a correction term, not 
  48     a corrected Tm! 
  49   - chem_correction: To adjust Tm regarding the chemical additives DMSO and 
  50     formaldehyde. The method returns a corrected Tm. Chemical 
  51     correction is not an integral part of the Tm methods and must 
  52     be called additionally. 
  53   
  54  Examples: 
  55   
  56      >>> from Bio.SeqUtils import MeltingTemp as mt 
  57      >>> from Bio.Seq import Seq 
  58      >>> mystring = 'CGTTCCAAAGATGTGGGCATGAGCTTAC' 
  59      >>> myseq = Seq(mystring) 
  60      >>> print('%0.2f' % mt.Tm_Wallace(mystring)) 
  61      84.00 
  62      >>> print('%0.2f' % mt.Tm_Wallace(myseq)) 
  63      84.00 
  64      >>> print('%0.2f' % mt.Tm_GC(myseq)) 
  65      58.73 
  66      >>> print('%0.2f' % mt.Tm_NN(myseq)) 
  67      60.32 
  68   
  69  Tm_NN with default values gives same result as 'old' Tm_staluc. However, values 
  70  differ for RNA, since Tm_staluc had some errors for RNA calculation. These 
  71  errors have been fixed in this version. 
  72   
  73  New Tm_NN can do slightly more: 
  74  Using different thermodynamic tables, e.g. from Breslauer '86 or Sugimoto '96: 
  75   
  76      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN1))  # Breslauer '86 
  77      72.19 
  78      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN2))  # Sugimoto '96 
  79      65.47 
  80   
  81  Several types of salc correction (for Tm_NN and Tm_GC): 
  82   
  83      >>> for i in range(1, 8): 
  84      ...     print("Type: %d, Tm: %0.2f" % (i, Tm_NN(myseq, saltcorr=i))) 
  85      ... 
  86      Type: 1, Tm: 54.27 
  87      Type: 2, Tm: 54.02 
  88      Type: 3, Tm: 59.60 
  89      Type: 4, Tm: 60.64 
  90      Type: 5, Tm: 60.32 
  91      Type: 6, Tm: 59.78 
  92      Type: 7, Tm: 59.78 
  93   
  94  Correction for other monovalent cations (K+, Tris), Mg2+ and dNTPs according 
  95  to von Ahsen et al. (2001) or Owczarzy et al. (2008) (for Tm_NN and Tm_GC): 
  96   
  97      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10)) 
  98      60.79 
  99      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5)) 
 100      67.39 
 101      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, saltcorr=7)) 
 102      66.81 
 103      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, dNTPs=0.6, 
 104      ...                          saltcorr=7)) 
 105      66.04 
 106   
 107  Dangling ends and mismatches, e.g.:: 
 108   
 109      Oligo:     CGTTCCaAAGATGTGGGCATGAGCTTAC       CGTTCCaAAGATGTGGGCATGAGCTTAC 
 110                 ::::::X:::::::::::::::::::::  or   ::::::X::::::::::::::::::::: 
 111      Template:  GCAAGGcTTCTACACCCGTACTCGAATG      TGCAAGGcTTCTACACCCGTACTCGAATGC 
 112   
 113  Here: 
 114   
 115      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 116      60.32 
 117      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 118      ...                    c_seq='GCAAGGcTTCTACACCCGTACTCGAATG')) 
 119      55.39 
 120      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', shift=1, 
 121      ...                   c_seq='TGCAAGGcTTCTACACCCGTACTCGAATGC')) 
 122      55.69 
 123   
 124  The same for RNA: 
 125   
 126      >>> print('%0.2f' % mt.Tm_NN('CGUUCCAAAGAUGUGGGCAUGAGCUUAC', 
 127      ...                   c_seq='UGCAAGGcUUCUACACCCGUACUCGAAUGC', 
 128      ...                   shift=1, nn_table=mt.RNA_NN3, 
 129      ...                   de_table=mt.RNA_DE1)) 
 130      73.00 
 131   
 132  Note, that thermodynamic data are not available for all kind of mismatches, 
 133  e.g. most double mismatches or terminal mismaches combined with danglind ends: 
 134   
 135      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 136      ...                   c_seq='TtCAAGGcTTCTACACCCGTACTCGAATGC', 
 137      ...                   shift=1)) 
 138      Traceback (most recent call last): 
 139      ValueError: no thermodynamic data for neighbors '.C/TT' available 
 140   
 141  Make your own tables, or update/extend existing tables. E.g., add values for 
 142  locked nucleotides. Here, 'locked A' (and its complement) should be represented 
 143  by '1': 
 144   
 145      >>> mytable = mt.make_table(oldtable=mt.DNA_NN3, 
 146      ...                         values={'A1/T1':(-6.608, -17.235), 
 147      ...                         '1A/1T':(-6.893, -15.923)}) 
 148      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 149      60.32 
 150      >>> print('%0.2f' % mt.Tm_NN('CGTTCCA1AGATGTGGGCATGAGCTTAC', 
 151      ...                           nn_table=mytable, check=False)) 
 152      ... # 'check' must be False, otherwise '1' would be discarded 
 153      62.53 
 154   
 155  """ 
 156   
 157  from __future__ import print_function 
 158   
 159  import math 
 160  import warnings 
 161   
 162  from Bio import SeqUtils, Seq 
 163  from Bio import BiopythonWarning 
 164   
 165   
 166  # Thermodynamic lookup tables (dictionaries): 
 167  # Enthalpy (dH) and entropy (dS) values for nearest neighbors and initiation 
 168  # process. Calculation of duplex initiation is quite different in several 
 169  # papers; to allow for a general calculation, all different initiation 
 170  # parameters are included in all tables and non-applicable parameters are set 
 171  # to zero. 
 172  # The key is either an initiation type (e.g., 'init_A/T') or a nearest neighbor 
 173  # duplex sequence (e.g., GT/CA, to read 5'GT3'-3'CA5'). The values are tuples 
 174  # of dH (kcal/mol), dS (cal/mol K). 
 175   
 176  # DNA/DNA 
 177  # Breslauer et al. (1986), Proc Natl Acad Sci USA 83: 3746-3750 
 178  DNA_NN1 = { 
 179      'init': (0, 0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 180      'init_oneG/C': (0, -16.8), 'init_allA/T': (0, -20.1), 'init_5T/A': (0, 0), 
 181      'sym': (0, -1.3), 
 182      'AA/TT': (-9.1, -24.0), 'AT/TA': (-8.6, -23.9), 'TA/AT': (-6.0, -16.9), 
 183      'CA/GT': (-5.8, -12.9), 'GT/CA': (-6.5, -17.3), 'CT/GA': (-7.8, -20.8), 
 184      'GA/CT': (-5.6, -13.5), 'CG/GC': (-11.9, -27.8), 'GC/CG': (-11.1, -26.7), 
 185      'GG/CC': (-11.0, -26.6)} 
 186   
 187  # Sugimoto et al. (1996), Nuc Acids Res 24 : 4501-4505 
 188  DNA_NN2 = { 
 189      'init': (0.6, -9.0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 190      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 191      'sym': (0, -1.4), 
 192      'AA/TT': (-8.0, -21.9), 'AT/TA': (-5.6, -15.2), 'TA/AT': (-6.6, -18.4), 
 193      'CA/GT': (-8.2, -21.0), 'GT/CA': (-9.4, -25.5), 'CT/GA': (-6.6, -16.4), 
 194      'GA/CT': (-8.8, -23.5), 'CG/GC': (-11.8, -29.0), 'GC/CG': (-10.5, -26.4), 
 195      'GG/CC': (-10.9, -28.4)} 
 196   
 197  # Allawi and SantaLucia (1997), Biochemistry 36: 10581-10594 
 198  DNA_NN3 = { 
 199      'init': (0, 0), 'init_A/T': (2.3, 4.1), 'init_G/C': (0.1, -2.8), 
 200      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 201      'sym': (0, -1.4), 
 202      'AA/TT': (-7.9, -22.2), 'AT/TA': (-7.2, -20.4), 'TA/AT': (-7.2, -21.3), 
 203      'CA/GT': (-8.5, -22.7), 'GT/CA': (-8.4, -22.4), 'CT/GA': (-7.8, -21.0), 
 204      'GA/CT': (-8.2, -22.2), 'CG/GC': (-10.6, -27.2), 'GC/CG': (-9.8, -24.4), 
 205      'GG/CC': (-8.0, -19.9)} 
 206   
 207  # SantaLucia & Hicks (2004), Annu. Rev. Biophys. Biomol. Struct 33: 415-440 
 208  DNA_NN4 = { 
 209      'init': (0.2, -5.7), 'init_A/T': (2.2, 6.9), 'init_G/C': (0, 0), 
 210      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 211      'sym': (0, -1.4), 
 212      'AA/TT': (-7.6, -21.3), 'AT/TA': (-7.2, -20.4), 'TA/AT': (-7.2, -20.4), 
 213      'CA/GT': (-8.5, -22.7), 'GT/CA': (-8.4, -22.4), 'CT/GA': (-7.8, -21.0), 
 214      'GA/CT': (-8.2, -22.2), 'CG/GC': (-10.6, -27.2), 'GC/CG': (-9.8, -24.4), 
 215      'GG/CC': (-8.0, -19.0)} 
 216   
 217  # RNA/RNA 
 218  # Freier et al. (1986), Proc Natl Acad Sci USA 83: 9373-9377 
 219  RNA_NN1 = { 
 220      'init': (0, -10.8), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 221      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 222      'sym': (0, -1.4), 
 223      'AA/TT': (-6.6, -18.4), 'AT/TA': (-5.7, -15.5), 'TA/AT': (-8.1, -22.6), 
 224      'CA/GT': (-10.5, -27.8), 'GT/CA': (-10.2, -26.2), 'CT/GA': (-7.6, -19.2), 
 225      'GA/CT': (-13.3, -35.5), 'CG/GC': (-8.0, -19.4), 'GC/CG': (-14.2, -34.9), 
 226      'GG/CC': (-12.2, -29.7)} 
 227   
 228  # Xia et al (1998), Biochemistry 37: 14719-14735 
 229  RNA_NN2 = { 
 230      'init': (3.61, -1.5), 'init_A/T': (3.72, 10.5), 'init_G/C': (0, 0), 
 231      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 232      'sym': (0, -1.4), 
 233      'AA/TT': (-6.82, -19.0), 'AT/TA': (-9.38, -26.7), 'TA/AT': (-7.69, -20.5), 
 234      'CA/GT': (-10.44, -26.9), 'GT/CA': (-11.40, -29.5), 
 235      'CT/GA': (-10.48, -27.1), 'GA/CT': (-12.44, -32.5), 
 236      'CG/GC': (-10.64, -26.7), 'GC/CG': (-14.88, -36.9), 
 237      'GG/CC': (-13.39, -32.7)} 
 238   
 239  # Chen et al. (2012), Biochemistry 51: 3508-3522 
 240  RNA_NN3 = { 
 241      'init': (6.40, 6.99), 'init_A/T': (3.85, 11.04), 'init_G/C': (0, 0), 
 242      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 243      'sym': (0, -1.4), 
 244      'AA/TT': (-7.09, -19.8), 'AT/TA': (-9.11, -25.8), 'TA/AT': (-8.50, -22.9), 
 245      'CA/GT': (-11.03, -28.8), 'GT/CA': (-11.98, -31.3), 
 246      'CT/GA': (-10.90, -28.5), 'GA/CT': (-13.21, -34.9), 
 247      'CG/GC': (-10.88, -27.4), 'GC/CG': (-16.04, -40.6), 
 248      'GG/CC': (-14.18, -35.0), 'GT/TG': (-13.83, -46.9), 
 249      'GG/TT': (-17.82, -56.7), 'AG/TT': (-3.96, -11.6), 
 250      'TG/AT': (-0.96, -1.8), 'TT/AG': (-10.38, -31.8), 'TG/GT': (-12.64, -38.9), 
 251      'AT/TG': (-7.39, -21.0), 'CG/GT': (-5.56, -13.9), 'CT/GG': (-9.44, -24.7), 
 252      'GG/CT': (-7.03, -16.8), 'GT/CG': (-11.09, -28.8)} 
 253   
 254  # RNA/DNA 
 255  # Sugimoto et al. (1995), Biochemistry 34: 11211-11216 
 256  R_DNA_NN1 = { 
 257      'init': (1.9, -3.9), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 258      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 259      'sym': (0, 0), 
 260      'AA': (-11.5, -36.4), 'AC': (-7.8, -21.6), 'AG': (-7.0, -19.7), 
 261      'AT': (-8.3, -23.9), 'CA': (-10.4, -28.4), 'CC': (-12.8, -31.9), 
 262      'CG': (-16.3, -47.1), 'CT': (-9.1, -23.5), 'GA': (-8.6, -22.9), 
 263      'GC': (-8.0, -17.1), 'GG': (-9.3, -23.2), 'GT': (-5.9, -12.3), 
 264      'TA': (-7.8, -23.2), 'TC': (-5.5, -13.5), 'TG': (-9.0, -26.1), 
 265      'TT': (-7.8, -21.9)} 
 266   
 267  # Internal mismatch and inosine table (DNA) 
 268  # Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 
 269  # Allawi & SantaLucia (1998), Biochemistry 37: 9435-9444 
 270  # Allawi & SantaLucia (1998), Biochemistry 37: 2170-2179 
 271  # Allawi & SantaLucia (1998), Nucl Acids Res 26: 2694-2701 
 272  # Peyret et al. (1999), Biochemistry 38: 3468-3477 
 273  # Watkins & SantaLucia (2005), Nucl Acids Res 33: 6258-6267 
 274  DNA_IMM1 = { 
 275      'AG/TT': (1.0, 0.9), 'AT/TG': (-2.5, -8.3), 'CG/GT': (-4.1, -11.7), 
 276      'CT/GG': (-2.8, -8.0), 'GG/CT': (3.3, 10.4), 'GG/TT': (5.8, 16.3), 
 277      'GT/CG': (-4.4, -12.3), 'GT/TG': (4.1, 9.5), 'TG/AT': (-0.1, -1.7), 
 278      'TG/GT': (-1.4, -6.2), 'TT/AG': (-1.3, -5.3), 'AA/TG': (-0.6, -2.3), 
 279      'AG/TA': (-0.7, -2.3), 'CA/GG': (-0.7, -2.3), 'CG/GA': (-4.0, -13.2), 
 280      'GA/CG': (-0.6, -1.0), 'GG/CA': (0.5, 3.2), 'TA/AG': (0.7, 0.7), 
 281      'TG/AA': (3.0, 7.4), 
 282      'AC/TT': (0.7, 0.2), 'AT/TC': (-1.2, -6.2), 'CC/GT': (-0.8, -4.5), 
 283      'CT/GC': (-1.5, -6.1), 'GC/CT': (2.3, 5.4), 'GT/CC': (5.2, 13.5), 
 284      'TC/AT': (1.2, 0.7), 'TT/AC': (1.0, 0.7), 
 285      'AA/TC': (2.3, 4.6), 'AC/TA': (5.3, 14.6), 'CA/GC': (1.9, 3.7), 
 286      'CC/GA': (0.6, -0.6), 'GA/CC': (5.2, 14.2), 'GC/CA': (-0.7, -3.8), 
 287      'TA/AC': (3.4, 8.0), 'TC/AA': (7.6, 20.2), 
 288      'AA/TA': (1.2, 1.7), 'CA/GA': (-0.9, -4.2), 'GA/CA': (-2.9, -9.8), 
 289      'TA/AA': (4.7, 12.9), 'AC/TC': (0.0, -4.4), 'CC/GC': (-1.5, -7.2), 
 290      'GC/CC': (3.6, 8.9), 'TC/AC': (6.1, 16.4), 'AG/TG': (-3.1, -9.5), 
 291      'CG/GG': (-4.9, -15.3), 'GG/CG': (-6.0, -15.8), 'TG/AG': (1.6, 3.6), 
 292      'AT/TT': (-2.7, -10.8), 'CT/GT': (-5.0, -15.8), 'GT/CT': (-2.2, -8.4), 
 293      'TT/AT': (0.2, -1.5), 
 294      'AI/TC': (-8.9, -25.5), 'TI/AC': (-5.9, -17.4), 'AC/TI': (-8.8, -25.4), 
 295      'TC/AI': (-4.9, -13.9), 'CI/GC': (-5.4, -13.7), 'GI/CC': (-6.8, -19.1), 
 296      'CC/GI': (-8.3, -23.8), 'GC/CI': (-5.0, -12.6), 
 297      'AI/TA': (-8.3, -25.0), 'TI/AA': (-3.4, -11.2), 'AA/TI': (-0.7, -2.6), 
 298      'TA/AI': (-1.3, -4.6), 'CI/GA': (2.6, 8.9), 'GI/CA': (-7.8, -21.1), 
 299      'CA/GI': (-7.0, -20.0), 'GA/CI': (-7.6, -20.2), 
 300      'AI/TT': (0.49, -0.7), 'TI/AT': (-6.5, -22.0), 'AT/TI': (-5.6, -18.7), 
 301      'TT/AI': (-0.8, -4.3), 'CI/GT': (-1.0, -2.4), 'GI/CT': (-3.5, -10.6), 
 302      'CT/GI': (0.1, -1.0), 'GT/CI': (-4.3, -12.1), 
 303      'AI/TG': (-4.9, -15.8), 'TI/AG': (-1.9, -8.5), 'AG/TI': (0.1, -1.8), 
 304      'TG/AI': (1.0, 1.0), 'CI/GG': (7.1, 21.3), 'GI/CG': (-1.1, -3.2), 
 305      'CG/GI': (5.8, 16.9), 'GG/CI': (-7.6, -22.0), 
 306      'AI/TI': (-3.3, -11.9), 'TI/AI': (0.1, -2.3), 'CI/GI': (1.3, 3.0), 
 307      'GI/CI': (-0.5, -1.3)} 
 308   
 309  # Terminal mismatch table (DNA) 
 310  # SantaLucia & Peyret (2001) Patent Application WO 01/94611 
 311  DNA_TMM1 = { 
 312      'AA/TA': (-3.1, -7.8), 'TA/AA': (-2.5, -6.3), 'CA/GA': (-4.3, -10.7), 
 313      'GA/CA': (-8.0, -22.5), 
 314      'AC/TC': (-0.1, 0.5), 'TC/AC': (-0.7, -1.3), ' CC/GC': (-2.1, -5.1), 
 315      'GC/CC': (-3.9, -10.6), 
 316      'AG/TG': (-1.1, -2.1), 'TG/AG': (-1.1, -2.7), 'CG/GG': (-3.8, -9.5), 
 317      'GG/CG': (-0.7, -19.2), 
 318      'AT/TT': (-2.4, -6.5), 'TT/AT': (-3.2, -8.9), 'CT/GT': (-6.1, -16.9), 
 319      'GT/CT': (-7.4, -21.2), 
 320      'AA/TC': (-1.6, -4.0), 'AC/TA': (-1.8, -3.8), 'CA/GC': (-2.6, -5.9), 
 321      'CC/GA': (-2.7, -6.0), 'GA/CC': (-5.0, -13.8), 'GC/CA': (-3.2, -7.1), 
 322      'TA/AC': (-2.3, -5.9), 'TC/AA': (-2.7, -7.0), 
 323      'AC/TT': (-0.9, -1.7), 'AT/TC': (-2.3, -6.3), 'CC/GT': (-3.2, -8.0), 
 324      'CT/GC': (-3.9, -10.6), 'GC/CT': (-4.9, -13.5), 'GT/CC': (-3.0, -7.8), 
 325      'TC/AT': (-2.5, -6.3), 'TT/AC': (-0.7, -1.2), 
 326      'AA/TG': (-1.9, -4.4), 'AG/TA': (-2.5, -5.9), 'CA/GG': (-3.9, -9.6), 
 327      'CG/GA': (-6.0, -15.5), 'GA/CG': (-4.3, -11.1), ' GG/CA': (-4.6, -11.4), 
 328      'TA/AG': (-2.0, -4.7), 'TG/AA': (-2.4, -5.8), 
 329      'AG/TT': (-3.2, -8.7), 'AT/TG': (-3.5, -9.4), 'CG/GT': (-3.8, -9.0), 
 330      'CT/GG': (-6.6, -18.7), 'GG/CT': (-5.7, -15.9), 'GT/CG': (-5.9, -16.1), 
 331      'TG/AT': (-3.9, -10.5), 'TT/AG': (-3.6, -9.8)} 
 332   
 333  # Dangling ends table (DNA) 
 334  # Bommarito et al. (2000), Nucl Acids Res 28: 1929-1934 
 335  DNA_DE1 = { 
 336      'AA/.T': (0.2, 2.3), 'AC/.G': (-6.3, -17.1), 'AG/.C': (-3.7, -10.0), 
 337      'AT/.A': (-2.9, -7.6), 'CA/.T': (0.6, 3.3), 'CC/.G': (-4.4, -12.6), 
 338      'CG/.C': (-4.0, -11.9), 'CT/.A': (-4.1, -13.0), 'GA/.T': (-1.1, -1.6), 
 339      'GC/.G': (-5.1, -14.0), 'GG/.C': (-3.9, -10.9), 'GT/.A': (-4.2, -15.0), 
 340      'TA/.T': (-6.9, -20.0), 'TC/.G': (-4.0, -10.9), 'TG/.C': (-4.9, -13.8), 
 341      'TT/.A': (-0.2, -0.5), 
 342      '.A/AT': (-0.7, -0.8), '.C/AG': (-2.1, -3.9), '.G/AC': (-5.9, -16.5), 
 343      '.T/AA': (-0.5, -1.1), '.A/CT': (4.4, 14.9), '.C/CG': (-0.2, -0.1), 
 344      '.G/CC': (-2.6, -7.4), '.T/CA': (4.7, 14.2), '.A/GT': (-1.6, -3.6), 
 345      '.C/GG': (-3.9, -11.2), '.G/GC': (-3.2, -10.4), '.T/GA': (-4.1, -13.1), 
 346      '.A/TT': (2.9, 10.4), '.C/TG': (-4.4, -13.1), '.G/TC': (-5.2, -15.0), 
 347      '.T/TA': (-3.8, -12.6)} 
 348   
 349  # Dangling ends table (RNA) 
 350  # Turner & Mathews (2010), Nucl Acids Res 38: D280-D282 
 351  RNA_DE1 = { 
 352      '.T/AA': (-4.9, -13.2), '.T/CA': (-0.9, -1.3), '.T/GA': (-5.5, -15.1), 
 353      '.T/TA': (-2.3, -5.5), 
 354      '.G/AC': (-9.0, -23.5), '.G/CC': (-4.1, -10.6), '.G/GC': (-8.6, -22.2), 
 355      '.G/TC': (-7.5, -20.31), 
 356      '.C/AG': (-7.4, -20.3), '.C/CG': (-2.8, -7.7), '.C/GG': (-6.4, -16.4), 
 357      '.C/TG': (-3.6, -9.7), 
 358      '.T/AG': (-4.9, -13.2), '.T/CG': (-0.9, -1.3), '.T/GG': (-5.5, -15.1), 
 359      '.T/TG': (-2.3, -5.5), 
 360      '.A/AT': (-5.7, -16.1), '.A/CT': (-0.7, -1.9), '.A/GT': (-5.8, -16.4), 
 361      '.A/TT': (-2.2, -6.8), 
 362      '.G/AT': (-5.7, -16.1), '.G/CT': (-0.7, -1.9), '.G/GT': (-5.8, -16.4), 
 363      '.G/TT': (-2.2, -6.8), 
 364      'AT/.A': (-0.5, -0.6), 'CT/.A': (6.9, 22.6), 'GT/.A': (0.6, 2.6), 
 365      'TT/.A': (0.6, 2.6), 
 366      'AG/.C': (-1.6, -4.5), 'CG/.C': (0.7, 3.2), 'GG/.C': (-4.6, -14.8), 
 367      'TG/.C': (-0.4, -1.3), 
 368      'AC/.G': (-2.4, -6.1), 'CC/.G': (3.3, 11.6), 'GC/.G': (0.8, 3.2), 
 369      'TC/.G': (-1.4, -4.2), 
 370      'AT/.G': (-0.5, -0.6), 'CT/.G': (6.9, 22.6), 'GT/.G': (0.6, 2.6), 
 371      'TT/.G': (0.6, 2.6), 
 372      'AA/.T': (1.6, 6.1), 'CA/.T': (2.2, 8.1), 'GA/.T': (0.7, 3.5), 
 373      'TA/.T': (3.1, 10.6), 
 374      'AG/.T': (1.6, 6.1), 'CG/.T': (2.2, 8.1), 'GG/.T': (0.7, 3.5), 
 375      'TG/.T': (3.1, 10.6)} 
 376   
 377   
378 -def make_table(oldtable=None, values=None):
379 """Return a table with thermodynamic parameters (as dictionary). 380 381 Parameters: 382 oldtable: An existing dictionary with thermodynamic parameters. 383 values: A dictionary with new or updated values. 384 385 E.g., to replace the initiation parameters in the Sugimoto '96 dataset with 386 the initiation parameters from Allawi & SantaLucia '97: 387 388 >>> from Bio.SeqUtils.MeltingTemp import make_table, DNA_NN2 389 >>> table = DNA_NN2 # Sugimoto '96 390 >>> table['init_A/T'] 391 (0, 0) 392 >>> newtable = make_table(oldtable=DNA_NN2, values={'init': (0, 0), 393 ... 'init_A/T': (2.3, 4.1), 394 ... 'init_G/C': (0.1, -2.8)}) 395 >>> print("%0.1f, %0.1f" % newtable['init_A/T']) 396 2.3, 4.1 397 398 """ 399 if oldtable is None: 400 table = {'init': (0, 0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 401 'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 402 'init_5T/A': (0, 0), 'sym': (0, 0), 'AA/TT': (0, 0), 403 'AT/TA': (0, 0), 'TA/AT': (0, 0), 'CA/GT': (0, 0), 404 'GT/CA': (0, 0), 'CT/GA': (0, 0), 'GA/CT': (0, 0), 405 'CG/GC': (0, 0), 'GC/CG': (0, 0), 'GG/CC': (0, 0)} 406 else: 407 table = oldtable.copy() 408 if values: 409 table.update(values) 410 return table
411 412
413 -def _check(seq, method):
414 """Return a sequence which fullfils the requirements of the given method. 415 416 All Tm methods in this package require the sequence in uppercase format. 417 Most methods make use of the length of the sequence (directly or 418 indirectly), which can only be expressed as len(seq) if the sequence does 419 not contain whitespaces and other non-base characters. RNA sequences are 420 backtranscribed to DNA. This method is PRIVATE. 421 422 Arguments: 423 seq: The sequence as given by the user (passed as string). 424 method: Tm_Wallace, Tm_GC or Tm_NN. 425 426 >>> from Bio.SeqUtils import MeltingTemp as mt 427 >>> mt._check('10 ACGTTGCAAG tccatggtac', 'Tm_NN') 428 'ACGTTGCAAGTCCATGGTAC' 429 430 """ 431 seq = ''.join(seq.split()).upper() 432 seq = str(Seq.Seq(seq).back_transcribe()) 433 if method == 'Tm_Wallace': 434 return seq 435 if method == 'Tm_GC': 436 baseset = ('A', 'B', 'C', 'D', 'G', 'H', 'I', 'K', 'M', 'N', 'R', 'S', 437 'T', 'V', 'W', 'X', 'Y') 438 if method == 'Tm_NN': 439 baseset = ('A', 'C', 'G', 'T', 'I') 440 seq = ''.join([base for base in seq if base in baseset]) 441 return seq
442 443
444 -def salt_correction(Na=0, K=0, Tris=0, Mg=0, dNTPs=0, method=1, seq=None):
445 """Calculate a term to correct Tm for salt ions. 446 447 Depending on the Tm calculation, the term will correct Tm or entropy. To 448 calculate corrected Tm values, different operations need to be applied: 449 450 - methods 1-4: Tm(new) = Tm(old) + corr 451 - method 5: deltaH(new) = deltaH(old) + corr 452 - methods 6+7: Tm(new) = 1/(1/Tm(old) + corr) 453 454 Parameters: 455 - Na, K, Tris, Mg, dNTPS: Millimolar concentration of respective ion. To 456 have a simple 'salt correction', just pass Na. If any of K, Tris, Mg and 457 dNTPS is non-zero, a 'sodium-equivalent' concentration is calculated 458 according to von Ahsen et al. (2001, Clin Chem 47: 1956-1961): 459 [Na_eq] = [Na+] + [K+] + [Tris]/2 + 120*([Mg2+] - [dNTPs])^0.5 460 If [dNTPs] >= [Mg2+]: [Na_eq] = [Na+] + [K+] + [Tris]/2 461 - method: Which method to be applied. Methods 1-4 correct Tm, method 5 462 corrects deltaS, methods 6 and 7 correct 1/Tm. The methods are: 463 464 1. 16.6 x log[Na+] 465 (Schildkraut & Lifson (1965), Biopolymers 3: 195-208) 466 2. 16.6 x log([Na+]/(1.0 + 0.7*[Na+])) 467 (Wetmur (1991), Crit Rev Biochem Mol Biol 126: 227-259) 468 3. 12.5 x log(Na+] 469 (SantaLucia et al. (1996), Biochemistry 35: 3555-3562 470 4. 11.7 x log[Na+] 471 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465 472 5. Correction for deltaS: 0.368 x (N-1) x ln[Na+] 473 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465) 474 6. (4.29(%GC)-3.95)x1e-5 x ln[Na+] + 9.40e-6 x ln[Na+]^2 475 (Owczarzy et al. (2004), Biochemistry 43: 3537-3554) 476 7. Complex formula with decision tree and 7 empirical constants. 477 Mg2+ is corrected for dNTPs binding (if present) 478 (Owczarzy et al. (2008), Biochemistry 47: 5336-5353) 479 480 Examples: 481 482 >>> from Bio.SeqUtils import MeltingTemp as mt 483 >>> print('%0.2f' % mt.salt_correction(Na=50, method=1)) 484 -21.60 485 >>> print('%0.2f' % mt.salt_correction(Na=50, method=2)) 486 -21.85 487 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, method=2)) 488 -16.45 489 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, Mg=1.5, 490 ... method=2)) 491 -10.99 492 493 """ 494 if method in (5, 6, 7) and not seq: 495 raise ValueError('sequence is missing (is needed to calculate ' + 496 'GC content or sequence length).') 497 if seq: 498 seq = str(seq) 499 corr = 0 500 if not method: 501 return corr 502 Mon = Na + K + Tris / 2.0 # Note: all these values are millimolar 503 mg = Mg * 1e-3 # Lowercase ions (mg, mon, dntps) are molar 504 # Na equivalent according to von Ahsen et al. (2001): 505 if sum((K, Mg, Tris, dNTPs)) > 0 and not method == 7 and dNTPs < Mg: 506 # dNTPs bind Mg2+ strongly. If [dNTPs] is larger or equal than 507 # [Mg2+], free Mg2+ is considered not to be relevant. 508 Mon += 120 * math.sqrt(Mg - dNTPs) 509 mon = Mon * 1e-3 510 # Note: math.log = ln(), math.log10 = log() 511 if method in range(1, 7) and not mon: 512 raise ValueError('Total ion concentration of zero is not allowed in ' + 513 'this method.') 514 if method == 1: 515 corr = 16.6 * math.log10(mon) 516 if method == 2: 517 corr = 16.6 * math.log10((mon) / (1.0 + 0.7 * (mon))) 518 if method == 3: 519 corr = 12.5 * math.log10(mon) 520 if method == 4: 521 corr = 11.7 * math.log10(mon) 522 if method == 5: 523 corr = 0.368 * (len(seq) - 1) * math.log(mon) 524 if method == 6: 525 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * 1e-5 * math.log(mon) +\ 526 9.40e-6 * math.log(mon) ** 2 527 if method == 7: 528 a, b, c, d = 3.92, -0.911, 6.26, 1.42 529 e, f, g = -48.2, 52.5, 8.31 530 if dNTPs > 0: 531 dntps = dNTPs * 1e-3 532 ka = 3e4 # Dissociation constant for Mg:dNTP 533 # Free Mg2+ calculation: 534 mg = (-(ka * dntps - ka * mg + 1.0) + 535 math.sqrt((ka * dntps - ka * mg + 1.0) ** 2 + 536 4.0 * ka * mg)) / (2.0 * ka) 537 if Mon > 0: 538 R = math.sqrt(mg) / mon 539 if R < 0.22: 540 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * \ 541 1e-5 * math.log(mon) + 9.40e-6 * math.log(mon) ** 2 542 return corr 543 elif R < 6.0: 544 a = 3.92 * (0.843 - 0.352 * math.sqrt(mon) * math.log(mon)) 545 d = 1.42 * (1.279 - 4.03e-3 * math.log(mon) - 546 8.03e-3 * math.log(mon) ** 2) 547 g = 8.31 * (0.486 - 0.258 * math.log(mon) + 548 5.25e-3 * math.log(mon) ** 3) 549 corr = (a + b * math.log(mg) + (SeqUtils.GC(seq) / 100) * 550 (c + d * math.log(mg)) + (1 / (2.0 * (len(seq) - 1))) * 551 (e + f * math.log(mg) + g * math.log(mg) ** 2)) * 1e-5 552 if method > 7: 553 raise ValueError('Allowed values for parameter \'method\' are 1-7.') 554 return corr
555 556
557 -def chem_correction(melting_temp, DMSO=0, fmd=0, DMSOfactor=0.75, 558 fmdfactor=0.65, fmdmethod=1, GC=None):
559 """Correct a given Tm for DMSO and formamide. 560 561 Please note that these corrections are +/- rough approximations. 562 563 Arguments: 564 - melting_temp: Melting temperature. 565 - DMSO: Percent DMSO. 566 - fmd: Formamide concentration in %(fmdmethod=1) or molar (fmdmethod=2). 567 - DMSOfactor: How much should Tm decreases per percent DMSO. Default=0.65 568 (von Ahsen et al. 2001). Other published values are 0.5, 0.6 and 0.675. 569 - fmdfactor: How much should Tm decrease per percent formamide. 570 Default=0.65. Several papers report factors between 0.6 and 0.72. 571 - fmdmethod: 572 573 1. Tm = Tm - factor(%formamide) (Default) 574 2. Tm = Tm + (0.453(f(GC)) - 2.88) x [formamide] 575 576 Here f(GC) is fraction of GC. 577 Note (again) that in fmdmethod=1 formamide concentration is given in %, 578 while in fmdmethod=2 it is given in molar. 579 - GC: GC content in percent. 580 581 Examples: 582 583 >>> from Bio.SeqUtils import MeltingTemp as mt 584 >>> mt.chem_correction(70) 585 70 586 >>> print('%0.2f' % mt.chem_correction(70, DMSO=3)) 587 67.75 588 >>> print('%0.2f' % mt.chem_correction(70, fmd=5)) 589 66.75 590 >>> print('%0.2f' % mt.chem_correction(70, fmdmethod=2, fmd=1.25, 591 ... GC=50)) 592 66.68 593 594 """ 595 if DMSO: 596 melting_temp -= DMSOfactor * DMSO 597 if fmd: 598 # McConaughy et al. (1969), Biochemistry 8: 3289-3295 599 if fmdmethod == 1: 600 # Note: Here fmd is given in percent 601 melting_temp -= fmdfactor * fmd 602 # Blake & Delcourt (1996), Nucl Acids Res 11: 2095-2103 603 if fmdmethod == 2: 604 if GC is None or GC < 0: 605 raise ValueError('\'GC\' is missing or negative') 606 # Note: Here fmd is given in molar 607 melting_temp += (0.453 * (GC / 100.0) - 2.88) * fmd 608 if fmdmethod not in (1, 2): 609 raise ValueError('\'fmdmethod\' must be 1 or 2') 610 return melting_temp
611 612
613 -def Tm_Wallace(seq, check=True, strict=True):
614 """Calculate and return the Tm using the 'Wallace rule'. 615 616 Tm = 4 degC * (G + C) + 2 degC * (A+T) 617 618 The Wallace rule (Thein & Wallace 1986, in Human genetic diseases: a 619 practical approach, 33-50) is often used as rule of thumb for approximate 620 Tm calculations for primers of 14 to 20 nt length. 621 622 Non-DNA characters (e.g., E, F, J, !, 1, etc) are ignored by this method. 623 624 Examples: 625 626 >>> from Bio.SeqUtils import MeltingTemp as mt 627 >>> mt.Tm_Wallace('ACGTTGCAATGCCGTA') 628 48.0 629 >>> mt.Tm_Wallace('ACGT TGCA ATGC CGTA') 630 48.0 631 >>> mt.Tm_Wallace('1ACGT2TGCA3ATGC4CGTA') 632 48.0 633 634 """ 635 seq = str(seq) 636 if check: 637 seq = _check(seq, 'Tm_Wallace') 638 639 melting_temp = 2 * (sum(map(seq.count, ('A', 'T', 'W')))) + \ 640 4 * (sum(map(seq.count, ('C', 'G', 'S')))) 641 642 # Intermediate values for ambiguous positions: 643 tmp = 3 * (sum(map(seq.count, ('K', 'M', 'N', 'R', 'Y')))) + \ 644 10 / 3.0 * (sum(map(seq.count, ('B', 'V')))) + \ 645 8 / 3.0 * (sum(map(seq.count, ('D', 'H')))) 646 if strict and tmp: 647 raise ValueError('ambiguous bases B, D, H, K, M, N, R, V, Y not ' + 648 'allowed when strict=True') 649 else: 650 melting_temp += tmp 651 return melting_temp
652 653
654 -def Tm_GC(seq, check=True, strict=True, valueset=7, userset=None, Na=50, K=0, 655 Tris=0, Mg=0, dNTPs=0, saltcorr=0, mismatch=True):
656 """Return the Tm using empirical formulas based on GC content. 657 658 General format: Tm = A + B(%GC) - C/N + salt correction - D(%mismatch) 659 660 A, B, C, D: empirical constants, N: primer length 661 D (amount of decrease in Tm per % mismatch) is often 1, but sometimes other 662 values have been used (0.6-1.5). Use 'X' to indicate the mismatch position 663 in the sequence. Note that this mismatch correction is a rough estimate. 664 665 >>> from Bio.SeqUtils import MeltingTemp as mt 666 >>> print("%0.2f" % mt.Tm_GC('CTGCTGATXGCACGAGGTTATGG', valueset=2)) 667 69.20 668 669 Arguments: 670 - valueset: A few often cited variants are included: 671 672 1. Tm = 69.3 + 0.41(%GC) - 650/N 673 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Chester & Marshak 1993), 674 Anal Biochem 209: 284-290) 675 2. Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch 676 'QuikChange' formula. Recommended (by the manufacturer) for the 677 design of primers for QuikChange mutagenesis. 678 3. Tm = 81.5 + 0.41(%GC) - 675/N + 16.6 x log[Na+] 679 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Schildkraut & Lifson 680 1965, Biopolymers 3: 195-208) 681 4. Tm = 81.5 + 0.41(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x 682 [Na+])) - %mismatch 683 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). This is the 684 standard formula in approximative mode of MELTING 4.3. 685 5. Tm = 78 + 0.7(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) 686 - %mismatch 687 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA. 688 6. Tm = 67 + 0.8(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) 689 - %mismatch 690 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA/DNA 691 hybrids. 692 7. Tm = 81.5 + 0.41(%GC) - 600/N + 16.6 x log[Na+] 693 Used by Primer3Plus to calculate the product Tm. Default set. 694 8. Tm = 77.1 + 0.41(%GC) - 528/N + 11.7 x log[Na+] 695 (von Ahsen et al. 2001, Clin Chem 47: 1956-1961). Recommended 'as a 696 tradeoff between accuracy and ease of use'. 697 698 - userset: Tuple of four values for A, B, C, and D. Usersets override 699 valuesets. 700 - Na, K, Tris, Mg, dNTPs: Concentration of the respective ions [mM]. If 701 any of K, Tris, Mg and dNTPS is non-zero, a 'sodium-equivalent' 702 concentration is calculated and used for salt correction (von Ahsen et 703 al., 2001). 704 - saltcorr: Type of salt correction (see method salt_correction). 705 Default=5. 0 or None means no salt correction. 706 - mismatch: If 'True' (default) every 'X' in the sequence is counted as 707 mismatch. 708 709 """ 710 if saltcorr == 5: 711 raise ValueError('salt-correction method 5 not applicable' + 712 'to Tm_GC') 713 seq = str(seq) 714 if check: 715 seq = _check(seq, 'Tm_GC') 716 percent_gc = SeqUtils.GC(seq) 717 # Ambiguous bases: add 0.5, 0.67 or 0.33% depending on G+C probability: 718 tmp = sum(map(seq.count, ('K', 'M', 'N', 'R', 'Y'))) * 50.0 / len(seq) + \ 719 sum(map(seq.count, ('B', 'V'))) * 66.67 / len(seq) + \ 720 sum(map(seq.count, ('D', 'H'))) * 33.33 / len(seq) 721 if strict and tmp: 722 raise ValueError('ambiguous bases B, D, H, K, M, N, R, V, Y not ' + 723 'allowed when strict=True') 724 else: 725 percent_gc += tmp 726 if userset: 727 A, B, C, D = userset 728 else: 729 if valueset == 1: 730 A, B, C, D = (69.3, 0.41, 650, 1) 731 saltcorr = 0 732 if valueset == 2: 733 A, B, C, D = (81.5, 0.41, 675, 1) 734 saltcorr = 0 735 if valueset == 3: 736 A, B, C, D = (81.5, 0.41, 675, 1) 737 saltcorr = 2 738 if valueset == 4: 739 A, B, C, D = (81.5, 0.41, 500, 1) 740 saltcorr = 3 741 if valueset == 5: 742 A, B, C, D = (78.0, 0.7, 500, 1) 743 saltcorr = 3 744 if valueset == 6: 745 A, B, C, D = (67.0, 0.8, 500, 1) 746 saltcorr = 3 747 if valueset == 7: 748 A, B, C, D = (81.5, 0.41, 600, 1) 749 saltcorr = 2 750 if valueset == 8: 751 A, B, C, D = (77.1, 0.41, 528, 1) 752 saltcorr = 4 753 if valueset > 8: 754 raise ValueError('allowed values for parameter \'valueset\' are 0-8.') 755 756 melting_temp = A + B * percent_gc - C / (len(seq) * 1.0) 757 if saltcorr: 758 melting_temp += salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, 759 dNTPs=dNTPs, seq=seq, method=saltcorr) 760 if mismatch: 761 melting_temp -= D * (seq.count('X') * 100.0 / len(seq)) 762 return melting_temp
763 764
765 -def _key_error(neighbors, strict):
766 """Throw an error or a warning if there is no data for the neighbors.""" 767 # We haven't found the key in the tables 768 if strict: 769 raise ValueError('no thermodynamic data for neighbors \'' + neighbors + 770 '\' available') 771 else: 772 warnings.warn('no themodynamic data for neighbors \'' + neighbors + 773 '\' available. Calculation will be wrong', 774 BiopythonWarning) 775 return
776 777
778 -def Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=DNA_NN3, 779 tmm_table=DNA_TMM1, imm_table=DNA_IMM1, de_table=DNA_DE1, 780 dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, 781 dNTPs=0, saltcorr=5):
782 """Return the Tm using nearest neighbor thermodynamics. 783 784 Arguments: 785 - seq: The primer/probe sequence as string or Biopython sequence object. 786 For RNA/DNA hybridizations seq must be the RNA sequence. 787 - c_seq: Complementary sequence. The sequence of the template/target in 788 3'->5' direction. c_seq is necessary for mismatch correction and 789 dangling-ends correction. Both corrections will automatically be 790 applied if mismatches or dangling ends are present. Default=None. 791 - shift: Shift of the primer/probe sequence on the template/target 792 sequence, e.g.:: 793 794 shift=0 shift=1 shift= -1 795 Primer (seq): 5' ATGC... 5' ATGC... 5' ATGC... 796 Template (c_seq): 3' TACG... 3' CTACG... 3' ACG... 797 798 The shift parameter is necessary to align seq and c_seq if they have 799 different lengths or if they should have dangling ends. Default=0 800 - table: Thermodynamic NN values, eight tables are implemented: 801 For DNA/DNA hybridizations: 802 803 - DNA_NN1: values from Breslauer et al. (1986) 804 - DNA_NN2: values from Sugimoto et al. (1996) 805 - DNA_NN3: values from Allawi & SantaLucia (1997) (default) 806 - DNA_NN4: values from SantaLucia & Hicks (2004) 807 808 For RNA/RNA hybridizations: 809 810 - RNA_NN1: values from Freier et al. (1986) 811 - RNA_NN2: values from Xia et al. (1998) 812 - RNA_NN3: valuse from Chen et al. (2012) 813 814 For RNA/DNA hybridizations: 815 816 - R_DNA_NN1: values from Sugimoto et al. (1995) 817 818 Use the module's maketable method to make a new table or to update one 819 one of the implemented tables. 820 - tmm_table: Thermodynamic values for terminal mismatches. 821 Default: DNA_TMM1 (SantaLucia & Peyret, 2001) 822 - imm_table: Thermodynamic values for internal mismatches, may include 823 insosine mismatches. Default: DNA_IMM1 (Allawi & SantaLucia, 1997-1998; 824 Peyret et al., 1999; Watkins & SantaLucia, 2005) 825 - de_table: Thermodynamic values for dangling ends: 826 827 - DNA_DE1: for DNA. Values from Bommarito et al. (2000). Default 828 - RNA_DE1: for RNA. Values from Turner & Mathews (2010) 829 830 - dnac1: Concentration of the higher concentrated strand [nM]. Typically 831 this will be the primer (for PCR) or the probe. Default=25. 832 - dnac2: Concentration of the lower concentrated strand [nM]. In PCR this 833 is the template strand which concentration is typically very low and may 834 be ignored (dnac2=0). In oligo/oligo hybridization experiments, dnac1 835 equals dnac1. Default=25. 836 MELTING and Primer3Plus use k = [Oligo(Total)]/4 by default. To mimic 837 this behaviour, you have to divide [Oligo(Total)] by 2 and assign this 838 concentration to dnac1 and dnac2. E.g., Total oligo concentration of 839 50 nM in Primer3Plus means dnac1=25, dnac2=25. 840 - selfcomp: Is the sequence self-complementary? Default=False. If 'True' 841 the primer is thought binding to itself, thus dnac2 is not considered. 842 - Na, K, Tris, Mg, dNTPs: See method 'Tm_GC' for details. Defaults: Na=50, 843 K=0, Tris=0, Mg=0, dNTPs=0. 844 - saltcorr: See method 'Tm_GC'. Default=5. 0 means no salt correction. 845 846 """ 847 seq = str(seq) 848 if not c_seq: 849 # c_seq must be provided by user if dangling ends or mismatches should 850 # be taken into account. Otherwise take perfect complement. 851 c_seq = Seq.Seq(seq).complement() 852 c_seq = str(c_seq) 853 if check: 854 seq = _check(seq, 'Tm_NN') 855 c_seq = _check(c_seq, 'Tm_NN') 856 tmp_seq = seq 857 tmp_cseq = c_seq 858 delta_h = 0 859 delta_s = 0 860 d_h = 0 # Names for indexes 861 d_s = 1 # 0 and 1 862 863 # Dangling ends? 864 if shift or len(seq) != len(c_seq): 865 # Align both sequences using the shift parameter 866 if shift > 0: 867 tmp_seq = '.' * shift + seq 868 if shift < 0: 869 tmp_cseq = '.' * abs(shift) + c_seq 870 if len(tmp_cseq) > len(tmp_seq): 871 tmp_seq += (len(tmp_cseq) - len(tmp_seq)) * '.' 872 if len(tmp_cseq) < len(tmp_seq): 873 tmp_cseq += (len(tmp_seq) - len(tmp_cseq)) * '.' 874 # Remove 'over-dangling' ends 875 while tmp_seq.startswith('..') or tmp_cseq.startswith('..'): 876 tmp_seq = tmp_seq[1:] 877 tmp_cseq = tmp_cseq[1:] 878 while tmp_seq.endswith('..') or tmp_cseq.endswith('..'): 879 tmp_seq = tmp_seq[:-1] 880 tmp_cseq = tmp_cseq[:-1] 881 # Now for the dangling ends 882 if tmp_seq.startswith('.') or tmp_cseq.startswith('.'): 883 left_de = tmp_seq[:2] + '/' + tmp_cseq[:2] 884 try: 885 delta_h += de_table[left_de][d_h] 886 delta_s += de_table[left_de][d_s] 887 except KeyError: 888 _key_error(left_de, strict) 889 tmp_seq = tmp_seq[1:] 890 tmp_cseq = tmp_cseq[1:] 891 if tmp_seq.endswith('.') or tmp_cseq.endswith('.'): 892 right_de = tmp_cseq[-2:][::-1] + '/' + tmp_seq[-2:][::-1] 893 try: 894 delta_h += de_table[right_de][d_h] 895 delta_s += de_table[right_de][d_s] 896 except KeyError: 897 _key_error(right_de, strict) 898 tmp_seq = tmp_seq[:-1] 899 tmp_cseq = tmp_cseq[:-1] 900 901 # Now for terminal mismatches 902 left_tmm = tmp_cseq[:2][::-1] + '/' + tmp_seq[:2][::-1] 903 if left_tmm in tmm_table: 904 delta_h += tmm_table[left_tmm][d_h] 905 delta_s += tmm_table[left_tmm][d_s] 906 tmp_seq = tmp_seq[1:] 907 tmp_cseq = tmp_cseq[1:] 908 right_tmm = tmp_seq[-2:] + '/' + tmp_cseq[-2:] 909 if right_tmm in tmm_table: 910 delta_h += tmm_table[right_tmm][d_h] 911 delta_s += tmm_table[right_tmm][d_s] 912 tmp_seq = tmp_seq[:-1] 913 tmp_cseq = tmp_cseq[:-1] 914 915 # Now everything 'unusual' at the ends is handled and removed and we can 916 # look at the initiation. 917 # One or several of the following initiation types may apply: 918 919 # Type: General initiation value 920 delta_h += nn_table['init'][d_h] 921 delta_s += nn_table['init'][d_s] 922 923 # Type: Duplex with no (allA/T) or at least one (oneG/C) GC pair 924 if SeqUtils.GC(seq) == 0: 925 delta_h += nn_table['init_allA/T'][d_h] 926 delta_s += nn_table['init_allA/T'][d_s] 927 else: 928 delta_h += nn_table['init_oneG/C'][d_h] 929 delta_s += nn_table['init_oneG/C'][d_s] 930 931 # Type: Penalty if 5' end is T 932 if seq.startswith('T'): 933 delta_h += nn_table['init_5T/A'][d_h] 934 delta_s += nn_table['init_5T/A'][d_s] 935 if seq.endswith('A'): 936 delta_h += nn_table['init_5T/A'][d_h] 937 delta_s += nn_table['init_5T/A'][d_s] 938 939 # Type: Different values for G/C or A/T terminal basepairs 940 ends = seq[0] + seq[-1] 941 AT = ends.count('A') + ends.count('T') 942 GC = ends.count('G') + ends.count('C') 943 delta_h += nn_table['init_A/T'][d_h] * AT 944 delta_s += nn_table['init_A/T'][d_s] * AT 945 delta_h += nn_table['init_G/C'][d_h] * GC 946 delta_s += nn_table['init_G/C'][d_s] * GC 947 948 # Finally, the 'zipping' 949 for basenumber in range(len(tmp_seq) - 1): 950 neighbors = tmp_seq[basenumber:basenumber + 2] + '/' + \ 951 tmp_cseq[basenumber:basenumber + 2] 952 if neighbors in imm_table: 953 delta_h += imm_table[neighbors][d_h] 954 delta_s += imm_table[neighbors][d_s] 955 elif neighbors[::-1] in imm_table: 956 delta_h += imm_table[neighbors[::-1]][d_h] 957 delta_s += imm_table[neighbors[::-1]][d_s] 958 elif neighbors in nn_table: 959 delta_h += nn_table[neighbors][d_h] 960 delta_s += nn_table[neighbors][d_s] 961 elif neighbors[::-1] in nn_table: 962 delta_h += nn_table[neighbors[::-1]][d_h] 963 delta_s += nn_table[neighbors[::-1]][d_s] 964 else: 965 # We haven't found the key... 966 _key_error(neighbors, strict) 967 968 k = (dnac1 - (dnac2 / 2.0)) * 1e-9 969 if selfcomp: 970 k = dnac1 * 1e-9 971 delta_h += nn_table['sym'][d_h] 972 delta_s += nn_table['sym'][d_s] 973 R = 1.987 # universal gas constant in Cal/degrees C*Mol 974 if saltcorr: 975 corr = salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, dNTPs=dNTPs, 976 method=saltcorr, seq=seq) 977 if saltcorr == 5: 978 delta_s += corr 979 melting_temp = (1000 * delta_h) / (delta_s + (R * (math.log(k)))) - 273.15 980 if saltcorr in (1, 2, 3, 4): 981 melting_temp += corr 982 if saltcorr in (6, 7): 983 # Tm = 1/(1/Tm + corr) 984 melting_temp = (1 / (1 / (melting_temp + 273.15) + corr) - 273.15) 985 986 return melting_temp
987 988
989 -def Tm_staluc(s, dnac=50, saltc=50, rna=0):
990 """Returns DNA/DNA Tm using nearest neighbor thermodynamics (OBSOLETE). 991 992 This method may be depreceated in the future. Use Tm_NN instead. Tm_NN 993 with default values gives the same result as Tm_staluc. 994 995 s is the sequence as string or Seq object 996 dnac is DNA concentration [nM] 997 saltc is salt concentration [mM]. 998 rna=0 is for DNA/DNA (default), use 1 for RNA/RNA hybridisation. 999 1000 For DNA/DNA, see Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 1001 For RNA/RNA, see Xia et al (1998), Biochemistry 37: 14719-14735 1002 1003 Example: 1004 1005 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA')) 1006 59.87 1007 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA', rna=True)) 1008 77.90 1009 1010 You can also use a Seq object instead of a string, 1011 1012 >>> from Bio.Seq import Seq 1013 >>> from Bio.Alphabet import generic_nucleotide 1014 >>> s = Seq('CAGTCAGTACGTACGTGTACTGCCGTA', generic_nucleotide) 1015 >>> print("%0.2f" % Tm_staluc(s)) 1016 59.87 1017 >>> print("%0.2f" % Tm_staluc(s, rna=True)) 1018 77.90 1019 1020 """ 1021 # Original method was by Sebastian Bassi <sbassi@genesdigitales.com>. It is 1022 # now superseded by Tm_NN. 1023 1024 warnings.warn('Tm_staluc may be depreciated in the future. Use Tm_NN ' + 1025 'instead.', PendingDeprecationWarning) 1026 if not rna: 1027 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc) 1028 elif rna == 1: 1029 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc, 1030 nn_table=RNA_NN2) 1031 else: 1032 raise ValueError("rna={0} not supported".format(rna))
1033 1034
1035 -def _test():
1036 """Run the module's doctests (PRIVATE).""" 1037 import doctest 1038 print("Running doctests...") 1039 doctest.testmod() 1040 print("Done")
1041 1042 if __name__ == "__main__": 1043 _test() 1044