Module Seq

Provides objects to represent biological sequences with alphabets.

See also http://biopython.org/wiki/Seq and the chapter in our tutorial.

Classes
  Seq
A read-only sequence object (essentially a string with an alphabet).
  UnknownSeq
A read-only sequence object of known length but unknown contents.
  MutableSeq
An editable sequence object (with an alphabet).
Functions
  _maketrans(complement_mapping)
Makes a Python string translation table (PRIVATE).
  transcribe(dna)
Transcribes a DNA sequence into RNA.
  back_transcribe(rna)
Back-transcribes an RNA sequence into DNA.
  _translate_str(sequence, table, stop_symbol='*', to_stop=False, cds=False, pos_stop='X')
Helper function to translate a nucleotide string (PRIVATE).
  translate(sequence, table='Standard', stop_symbol='*', to_stop=False, cds=False)
Translate a nucleotide sequence into amino acids.
  reverse_complement(sequence)
Returns the reverse complement sequence of a nucleotide string.
  _test()
Run the Bio.Seq module's doctests (PRIVATE).
Variables
  _dna_complement_table = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\...
  _rna_complement_table = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\...
  __package__ = 'Bio'
Function Details

_maketrans(complement_mapping)

Makes a Python string translation table (PRIVATE).

Arguments:

  • complement_mapping - a dictionary such as ambiguous_dna_complement and ambiguous_rna_complement from Data.IUPACData.

Returns a translation table (a string of length 256) to pass to the Python string's translate method when computing a complement or reverse complement.

Compatible with lower case and upper case sequences.

For internal use only.
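
As an illustration of the idea only (not the module's own code), the sketch below builds a complement table with Python 3's str.maketrans, which returns a dictionary rather than the 256-character string described above. The name make_complement_table and the small complement_mapping dictionary are hypothetical stand-ins for the fuller mappings in Data.IUPACData.

```python
# Hypothetical subset of an IUPAC DNA complement mapping (upper case only).
complement_mapping = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "N": "N", "H": "D", "D": "H",
}


def make_complement_table(mapping):
    """Build a str.translate table covering upper and lower case letters."""
    before = "".join(mapping.keys())
    after = "".join(mapping.values())
    # Append lower case copies so lower case input gives lower case output.
    return str.maketrans(before + before.lower(), after + after.lower())


table = make_complement_table(complement_mapping)

# Characters missing from the table (such as the gap "-") pass through unchanged.
complemented = "ACTG-NH".translate(table)

# Reversing the complemented string gives the reverse complement.
reverse_complemented = complemented[::-1]
print(reverse_complemented)  # DN-CAGT
```

Building the table once and reusing it keeps each complement operation to a single pass over the string.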

transcribe(dna)

Transcribes a DNA sequence into RNA.

If given a string, returns a new string object.

Given a Seq or MutableSeq, returns a new Seq object with an RNA alphabet.

Trying to transcribe a protein or RNA sequence raises an exception.

e.g.

>>> transcribe("ACTGN")
'ACUGN'

back_transcribe(rna)

Back-transcribes an RNA sequence into DNA.

If given a string, returns a new string object.

Given a Seq or MutableSeq, returns a new Seq object with a DNA alphabet.

Trying to back-transcribe a protein or DNA sequence raises an exception.

e.g.

>>> back_transcribe("ACUGN")
'ACTGN'
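
For plain strings, the behaviour of transcribe and back_transcribe shown above amounts to swapping T and U. A minimal sketch of that string-level behaviour follows, assuming nothing about Seq objects or alphabets. The names transcribe_str and back_transcribe_str are hypothetical, and unlike the real functions this sketch performs no alphabet checks and never raises an exception.

```python
def transcribe_str(dna):
    """Return an RNA string by replacing T with U, preserving case."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe_str(rna):
    """Return a DNA string by replacing U with T, preserving case."""
    return rna.replace("U", "T").replace("u", "t")


rna = transcribe_str("ACTGN")
print(rna)                       # ACUGN
print(back_transcribe_str(rna))  # ACTGN
```

Ambiguity codes such as N are left untouched, which is why the round trip returns the original string.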

_translate_str(sequence, table, stop_symbol='*', to_stop=False, cds=False, pos_stop='X')

Helper function to translate a nucleotide string (PRIVATE).

Arguments:

  • sequence - a string
  • table - a CodonTable object (NOT a table name or id number)
  • stop_symbol - a single character string, what to use for terminators.
  • to_stop - boolean, should translation terminate at the first in frame stop codon? If there is no in-frame stop codon then translation continues to the end.
  • pos_stop - a single character string for a possible stop codon (e.g. TAN or NNN)
  • cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.

Returns a string.

e.g.

>>> from Bio.Data import CodonTable
>>> table = CodonTable.ambiguous_dna_by_id[1]
>>> _translate_str("AAA", table)
'K'
>>> _translate_str("TAR", table)
'*'
>>> _translate_str("TAN", table)
'X'
>>> _translate_str("TAN", table, pos_stop="@")
'@'
>>> _translate_str("TA?", table)
Traceback (most recent call last):
   ...
TranslationError: Codon 'TA?' is invalid

In a change from older versions of Biopython, partial codons are now always regarded as an error (previously this was only checked if cds=True) and will trigger a warning (likely to become an exception in a future release).

If cds=True, the start and stop codons are checked, and the start codon will be translated as methionine. The sequence must be a whole number of codons.

>>> _translate_str("ATGCCCTAG", table, cds=True)
'MP'
>>> _translate_str("AAACCCTAG", table, cds=True)
Traceback (most recent call last):
   ...
TranslationError: First codon 'AAA' is not a start codon
>>> _translate_str("ATGCCCTAGCCCTAG", table, cds=True)
Traceback (most recent call last):
   ...
TranslationError: Extra in frame stop codon found.
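
The to_stop and cds rules described above can be sketched in plain Python. This is an illustrative approximation, not the module's implementation: translate_sketch, FORWARD, STOP_CODONS and START_CODONS are hypothetical names, the codon table is a tiny hand-picked subset, ambiguous codons are not handled, and ValueError stands in for TranslationError.

```python
# Hypothetical miniature codon table: only the codons used below are defined.
FORWARD = {"ATG": "M", "CCC": "P", "AAA": "K", "GTG": "V"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}


def translate_sketch(seq, stop_symbol="*", to_stop=False, cds=False):
    """Translate seq codon by codon (illustrative only)."""
    seq = seq.upper()
    if cds:
        # A complete CDS is a whole number of codons, begins with a
        # start codon, and ends with a stop codon.
        if len(seq) % 3 != 0:
            raise ValueError("Sequence length is not a multiple of three")
        if seq[:3] not in START_CODONS:
            raise ValueError("First codon %r is not a start codon" % seq[:3])
        if seq[-3:] not in STOP_CODONS:
            raise ValueError("Final codon %r is not a stop codon" % seq[-3:])
        # The start codon becomes M and the final stop codon is dropped.
        body = seq[3:-3]
        protein = ["M"]
    else:
        body = seq
        protein = []
    # Any trailing partial codon is silently ignored in this sketch.
    for i in range(0, len(body) - len(body) % 3, 3):
        codon = body[i:i + 3]
        if codon in STOP_CODONS:
            if cds:
                raise ValueError("Extra in frame stop codon found")
            if to_stop:
                break
            protein.append(stop_symbol)
        else:
            # Raises KeyError for codons outside the toy table.
            protein.append(FORWARD[codon])
    return "".join(protein)


print(translate_sketch("ATGCCCTAG", cds=True))      # MP
print(translate_sketch("AAATAGCCC"))                # K*P
print(translate_sketch("AAATAGCCC", to_stop=True))  # K
```

Checking the CDS conditions before the main loop keeps the loop itself simple: by the time it runs, any stop codon it meets in cds mode must be an extra one.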

translate(sequence, table='Standard', stop_symbol='*', to_stop=False, cds=False)

Translate a nucleotide sequence into amino acids.

If given a string, returns a new string object. Given a Seq or MutableSeq, returns a Seq object with a protein alphabet.

Arguments:

  • table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). Defaults to the "Standard" table.
  • stop_symbol - Single character string, what to use for any terminators, defaults to the asterisk, "*".
  • to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
  • cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.

A simple string example using the default (standard) genetic code:

>>> coding_dna = "GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
>>> translate(coding_dna)
'VAIVMGR*KGAR*'
>>> translate(coding_dna, stop_symbol="@")
'VAIVMGR@KGAR@'
>>> translate(coding_dna, to_stop=True)
'VAIVMGR'

Now using NCBI table 2, where TGA is not a stop codon:

>>> translate(coding_dna, table=2)
'VAIVMGRWKGAR*'
>>> translate(coding_dna, table=2, to_stop=True)
'VAIVMGRWKGAR'

In fact this example starts with GTG, an alternative start codon that is valid under NCBI table 2. The sequence is therefore a complete, valid CDS, and its translation should really start with methionine (not valine):

>>> translate(coding_dna, table=2, cds=True)
'MAIVMGRWKGAR'

Note that if the sequence has no in-frame stop codon, then the to_stop argument has no effect:

>>> coding_dna2 = "GTGGCCATTGTAATGGGCCGC"
>>> translate(coding_dna2)
'VAIVMGR'
>>> translate(coding_dna2, to_stop=True)
'VAIVMGR'

NOTE - Ambiguous codons like "TAN" or "NNN" could be an amino acid or a stop codon. These are translated as "X". Any invalid codon (e.g. "TA?" or "T-A") will raise a TranslationError.

NOTE - Does NOT support gapped sequences.

It will however translate either DNA or RNA.

reverse_complement(sequence)

Returns the reverse complement sequence of a nucleotide string.

If given a string, returns a new string object. Given a Seq or a MutableSeq, returns a new Seq object with the same alphabet.

Supports unambiguous and ambiguous nucleotide sequences.

e.g.

>>> reverse_complement("ACTG-NH")
'DN-CAGT'

Variables Details

_dna_complement_table

Value:
'''\x00\x01\x02\x03\x04\x05\x06\x07\x08\t
\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\
\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@TVGHEFCDIJMLKNOPQYSAUBWX\
RZ[\\]^_`tvghefcdijmlknopqysaubwxrz{|}~\x7f\x80\x81\x82\x83\x84\x85\x8\
6\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\\
x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa\
9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\\
xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xc\
...

_rna_complement_table

Value:
'''\x00\x01\x02\x03\x04\x05\x06\x07\x08\t
\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\
\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@UVGHEFCDIJMLKNOPQYSTABWX\
RZ[\\]^_`uvghefcdijmlknopqystabwxrz{|}~\x7f\x80\x81\x82\x83\x84\x85\x8\
6\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\\
x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa\
9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\\
xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xc\
...