Package Bio :: Package CodonAlign :: Module CodonSeq :: Class CodonSeq

Class CodonSeq

object --+    
         |    
   Seq.Seq --+
             |
            CodonSeq

CodonSeq is designed to be used within the SeqRecords of a CodonAlignment object.

CodonSeq is useful because it allows the user to specify the reading frame when translating a CodonSeq.

CodonSeq also supports codon-style slicing through the get_codon() method.
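
A minimal sketch of codon-style indexing, written in plain Python rather than with CodonSeq itself. It assumes the codons are consecutive, non-overlapping triplets starting at position 0 (as in a gapped CodonSeq, whose length is a multiple of three). The function name `get_codon_sketch` is hypothetical and this is not Biopython's implementation. An integer index selects one codon, while a slice is applied to codon positions so that the unit stays a whole codon, even when reversing:

```python
def get_codon_sketch(seq, index):
    """Return the index-th codon (int index) or the codons selected by a slice.

    Hypothetical illustration: codons are assumed to be consecutive,
    non-overlapping triplets starting at position 0.
    """
    n_codons = len(seq) // 3
    codons = [seq[i * 3:i * 3 + 3] for i in range(n_codons)]
    if isinstance(index, slice):
        # Slice over codon positions, not bases, so each element
        # of the result is still a whole codon (e.g. for [::-1]).
        return "".join(codons[index])
    return codons[index]


seq = "AAATTTGGGCCC"
print(get_codon_sketch(seq, 1))                       # TTT
print(get_codon_sketch(seq, -1))                      # CCC
print(get_codon_sketch(seq, slice(None, None, -1)))   # CCCGGGTTTAAA
```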

Important: an ungapped CodonSeq can be of any length if you specify the rf_table. The length of a gapped CodonSeq should be a multiple of three.

>>> from Bio.CodonAlign.CodonSeq import CodonSeq
>>> codonseq = CodonSeq("AAATTTGGGCCAAATTT", rf_table=(0, 3, 6, 8, 11, 14))
>>> print(codonseq.translate())
KFGAKF
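
To illustrate how an rf_table drives translation, here is a plain-Python sketch that reads one codon at each start position listed in rf_table. A hard-coded standard genetic code (NCBI table 1) stands in for Biopython's default_codon_table, and the name `translate_rf_sketch` is hypothetical; the real method additionally accepts codon_table, stop_symbol and ungap_seq:

```python
from itertools import product

# Standard genetic code (NCBI table 1); amino acids listed for codons
# enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_rf_sketch(seq, rf_table):
    """Translate seq by reading one codon at each start position in rf_table."""
    return "".join(CODON_TABLE[seq[pos:pos + 3]] for pos in rf_table)


print(translate_rf_sketch("AAATTTGGGCCAAATTT", (0, 3, 6, 8, 11, 14)))  # KFGAKF
```

Note that the codons starting at positions 6 and 8 overlap by one base (GGG and GCC share the G at position 8): the rf_table, not the sequence length, determines where each codon starts, which is why an ungapped CodonSeq with an rf_table need not be a multiple of three.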

Examples using the get_full_rf_table() method:

>>> p = CodonSeq('AAATTTCCCGG-TGGGTTTAA', rf_table=(0, 3, 6, 9, 11, 14, 17))
>>> full_rf_table = p.get_full_rf_table()
>>> print(full_rf_table)
[0, 3, 6, 9, 12, 15, 18]
>>> print(p.translate(rf_table=full_rf_table, ungap_seq=False))
KFPPWV*
>>> p = CodonSeq('AAATTTCCCGGGAA-TTTTAA', rf_table=(0, 3, 6, 9, 14, 17))
>>> print(p.get_full_rf_table())
[0, 3, 6, 9, 12.0, 15, 18]
>>> p = CodonSeq('AAA------------TAA', rf_table=(0, 3)) 
>>> print(p.get_full_rf_table())
[0, 3.0, 6.0, 9.0, 12.0, 15]

Instance Methods

__getitem__(self, index)
Return a single letter or a subsequence, using my_seq[index].

__init__(self, data='', alphabet=CodonAlphabet(Standard), gap_char='-', rf_table=None)
Create a CodonSeq object.

full_translate(self, codon_table=default_codon_table, stop_symbol='*')
Apply full translation with gaps considered.

get_codon(self, index)
Get the index-th codon from the sequence.

get_codon_num(self)
Return the number of codons in the CodonSeq

get_full_rf_table(self)
Return a full rf_table for this CodonSeq.

toSeq(self, alphabet=DNAAlphabet())

translate(self, codon_table=default_codon_table, stop_symbol='*', rf_table=None, ungap_seq=True)
Translate the CodonSeq based on the reading frame in rf_table.

ungap(self, gap=None)
Return a copy of the sequence without the gap character(s).

Inherited from Seq.Seq: __add__, __cmp__, __contains__, __hash__, __len__, __radd__, __repr__, __str__, back_transcribe, complement, count, endswith, find, lower, lstrip, reverse_complement, rfind, rsplit, rstrip, split, startswith, strip, tomutable, tostring, transcribe, upper

Inherited from Seq.Seq (private): _get_seq_str_and_check_alphabet

Inherited from object: __delattr__, __format__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Class Methods

from_seq(cls, seq, alphabet=CodonAlphabet(Standard), rf_table=None)

Properties

Inherited from object: __class__

Method Details

__getitem__(self, index)
(Indexing operator)

Return a single letter or a subsequence, using my_seq[index].

Overrides: Seq.Seq.__getitem__
(inherited documentation)

__init__(self, data='', alphabet=CodonAlphabet(Standard), gap_char='-', rf_table=None)
(Constructor)

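Create a CodonSeq object.

Arguments:

  • data - Sequence, optional (string); defaults to an empty string
  • alphabet - Optional argument, an Alphabet object from Bio.Alphabet; defaults to CodonAlphabet(Standard)
  • gap_char - Optional argument, the gap character; defaults to '-'
  • rf_table - Optional argument, the reading frame table giving the start position of each codon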

You will typically use Bio.SeqIO to read in sequences from files as SeqRecord objects, whose sequence will be exposed as a Seq object via the seq property.

However, you will often want to create your own Seq objects directly:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",
...              IUPAC.protein)
>>> my_seq
Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF', IUPACProtein())
>>> print(my_seq)
MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF
>>> my_seq.alphabet
IUPACProtein()

Overrides: object.__init__
(inherited documentation)

get_full_rf_table(self)

Return a full rf_table for this CodonSeq. A full rf_table differs from a normal rf_table in that it also takes the gaps in the CodonSeq into account, which makes it helpful when constructing an alignment that contains frameshifts.
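
The sketch below is a hypothetical reconstruction, not Biopython's implementation. It assumes each rf_table entry is a position in the ungapped sequence, maps it to the corresponding position in the gapped sequence, and inserts a float placeholder for every whole codon slot skipped between two consecutive codon starts. Under those assumptions it reproduces the three get_full_rf_table() doctest outputs in the class description, including the float entries, but it may differ from the library on other inputs (for example, trailing gaps are not handled):

```python
def full_rf_table_sketch(seq, rf_table, gap_char="-"):
    """Hypothetical full rf_table: gapped codon starts plus float placeholders."""
    # Gapped-sequence index of every non-gap character, in order.
    nongap = [i for i, ch in enumerate(seq) if ch != gap_char]
    full = []
    for p in rf_table:
        pos = nongap[p]  # ungapped position -> gapped position
        if full:
            # Fill each whole codon slot skipped since the previous codon
            # start with a float, marking a slot not taken from rf_table.
            slot = int(full[-1]) + 3
            while slot + 3 <= pos:
                full.append(float(slot))
                slot += 3
        full.append(pos)
    return full


print(full_rf_table_sketch("AAATTTCCCGG-TGGGTTTAA", (0, 3, 6, 9, 11, 14, 17)))
print(full_rf_table_sketch("AAATTTCCCGGGAA-TTTTAA", (0, 3, 6, 9, 14, 17)))
print(full_rf_table_sketch("AAA------------TAA", (0, 3)))
```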

translate(self, codon_table=default_codon_table, stop_symbol='*', rf_table=None, ungap_seq=True)

Translate the CodonSeq based on the reading frame in rf_table. You can also pass an rf_table when calling this method; this is the only way to include gaps in the translated sequence. For that purpose, ungap_seq should be set to False, as in the get_full_rf_table() examples in the class description.
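
The effect of ungap_seq can be sketched in plain Python. This is a simplified, hypothetical stand-in (`translate_gapped_sketch` is not a Biopython function): it assumes an all-gap codon should appear as '-' in the protein, uses a hard-coded standard genetic code, accepts float rf_table entries by truncating them to int, and raises an error for partially gapped (frameshift) codons such as 'GG-', which CodonSeq.translate handles in its own way (compare the 'KFPPWV*' example in the class description):

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_gapped_sketch(seq, rf_table, ungap_seq=True, gap_char="-"):
    """Hypothetical translation that can keep whole-codon gaps."""
    # With ungap_seq=True the gaps are removed first, so rf_table must
    # index the ungapped sequence and no gaps can reach the output.
    tr_seq = seq.replace(gap_char, "") if ungap_seq else seq
    protein = []
    for pos in rf_table:
        start = int(pos)  # float entries (e.g. 3.0) are accepted too
        codon = tr_seq[start:start + 3]
        if codon == gap_char * 3:
            protein.append("-")  # an all-gap codon becomes a gap residue
        elif gap_char in codon:
            # Partially gapped (frameshift) codons are out of scope here.
            raise ValueError("frameshift codon %r not handled by this sketch" % codon)
        else:
            protein.append(CODON_TABLE[codon])
    return "".join(protein)


seq = "AAA------------TAA"
print(translate_gapped_sketch(seq, (0, 3)))                                # K*
print(translate_gapped_sketch(seq, (0, 3, 6, 9, 12, 15), ungap_seq=False))  # K----*
```

With ungap_seq=True the positions refer to the ungapped sequence; with ungap_seq=False they refer to the gapped sequence, so each all-gap codon yields a gap in the translated output.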

Overrides: Seq.Seq.translate

ungap(self, gap=None)

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways: either as an explicit argument, or via the sequence's alphabet. For example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("-ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('-ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap("-")
Seq('ATATGAAATTTGAAAA', DNAAlphabet())

If the gap character is not given as an argument, it will be taken from the sequence's alphabet (if defined). Notice that the returned sequence's alphabet is adjusted since it no longer requires a gapped alphabet:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped, HasStopCodon
>>> my_pro = Seq("MVVLE=AD*", HasStopCodon(Gapped(IUPAC.protein, "=")))
>>> my_pro
Seq('MVVLE=AD*', HasStopCodon(Gapped(IUPACProtein(), '='), '*'))
>>> my_pro.ungap()
Seq('MVVLEAD*', HasStopCodon(IUPACProtein(), '*'))

Or, with a simpler gapped DNA example:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_seq = Seq("CGGGTAG=AAAAAA", Gapped(IUPAC.unambiguous_dna, "="))
>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap()
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

Although it is redundant, you can still supply the gap character as an argument to this method, as long as it is consistent with the alphabet:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("=")
Seq('CGGGTAGAAAAAA', IUPACUnambiguousDNA())

However, if the gap character given as the argument disagrees with that declared in the alphabet, an exception is raised:

>>> my_seq
Seq('CGGGTAG=AAAAAA', Gapped(IUPACUnambiguousDNA(), '='))
>>> my_seq.ungap("-")
Traceback (most recent call last):
   ...
ValueError: Gap '-' does not match '=' from alphabet

Finally, if a gap character is not supplied, and the alphabet does not define one, an exception is raised:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> my_dna = Seq("ATA--TGAAAT-TTGAAAA", generic_dna)
>>> my_dna
Seq('ATA--TGAAAT-TTGAAAA', DNAAlphabet())
>>> my_dna.ungap()
Traceback (most recent call last):
   ...
ValueError: Gap character not given and not defined in alphabet

Overrides: Seq.Seq.ungap
(inherited documentation)
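
The gap-resolution rules shown in the ungap() examples can be summarised in a short plain-Python sketch. It is hypothetical: `ungap_sketch` is not a Biopython function, the alphabet is reduced to an optional `alphabet_gap` string, and unlike Seq.ungap it does not adjust any alphabet on the result. The error messages mirror the ones in the examples:

```python
def ungap_sketch(seq, gap=None, alphabet_gap=None):
    """Remove gaps from a plain string, resolving the gap character.

    alphabet_gap stands in for the gap character a gapped alphabet would
    declare (None when the alphabet defines no gap character).
    """
    if alphabet_gap is not None:
        if gap is None:
            gap = alphabet_gap  # fall back to the alphabet's gap character
        elif gap != alphabet_gap:
            raise ValueError("Gap %r does not match %r from alphabet" % (gap, alphabet_gap))
    elif gap is None:
        raise ValueError("Gap character not given and not defined in alphabet")
    return seq.replace(gap, "")


print(ungap_sketch("-ATA--TGAAAT-TTGAAAA", "-"))            # ATATGAAATTTGAAAA
print(ungap_sketch("CGGGTAG=AAAAAA", alphabet_gap="="))      # CGGGTAGAAAAAA
```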