Alphabets used in Seq objects, etc., to declare the sequence type and its letters. This is used by sequences which contain a finite number of similar words.

| Class |
|---|
| Alphabet |
| SingleLetterAlphabet |
| ProteinAlphabet |
| NucleotideAlphabet |
| DNAAlphabet |
| RNAAlphabet |
| SecondaryStructure |
| ThreeLetterProtein |
| AlphabetEncoder |
| Gapped |
| HasStopCodon |

Returns a common but often generic base alphabet object (PRIVATE). This throws away any AlphabetEncoder information, e.g. Gapped alphabets. Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein -> generic single letter. These DO NOT raise an exception!

Returns a common but often generic alphabet object (PRIVATE).

>>> from Bio.Alphabet import IUPAC
>>> _consensus_alphabet([IUPAC.extended_protein, IUPAC.protein])
ExtendedIUPACProtein()
>>> _consensus_alphabet([generic_protein, IUPAC.protein])
ProteinAlphabet()

Note that DNA+RNA -> Nucleotide, and Nucleotide+Protein -> generic single
letter. These DO NOT raise an exception!

>>> _consensus_alphabet([generic_dna, generic_nucleotide])
NucleotideAlphabet()
>>> _consensus_alphabet([generic_dna, generic_rna])
NucleotideAlphabet()
>>> _consensus_alphabet([generic_dna, generic_protein])
SingleLetterAlphabet()
>>> _consensus_alphabet([single_letter_alphabet, generic_protein])
SingleLetterAlphabet()

This is aware of Gapped and HasStopCodon and new letters added by
other AlphabetEncoders. This WILL raise an exception if more than
one gap character or stop symbol is present.

>>> from Bio.Alphabet import IUPAC
>>> _consensus_alphabet([Gapped(IUPAC.extended_protein), HasStopCodon(IUPAC.protein)])
HasStopCodon(Gapped(ExtendedIUPACProtein(), '-'), '*')
>>> _consensus_alphabet([Gapped(IUPAC.protein, "-"), Gapped(IUPAC.protein, "=")])
Traceback (most recent call last):
    ...
ValueError: More than one gap character present
>>> _consensus_alphabet([HasStopCodon(IUPAC.protein, "*"), HasStopCodon(IUPAC.protein, "+")])
Traceback (most recent call last):
    ...
ValueError: More than one stop symbol present
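
The two behaviours described above (falling back to a shared parent type, and refusing conflicting gap characters) can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the classes are empty stand-ins that mirror the names listed on this page, and `common_base` and `single_gap_char` are hypothetical helper names. The sketch also ignores letter sets, so it does not reproduce cases such as picking ExtendedIUPACProtein over IUPACProtein.

```python
# Empty stand-in classes mirroring the hierarchy named on this page
# (assumption: these are NOT the real Bio.Alphabet classes).
class Alphabet: pass
class SingleLetterAlphabet(Alphabet): pass
class ProteinAlphabet(SingleLetterAlphabet): pass
class NucleotideAlphabet(SingleLetterAlphabet): pass
class DNAAlphabet(NucleotideAlphabet): pass
class RNAAlphabet(NucleotideAlphabet): pass


def common_base(alphabets):
    """Return the most specific class shared by all alphabet instances.

    Hypothetical helper: walks the first alphabet's method resolution
    order from most to least specific and returns the first class that
    every alphabet is an instance of.
    """
    if not alphabets:
        raise ValueError("Need at least one alphabet")
    for cls in type(alphabets[0]).__mro__:
        if all(isinstance(a, cls) for a in alphabets):
            return cls


def single_gap_char(gap_chars):
    """Return the one gap character in use, or None if there is none.

    Hypothetical helper: None entries stand for ungapped alphabets.
    Raises ValueError when two different gap characters are seen.
    """
    distinct = {g for g in gap_chars if g is not None}
    if len(distinct) > 1:
        raise ValueError("More than one gap character present")
    return distinct.pop() if distinct else None


print(common_base([DNAAlphabet(), RNAAlphabet()]).__name__)      # NucleotideAlphabet
print(common_base([DNAAlphabet(), ProteinAlphabet()]).__name__)  # SingleLetterAlphabet
print(single_gap_char(["-", None, "-"]))                         # -
```

Walking the method resolution order keeps the sketch short; the same check for stop symbols would follow the pattern of `single_gap_char`.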

Returns True except for DNA+RNA or Nucleotide+Protein (PRIVATE).

>>> _check_type_compatible([generic_dna, generic_nucleotide])
True
>>> _check_type_compatible([generic_dna, generic_rna])
False
>>> _check_type_compatible([generic_dna, generic_protein])
False
>>> _check_type_compatible([single_letter_alphabet, generic_protein])
True

This relies on the Alphabet subclassing hierarchy. It does not check
things like gap characters or stop symbols.
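
One way to express a hierarchy-based compatibility check like this is to record which kinds of alphabet have been seen and reject the two mixtures named above. The following is a minimal sketch under stated assumptions: the classes are empty stand-ins using the names from this page, and `check_type_compatible` is a hypothetical re-creation, not the private function itself.

```python
# Empty stand-in classes (assumption: not the real Bio.Alphabet classes).
class Alphabet: pass
class SingleLetterAlphabet(Alphabet): pass
class ProteinAlphabet(SingleLetterAlphabet): pass
class NucleotideAlphabet(SingleLetterAlphabet): pass
class DNAAlphabet(NucleotideAlphabet): pass
class RNAAlphabet(NucleotideAlphabet): pass


def check_type_compatible(alphabets):
    """Return False for DNA+RNA or nucleotide+protein mixes, else True.

    Hypothetical sketch: only the class hierarchy is inspected, so gap
    characters and stop symbols play no part in the result.
    """
    dna = rna = nucl = protein = False
    for alpha in alphabets:
        # Test the most specific classes first; DNA and RNA both count
        # as nucleotide alphabets.
        if isinstance(alpha, DNAAlphabet):
            dna = nucl = True
        elif isinstance(alpha, RNAAlphabet):
            rna = nucl = True
        elif isinstance(alpha, NucleotideAlphabet):
            nucl = True
        elif isinstance(alpha, ProteinAlphabet):
            protein = True
    return not ((dna and rna) or (nucl and protein))


print(check_type_compatible([DNAAlphabet(), NucleotideAlphabet()]))       # True
print(check_type_compatible([DNAAlphabet(), RNAAlphabet()]))              # False
print(check_type_compatible([SingleLetterAlphabet(), ProteinAlphabet()])) # True
```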

Check all letters in sequence are in the alphabet (PRIVATE).

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",
...              IUPAC.protein)
>>> _verify_alphabet(my_seq)
True

This example has an X, which is not in the IUPAC protein alphabet
(you should be using the IUPAC extended protein alphabet):

>>> bad_seq = Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVFX",
...               IUPAC.protein)
>>> _verify_alphabet(bad_seq)
False

This replaces Bio.utils.verify_alphabet() since we are deprecating
that. Potentially this could be added to the Alphabet object, and
I would like it to be an option when creating a Seq object... but
that might slow things down.
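
The check described above amounts to a membership test of every character against the alphabet's letters. The sketch below shows that idea on plain strings, without Seq or alphabet objects. The function name `verify_letters` and the constant `PROTEIN_LETTERS` are hypothetical; the constant holds the 20 standard amino acid one-letter codes, assumed here to stand in for the IUPAC protein alphabet's letters.

```python
# The 20 standard amino acid one-letter codes (assumed stand-in for the
# letters of the IUPAC protein alphabet).
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


def verify_letters(sequence, letters):
    """Return True if every character of sequence occurs in letters.

    Hypothetical sketch working on plain strings rather than Seq
    objects. A set makes each membership test constant time.
    """
    allowed = set(letters)
    return all(char in allowed for char in sequence)


good = "MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"
bad = good + "X"  # X is not one of the 20 standard codes

print(verify_letters(good, PROTEIN_LETTERS))  # True
print(verify_letters(bad, PROTEIN_LETTERS))   # False
```

The scan is linear in the sequence length, which is consistent with the concern above that running it on every Seq creation might slow things down.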
| Generated by Epydoc 3.0.1 on Tue Feb 5 17:59:44 2013 | http://epydoc.sourceforge.net |