Bio.Alphabet.IUPAC module

Standard nucleotide and protein alphabets defined by IUPAC.

class Bio.Alphabet.IUPAC.ExtendedIUPACProtein

Bases: Bio.Alphabet.ProteinAlphabet

Extended uppercase IUPAC protein single-letter alphabet, including X and other non-standard codes.

In addition to the standard 20 single-letter protein codes, this includes:

  • B = “Asx”; Aspartic acid (D) or Asparagine (N)

  • X = “Xxx”; Unknown or ‘other’ amino acid

  • Z = “Glx”; Glutamic acid (E) or Glutamine (Q)

  • J = “Xle”; Leucine (L) or Isoleucine (I), used in mass spectrometry, where the two residues have the same mass

  • U = “Sec”; Selenocysteine

  • O = “Pyl”; Pyrrolysine

This alphabet is not intended to be used with X for Selenocysteine (an ad-hoc standard prior to the IUPAC adoption of U instead).
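As a rough illustration of why the extra codes matter, the sketch below checks a peptide against the two protein letter sets. It is plain Python and does not import Biopython; the letter strings are copied from the `letters` attributes documented on this page, and the helper name `uses_only` is hypothetical, not part of Bio.Alphabet.

```python
# Letter sets copied from the `letters` attributes on this page.
STANDARD_PROTEIN = "ACDEFGHIKLMNPQRSTVWY"
EXTENDED_PROTEIN = "ACDEFGHIKLMNPQRSTVWYBXZJUO"


def uses_only(sequence, letters):
    """Return True if every character of sequence is in letters.

    Hypothetical helper for illustration; not a Biopython function.
    """
    return set(sequence) <= set(letters)


# A peptide containing selenocysteine (U) needs the extended alphabet.
peptide = "MKUAC"
fits_standard = uses_only(peptide, STANDARD_PROTEIN)
fits_extended = uses_only(peptide, EXTENDED_PROTEIN)
print(fits_standard, fits_extended)  # False True
```

The check is case-sensitive, matching the uppercase-only alphabets described here.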

letters = 'ACDEFGHIKLMNPQRSTVWYBXZJUO'

class Bio.Alphabet.IUPAC.IUPACProtein

Bases: Bio.Alphabet.IUPAC.ExtendedIUPACProtein

IUPAC protein alphabet of the 20 standard amino acids.

Uppercase and single letter.

letters = 'ACDEFGHIKLMNPQRSTVWY'

class Bio.Alphabet.IUPAC.IUPACAmbiguousDNA

Bases: Bio.Alphabet.DNAAlphabet

Uppercase IUPAC ambiguous DNA.
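To show what the ambiguity letters stand for, here is a minimal sketch that expands an ambiguous DNA string into every unambiguous sequence it could represent. The mapping is the standard IUPAC nucleotide ambiguity table, which is not listed on this page; the names `AMBIGUITY_CODES` and `expand` are hypothetical and defined locally rather than taken from Biopython.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (an assumption brought in from
# the IUPAC convention; this page only lists the letters themselves).
AMBIGUITY_CODES = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "M": "AC", "K": "GT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand(sequence):
    """List every unambiguous sequence matching an ambiguous one."""
    choices = [AMBIGUITY_CODES[base] for base in sequence]
    return ["".join(combo) for combo in product(*choices)]


expanded = expand("ARY")
print(expanded)  # ['AAC', 'AAT', 'AGC', 'AGT']
```

The number of expansions grows multiplicatively with each ambiguous position, so this approach is only practical for short sequences.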

letters = 'GATCRYWSMKHBVDN'

class Bio.Alphabet.IUPAC.IUPACUnambiguousDNA

Bases: Bio.Alphabet.IUPAC.IUPACAmbiguousDNA

Uppercase IUPAC unambiguous DNA (letters GATC only).

letters = 'GATC'

class Bio.Alphabet.IUPAC.ExtendedIUPACDNA

Bases: Bio.Alphabet.DNAAlphabet

Extended IUPAC DNA alphabet.

In addition to the standard letter codes GATC, this includes:

  • B = 5-bromouridine

  • D = 5,6-dihydrouridine

  • S = thiouridine

  • W = wyosine
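One point worth noting is that the four extra letters listed above also occur in IUPACAmbiguousDNA, where they act as ambiguity codes, so the same letter is read differently depending on which alphabet is in use. The following sketch, which uses only the `letters` strings from this page and no Biopython import, finds the clashing letters.

```python
# Letter sets copied from the `letters` attributes on this page.
EXTENDED_DNA = "GATCBDSW"
AMBIGUOUS_DNA = "GATCRYWSMKHBVDN"

# Letters present in both alphabets, excluding the four plain bases.
shared = set(EXTENDED_DNA) & set(AMBIGUOUS_DNA)
clashing = sorted(shared - set("GATC"))
print(clashing)  # ['B', 'D', 'S', 'W']
```

Because of this overlap, a sequence's letters alone cannot tell you which of the two alphabets was intended.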

letters = 'GATCBDSW'

class Bio.Alphabet.IUPAC.IUPACAmbiguousRNA

Bases: Bio.Alphabet.RNAAlphabet

Uppercase IUPAC ambiguous RNA.
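The RNA letter set differs from the ambiguous DNA one only in using U in place of T. A short sketch, again using just the `letters` strings from this page without importing Biopython, confirms this:

```python
# Letter sets copied from the `letters` attributes on this page.
AMBIGUOUS_DNA = "GATCRYWSMKHBVDN"
AMBIGUOUS_RNA = "GAUCRYWSMKHBVDN"

# Swapping T for U in the DNA letters yields the RNA letters.
derived_rna = AMBIGUOUS_DNA.replace("T", "U")
print(derived_rna == AMBIGUOUS_RNA)  # True
```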

letters = 'GAUCRYWSMKHBVDN'

class Bio.Alphabet.IUPAC.IUPACUnambiguousRNA

Bases: Bio.Alphabet.IUPAC.IUPACAmbiguousRNA

Uppercase IUPAC unambiguous RNA (letters GAUC only).

letters = 'GAUC'