Bio.Alphabet.Reduced module

Reduced alphabets that group several amino acids under one letter.

Reduced (redundant or simplified) alphabets represent protein sequences with a smaller alphabet in which several amino acids are grouped under one letter, based on physico-chemical traits. For example, the aliphatic residues (I, L, V) are usually quite interchangeable, so many sequence studies group them into one letter.

Examples of reduced alphabets are available at:

http://viscose.herokuapp.com/html/alphabets.html

The Murphy tables are taken from:

Murphy L.R., Wallqvist A., Levy R.M. (2000) Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein Eng. 13(3):149-152

These alphabets were used with Bio.utils.reduce_sequence, which has been removed from Biopython. You can still use the alphabets and their conversion tables directly, like this:

>>> from Bio.Seq import Seq
>>> from Bio import Alphabet
>>> from Bio.Alphabet import Reduced
>>> my_protein = Seq('MAGSKEWKRFCELTINEA', Alphabet.ProteinAlphabet())

Now we convert this sequence into one that distinguishes only polar (P) and hydrophobic (H) residues:

>>> new_protein = Seq('', Alphabet.Reduced.HPModel())
>>> for aa in my_protein:
...     new_protein += Alphabet.Reduced.hp_model_tab[aa]
>>> new_protein
Seq('HPPPPPHPPHHPHPHPPP', HPModel())
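
The same conversion can be sketched in plain Python, which may help on installations where Bio.Alphabet is not available. The names HP_GROUPS, HP_TABLE and to_reduced are hypothetical and not part of Biopython; the table is rebuilt here from the polar and hydrophobic groups listed in this document, as a stand-in for hp_model_tab.

```python
# Hypothetical stand-in for hp_model_tab, rebuilt from the groups
# documented for HPModel (P: polar, H: hydrophobic).
HP_GROUPS = {"P": "AGTSNQDEHRKP", "H": "CMFILVWY"}

# Invert the groups into a per-residue lookup: {'A': 'P', 'C': 'H', ...}
HP_TABLE = {aa: letter for letter, members in HP_GROUPS.items() for aa in members}


def to_reduced(sequence, table):
    """Replace every residue in `sequence` by its reduced letter from `table`."""
    return "".join(table[aa] for aa in sequence)


reduced = to_reduced("MAGSKEWKRFCELTINEA", HP_TABLE)
print(reduced)  # HPPPPPHPPHHPHPHPPP
```

A plain dictionary lookup raises KeyError for any letter outside the 20 standard amino acids (for example X), so unexpected residues are reported rather than silently passed through.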

The following Alphabet classes are available:

  • Murphy15: Maps 20 amino acids to 15, use murphy_15_tab for conversion,
    ambiguous letters: L: LVIM, F: FY, K: KR

  • Murphy10: Maps 20 amino acids to 10, use murphy_10_tab for conversion,
    ambiguous letters: L: LVIM, S: ST, F: FYW, E: EDNQ, K: KR

  • Murphy8: Maps 20 amino acids to 8, use murphy_8_tab for conversion,
    ambiguous letters: L: LVIMC, A: AG, S: ST, F: FYW, E: EDNQ, K: KR

  • Murphy4: Maps 20 amino acids to 4, use murphy_4_tab for conversion,
    ambiguous letters: L: LVIMC, A: AGSTP, F: FYW, E: EDNQKRH

  • HPModel: Groups amino acids as polar (hydrophilic) or hydrophobic
    (non-polar), use hp_model_tab for conversion, P: AGTSNQDEHRKP, H: CMFILVWY

  • PC5: Amino acids grouped according to 5 physico-chemical properties,
    use pc_5_table for conversion, A (Aliphatic): IVL, R (aRomatic): FYWH,
    C (Charged): KRDE, T (Tiny): GACS, D (Diverse): TMQNP
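
To illustrate how the group listings above relate to a per-residue conversion table, the following sketch expands two of them into dictionaries. It is an illustration only: build_table, AMINO_ACIDS, murphy_10 and pc_5 are hypothetical names, and the sketch assumes that any residue not named in a group keeps its own letter, which is how the Murphy listings read.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_table(groups):
    """Expand {letter: members} into a per-residue lookup.

    Residues not named in any group map to themselves (an assumption
    based on the group listings, not on the library's source).
    """
    table = {aa: aa for aa in AMINO_ACIDS}
    for letter, members in groups.items():
        for aa in members:
            table[aa] = letter
    return table


# Groups copied from the Murphy10 and PC5 entries in the list above.
murphy_10 = build_table({"L": "LVIM", "S": "ST", "F": "FYW", "E": "EDNQ", "K": "KR"})
pc_5 = build_table({"A": "IVL", "R": "FYWH", "C": "KRDE", "T": "GACS", "D": "TMQNP"})

print(len(set(murphy_10.values())))  # 10
print(len(set(pc_5.values())))       # 5
```

Counting the distinct values is a quick sanity check that a grouping really reduces the 20 residues to the advertised number of letters.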

class Bio.Alphabet.Reduced.Murphy15

Bases: Bio.Alphabet.ProteinAlphabet

Reduced protein alphabet with 15 letters.

Letters: A, C, D, E, G, H, N, P, Q, S, T, W, L(LVIM), F(FY), K(KR)

letters = 'LCAGSTPFWEDNQKH'
size = 1
class Bio.Alphabet.Reduced.Murphy10

Bases: Bio.Alphabet.ProteinAlphabet

Reduced protein alphabet with 10 letters.

Letters: A, C, G, H, P, L(LVIM), S(ST), F(FYW), E(EDNQ), K(KR)

letters = 'LCAGSPFEKH'
size = 1
class Bio.Alphabet.Reduced.Murphy8

Bases: Bio.Alphabet.ProteinAlphabet

Reduced protein alphabet with 8 letters.

Letters: H, P, L(LVIMC), A(AG), S(ST), F(FYW), E(EDNQ), K(KR)

letters = 'LASPFEKH'
size = 1
class Bio.Alphabet.Reduced.Murphy4

Bases: Bio.Alphabet.ProteinAlphabet

Reduced protein alphabet with 4 letters.

Letters: L(LVIMC), A(AGSTP), F(FYW), E(EDNQKRH)

letters = 'LAFE'
size = 1
class Bio.Alphabet.Reduced.HPModel

Bases: Bio.Alphabet.ProteinAlphabet

Reduced protein alphabet with only two letters, for polar or hydrophobic residues.

Letters: P (polar: AGTSNQDEHRKP), H (hydrophobic: CMFILVWY)

letters = 'HP'
size = 1
class Bio.Alphabet.Reduced.PC5

Bases: Bio.Alphabet.ProteinAlphabet

Reduced protein alphabet with 5 letters for physico-chemical properties.

Letters: A (Aliphatic: IVL), R (aRomatic: FYWH), C (Charged: KRDE), T (Tiny: GACS), D (Diverse: TMQNP)

letters = 'ARCTD'
size = 1
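
Because every symbol in these alphabets is a single character (assuming that is what size = 1 expresses), a reduction can also be sketched with Python's built-in str.maketrans and str.translate. The helper translation_for and the variable names below are hypothetical; the groups are copied from the Murphy8 and Murphy4 entries above.

```python
def translation_for(groups):
    """Build a str.translate table from {letter: members} groups."""
    source = "".join(groups.values())
    target = "".join(letter * len(members) for letter, members in groups.items())
    return str.maketrans(source, target)


murphy_8 = translation_for(
    {"L": "LVIMC", "A": "AG", "S": "ST", "F": "FYW", "E": "EDNQ", "K": "KR"}
)
murphy_4 = translation_for(
    {"L": "LVIMC", "A": "AGSTP", "F": "FYW", "E": "EDNQKRH"}
)

protein = "MAGSKEWKRFCELTINEA"
as_murphy_8 = protein.translate(murphy_8)
as_murphy_4 = protein.translate(murphy_4)
print(as_murphy_8)  # LAASKEFKKFLELSLEEA
print(as_murphy_4)  # LAAAEEFEEFLELALEEA
```

Unlike a dictionary lookup, str.translate leaves characters it does not know unchanged, so H and P pass through Murphy8 as themselves, but so would any non-standard letter.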