Bio.Alphabet.Reduced module
Reduced alphabets which lump together several amino acids into one letter.
Reduced (redundant or simplified) alphabets represent protein sequences using an alternative alphabet which lumps together several amino acids into one letter, based on physico-chemical traits. For example, the aliphatic residues (I, L, V) are usually quite interchangeable, so many sequence studies group them into one letter.
Examples of reduced alphabets are available at:
http://viscose.herokuapp.com/html/alphabets.html
The Murphy tables are taken from:
Murphy L.R., Wallqvist A., Levy R.M. (2000) Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein Eng. 13(3):149-152
These alphabets were used with Bio.utils.reduce_sequence, which has been removed from Biopython. You can still use the alphabets and tables like this:
>>> from Bio.Seq import Seq
>>> from Bio import Alphabet
>>> from Bio.Alphabet import Reduced
>>> my_protein = Seq('MAGSKEWKRFCELTINEA', Alphabet.ProteinAlphabet())
Now, we convert this sequence into a sequence which only recognizes polar (P) or hydrophobic (H) residues:
>>> new_protein = Seq('', Alphabet.Reduced.HPModel())
>>> for aa in my_protein:
...     new_protein += Alphabet.Reduced.hp_model_tab[aa]
>>> new_protein
Seq('HPPPPPHPPHHPHPHPPP', HPModel())
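Since the conversion table is an ordinary mapping from one-letter residue codes to reduced letters, the same loop can be sketched on plain strings, without the Alphabet classes. The block below is only an illustration: it assumes the P/H groupings documented for HPModel on this page, and `HP_GROUPS`, `build_table` and `reduce_sequence` are hypothetical names that are not part of Biopython.

```python
# Hypothetical stand-in for hp_model_tab, built from the groupings
# documented for HPModel (P: AGTSNQDEHRKP, H: CMFILVWY).
HP_GROUPS = {"P": "AGTSNQDEHRKP", "H": "CMFILVWY"}


def build_table(groups):
    """Invert {reduced_letter: members} into {residue: reduced_letter}."""
    table = {}
    for reduced_letter, members in groups.items():
        for residue in members:
            if residue in table:
                raise ValueError(f"{residue!r} is assigned to two groups")
            table[residue] = reduced_letter
    return table


def reduce_sequence(sequence, table):
    """Map every residue through the table; unknown residues raise KeyError."""
    return "".join(table[residue] for residue in sequence)


hp_table = build_table(HP_GROUPS)
reduced = reduce_sequence("MAGSKEWKRFCELTINEA", hp_table)
print(reduced)  # HPPPPPHPPHHPHPHPPP
```

Looking residues up with `table[residue]` mirrors the doctest above: a residue outside the 20 standard amino acids fails loudly instead of passing through unchanged.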
The following Alphabet classes are available:
- Murphy15: Maps 20 amino acids to 15, use murphy_15_tab for conversion,
  ambiguous letters: L: LVIM, F: FY, K: KR
- Murphy10: Maps 20 amino acids to 10, use murphy_10_tab for conversion,
  ambiguous letters: L: LVIM, S: ST, F: FYW, E: EDNQ, K: KR
- Murphy8: Maps 20 amino acids to 8, use murphy_8_tab for conversion,
  ambiguous letters: L: LVIMC, A: AG, S: ST, F: FYW, E: EDNQ, K: KR
- Murphy4: Maps 20 amino acids to 4, use murphy_4_tab for conversion,
  ambiguous letters: L: LVIMC, A: AGSTP, F: FYW, E: EDNQKRH
- HPModel: Groups amino acids as polar (hydrophilic) or hydrophobic
  (non-polar), use hp_model_tab for conversion, P: AGTSNQDEHRKP, H: CMFILVWY
- PC5: Amino acids grouped according to 5 physico-chemical properties,
  use pc_5_table for conversion, A (Aliphatic): IVL, R (aRomatic): FYWH,
  C (Charged): KRDE, T (Tiny): GACS, D (Diverse): TMQNP
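The groupings listed above can be checked mechanically. The following sketch rebuilds a residue-to-letter table for each alphabet from its multi-residue groups alone, assuming that any residue not named in a group maps to itself, and then counts the distinct reduced letters. `GROUPS`, `expand`, `tables` and `sizes` are hypothetical names for this illustration. The tables are reconstructed from the text and are not the module's own `murphy_*_tab`, `hp_model_tab` or `pc_5_table` objects.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Multi-residue groups copied from the list above (illustrative only).
GROUPS = {
    "Murphy15": {"L": "LVIM", "F": "FY", "K": "KR"},
    "Murphy10": {"L": "LVIM", "S": "ST", "F": "FYW", "E": "EDNQ", "K": "KR"},
    "Murphy8": {"L": "LVIMC", "A": "AG", "S": "ST", "F": "FYW",
                "E": "EDNQ", "K": "KR"},
    "Murphy4": {"L": "LVIMC", "A": "AGSTP", "F": "FYW", "E": "EDNQKRH"},
    "HPModel": {"P": "AGTSNQDEHRKP", "H": "CMFILVWY"},
    "PC5": {"A": "IVL", "R": "FYWH", "C": "KRDE", "T": "GACS", "D": "TMQNP"},
}


def expand(groups):
    """Return {residue: reduced_letter}; ungrouped residues map to themselves."""
    table = {residue: residue for residue in AMINO_ACIDS}
    for reduced_letter, members in groups.items():
        for residue in members:
            table[residue] = reduced_letter
    return table


tables = {name: expand(groups) for name, groups in GROUPS.items()}
# Number of distinct letters each reduced alphabet actually uses.
sizes = {name: len(set(table.values())) for name, table in tables.items()}
print(sizes)
```

Under these assumptions the counts come out as 15, 10, 8, 4, 2 and 5, matching the alphabet names and the `letters` attribute documented for each class.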
class Bio.Alphabet.Reduced.Murphy15
    Bases: Bio.Alphabet.ProteinAlphabet
    Reduced protein alphabet with 15 letters.
    Letters: A, C, D, E, G, H, N, P, Q, S, T, W, L(LVIM), F(FY), K(KR)
    letters = 'LCAGSTPFWEDNQKH'
    size = 1
class Bio.Alphabet.Reduced.Murphy10
    Bases: Bio.Alphabet.ProteinAlphabet
    Reduced protein alphabet with 10 letters.
    Letters: A, C, G, H, P, L(LVIM), S(ST), F(FYW), E(EDNQ), K(KR)
    letters = 'LCAGSPFEKH'
    size = 1
class Bio.Alphabet.Reduced.Murphy8
    Bases: Bio.Alphabet.ProteinAlphabet
    Reduced protein alphabet with 8 letters.
    Letters: H, P, L(LVIMC), A(AG), S(ST), F(FYW), E(EDNQ), K(KR)
    letters = 'LASPFEKH'
    size = 1
class Bio.Alphabet.Reduced.Murphy4
    Bases: Bio.Alphabet.ProteinAlphabet
    Reduced protein alphabet with 4 letters.
    Letters: L(LVIMC), A(AGSTP), F(FYW), E(EDNQKRH)
    letters = 'LAFE'
    size = 1
class Bio.Alphabet.Reduced.HPModel
    Bases: Bio.Alphabet.ProteinAlphabet
    Reduced protein alphabet with only two letters, for polar or hydrophobic residues.
    Letters: P (polar: AGTSNQDEHRKP), H (hydrophobic: CMFILVWY)
    letters = 'HP'
    size = 1
class Bio.Alphabet.Reduced.PC5
    Bases: Bio.Alphabet.ProteinAlphabet
    Reduced protein alphabet with 5 letters for physico-chemical properties.
    Letters: A (Aliphatic: IVL), R (aRomatic: FYWH), C (Charged: KRDE), T (Tiny: GACS), D (Diverse: TMQNP)
    letters = 'ARCTD'
    size = 1
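One point that is easy to miss with PC5 is that its five letters reuse ordinary amino acid codes with new meanings: after reduction, "A" stands for the aliphatic group, while alanine itself becomes "T" (tiny). The sketch below illustrates this on a plain string using Python's built-in `str.maketrans` and `str.translate`. The grouping is copied from the PC5 description above, and `PC5_GROUPS`, `PC5_LETTERS` and `pc5_translation` are hypothetical names, not Biopython objects.

```python
# Groups as documented for PC5; the dict itself is illustrative only.
PC5_GROUPS = {"A": "IVL", "R": "FYWH", "C": "KRDE", "T": "GACS", "D": "TMQNP"}
PC5_LETTERS = "ARCTD"  # same value as the documented PC5.letters

# str.maketrans accepts a dict of single characters, so the grouping is
# inverted into {residue: reduced_letter} first.
pc5_translation = str.maketrans(
    {
        residue: letter
        for letter, members in PC5_GROUPS.items()
        for residue in members
    }
)

protein = "MAGSKEWKRFCELTINEA"
# translate() looks at each original character once, so a residue that maps
# to "A" is not re-mapped a second time through the alanine entry.
reduced = protein.translate(pc5_translation)
print(reduced)  # DTTTCCRCCRTCADADCT

# Every letter of the result belongs to the reduced alphabet.
print(set(reduced) <= set(PC5_LETTERS))  # True
```

Unlike a dictionary lookup, `str.translate` leaves characters that are missing from the table unchanged, so the final membership check is what catches unexpected residues in this variant.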