Protein sequences can be analysed with several tools based on the ProtParam tools of the Expasy Proteomics Server. The module, Bio.SeqUtils.ProtParam, is part of the SeqUtils package.
The ProteinAnalysis class takes one argument, the protein sequence as a string, and builds a sequence object from it using the Bio.Seq module. This is done only to make sure that the sequence is a protein sequence and not anything else.
>>> from Bio.SeqUtils.ProtParam import ProteinAnalysis
>>> my_seq = (
... "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFLRILPDGTVDGTRDRSDQHIQLQ"
... "LSAESVGEVYIKSTETGQYLAMDTSGLLYGSQTPSEECLFLERLEENHYNTYTSKKHAKN"
... "WFVGLKKNGSCKRGPRTHYGQKAILFLPLPV"
... )
>>> analysed_seq = ProteinAnalysis(my_seq)
>>> analysed_seq.molecular_weight()
17103.1617
>>> analysed_seq.gravy()
-0.597368421052632
>>> analysed_seq.count_amino_acids()
{'A': 6, 'C': 3, 'E': 12, 'D': 5, 'G': 14, 'F': 6, 'I': 5, 'H': 5, 'K': 12, 'M':
2, 'L': 18, 'N': 7, 'Q': 6, 'P': 8, 'S': 10, 'R': 6, 'T': 13, 'W': 1, 'V': 5,
'Y': 8}
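The session above also calls gravy(), which returns the GRAVY (grand average of hydropathy) score. As far as we know, this is the mean Kyte-Doolittle hydropathy value over all residues. The plain-Python sketch below reimplements that idea; the hand-typed table and the name gravy_sketch are assumptions of this sketch, not Biopython's API.

```python
# Kyte-Doolittle hydropathy values, typed in by hand for this sketch
# rather than imported from Biopython.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_sketch(sequence: str) -> float:
    """Hypothetical helper: mean Kyte-Doolittle value over all residues."""
    sequence = sequence.upper()
    total = sum(KYTE_DOOLITTLE[aa] for aa in sequence)
    return total / len(sequence)


print(gravy_sketch("AR"))  # (1.8 + -4.5) / 2
```

Given a sequence with the composition returned by count_amino_acids() in the session above (152 residues in total), this sketch yields -90.8 / 152, about -0.5974, the same value that gravy() prints there.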
count_amino_acids: Simply counts the number of times each amino acid occurs in the protein sequence. Returns a dictionary {AminoAcid: Number} and also stores the dictionary in self.amino_acids_content.
get_amino_acids_percent: The same as count_amino_acids, except that each number is given as a proportion of the entire sequence. Returns a dictionary and stores the dictionary in self.amino_acids_content_percent.
molecular_weight: Calculates the molecular weight of the protein.
aromaticity: Calculates the aromaticity value of a protein according to Lobry & Gautier (1994, Nucleic Acids Res., 22, 3174-3180). It is simply the relative frequency of Phe+Trp+Tyr.
instability_index: Implementation of the method of Guruprasad et al. (1990, Protein Engineering, 4, 155-161), which tests a protein for stability. A value above 40 means the protein is predicted to be unstable (i.e. to have a short half-life).
flexibility: Implementation of the flexibility method of Vihinen et al. (1994, Proteins, 19, 141-149).
isoelectric_point: Uses the IsoelectricPoint module to calculate the pI of the protein.
secondary_structure_fraction: Returns the fraction of amino acids that tend to be found in a helix, turn or sheet, as three values in the order [Helix, Turn, Sheet].
protein_scale(Scale, WindowSize, Edge): Returns a list of values that can be plotted to show how a property changes along the protein sequence. Several parameters control how the scale profile is computed, such as the window size and the relative weight given to the edges of the window. Many scales exist; just add your favorites to the ProtParamData module.
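The bookkeeping behind count_amino_acids and get_amino_acids_percent can be sketched in plain Python as follows. The helper names are hypothetical, and dividing by the full sequence length is our assumption about how the proportion is defined; Biopython's own code may differ in details such as caching the result on the object.

```python
from __future__ import annotations

from collections import Counter

# The 20 standard one-letter amino acid codes, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_amino_acids_sketch(sequence: str) -> dict[str, int]:
    """Hypothetical helper: how often each standard amino acid occurs."""
    counts = Counter(sequence.upper())
    # Report all 20 residues, including those that do not occur at all.
    return {aa: counts[aa] for aa in AMINO_ACIDS}


def amino_acid_fractions_sketch(sequence: str) -> dict[str, float]:
    """Hypothetical helper: each count divided by the sequence length."""
    counts = count_amino_acids_sketch(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


fractions = amino_acid_fractions_sketch("MAEGEITT")
print(fractions["T"])  # 2 of 8 residues: 0.25
```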
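Aromaticity, as defined above, is the relative frequency of Phe (F), Trp (W) and Tyr (Y). A minimal sketch of that definition, with a hypothetical function name, is:

```python
def aromaticity_sketch(sequence: str) -> float:
    """Hypothetical helper: relative frequency of Phe + Trp + Tyr."""
    sequence = sequence.upper()
    aromatic = sum(sequence.count(aa) for aa in "FWY")
    return aromatic / len(sequence)


print(aromaticity_sketch("MFWYAAAA"))  # 3 aromatic residues out of 8: 0.375
```

By this definition, the composition shown in the earlier session (6 Phe, 1 Trp and 8 Tyr in 152 residues) gives 15 / 152, about 0.099.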
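The Guruprasad instability index sums a dipeptide instability weight value (DIWV) over every pair of neighbouring residues and scales the total by 10 / L, where L is the sequence length. The sketch below shows that computation with the weight table passed in as a parameter. The toy table is hypothetical and is not the published data; we believe Biopython ships the real table as DIWV in Bio.SeqUtils.ProtParamData, but check your version.

```python
from __future__ import annotations

from collections.abc import Mapping


def instability_index_sketch(
    sequence: str, diwv: Mapping[str, Mapping[str, float]]
) -> float:
    """Hypothetical helper: Guruprasad-style instability index.

    diwv[a][b] is the weight for residue a followed by residue b.
    The published 20 x 20 table must be supplied by the caller.
    """
    sequence = sequence.upper()
    # Sum the weights of the L - 1 overlapping dipeptides, then scale by 10 / L.
    score = sum(diwv[a][b] for a, b in zip(sequence, sequence[1:]))
    return (10.0 / len(sequence)) * score


def predicted_unstable(index: float, threshold: float = 40.0) -> bool:
    """Values above 40 are taken to indicate an unstable protein."""
    return index > threshold


# Toy table for illustration only: every dipeptide weighted 1.0.
# These are NOT the published DIWV values.
letters = "ACDEFGHIKLMNPQRSTVWY"
toy_diwv = {a: {b: 1.0 for b in letters} for a in letters}
print(instability_index_sketch("MAEGEITT", toy_diwv))  # 10 / 8 * 7 = 8.75
```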
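The text does not describe how the IsoelectricPoint module works internally. A common way to compute a pI, sketched below as an illustration, is to sum the Henderson-Hasselbalch charges of the ionisable groups and bisect for the pH at which the net charge is zero. The pKa values are illustrative (approximately the EMBOSS iep set) and are an assumption of this sketch, so its results will not match Biopython's exactly; all function names are hypothetical.

```python
# Illustrative pKa values, approximately the set used by EMBOSS iep. They are
# an assumption of this sketch, not the table used by Biopython.
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def group_count(sequence: str, group: str) -> int:
    """Each chain has one N- and one C-terminus; side chains are counted."""
    return 1 if group in ("Nterm", "Cterm") else sequence.count(group)


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at `ph`, summed with the Henderson-Hasselbalch equation."""
    sequence = sequence.upper()
    positive = sum(
        group_count(sequence, g) / (1.0 + 10 ** (ph - pka))
        for g, pka in POSITIVE_PKA.items()
    )
    negative = sum(
        group_count(sequence, g) / (1.0 + 10 ** (pka - ph))
        for g, pka in NEGATIVE_PKA.items()
    )
    return positive - negative


def isoelectric_point_sketch(sequence: str, tolerance: float = 1e-4) -> float:
    """Hypothetical helper: bisect for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged: the pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


print(round(isoelectric_point_sketch("KKKKK"), 2))
```

Bisection works here because the net charge falls monotonically as the pH rises, so there is exactly one crossing between pH 0 and 14.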
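secondary_structure_fraction returns three fractions in the order [Helix, Turn, Sheet]. The sketch below assumes the residue groups that, as far as we know, Biopython documents for this method: V, I, Y, F, W, L for helix; N, P, G, S for turn; E, M, A, L for sheet. Because Leu belongs to two groups, the three values need not sum to 1. The function name is hypothetical.

```python
from __future__ import annotations

# Residue groups assumed for this sketch (Leu appears in both HELIX and SHEET).
HELIX = "VIYFWL"
TURN = "NPGS"
SHEET = "EMAL"


def secondary_structure_sketch(sequence: str) -> tuple[float, float, float]:
    """Hypothetical helper: fractions of helix-, turn- and sheet-prone residues."""
    sequence = sequence.upper()
    length = len(sequence)

    def fraction(group: str) -> float:
        return sum(sequence.count(aa) for aa in group) / length

    # Same order as in the text: [Helix, Turn, Sheet].
    return fraction(HELIX), fraction(TURN), fraction(SHEET)


print(secondary_structure_sketch("VNEL"))  # (0.5, 0.25, 0.5)
```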
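The scale profile from protein_scale can be understood as a weighted sliding-window average. The sketch below assumes an odd window whose weights rise linearly from the Edge value at both ends to 1.0 at the centre, which is our understanding of how Biopython applies the edge weight; treat it as an illustration rather than a drop-in replacement. The toy scale values and function names are hypothetical. For real scales, Bio.SeqUtils.ProtParamData contains tables such as kd (Kyte-Doolittle), as far as we know.

```python
from __future__ import annotations


def edge_weights(window: int, edge: float) -> list[float]:
    """Weights rising linearly from `edge` at the window ends to 1.0 at the centre."""
    half = window // 2
    if half == 0:
        return [1.0]
    step = (1.0 - edge) / half
    left = [edge + step * i for i in range(half)]
    # Symmetric window: left half, centre weight 1.0, mirrored right half.
    return left + [1.0] + left[::-1]


def protein_scale_sketch(
    sequence: str, scale: dict[str, float], window: int = 9, edge: float = 1.0
) -> list[float]:
    """Hypothetical helper: weighted sliding-window average of a residue scale."""
    if window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    weights = edge_weights(window, edge)
    total_weight = sum(weights)
    sequence = sequence.upper()
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(w * scale[aa] for w, aa in zip(weights, segment))
        profile.append(score / total_weight)
    return profile


toy_scale = {"A": 1.0, "G": -1.0}  # hypothetical values for illustration
print(protein_scale_sketch("AAGAA", toy_scale, window=3, edge=0.5))  # [0.5, 0.0, 0.5]
```

With edge=1.0 every position in the window counts equally, which reduces the profile to a plain moving average; smaller edge values make each point depend more on the residues near the centre of its window.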