Bio.SwissProt package
Submodules
Module contents
Code to work with the sprotXX.dat file from SwissProt.
https://web.expasy.org/docs/userman.html
- Classes:
Record Holds SwissProt data.
Reference Holds reference data from a SwissProt record.
- Functions:
read Read one SwissProt record
parse Read multiple SwissProt records
- exception Bio.SwissProt.SwissProtParserError(*args, line=None)
Bases:
ValueError
An error occurred while parsing a SwissProt file.
- __init__(*args, line=None)
Create a SwissProtParserError object with the offending line.
- class Bio.SwissProt.Record
Bases:
object
Holds information from a SwissProt record.
- Attributes:
entry_name Name of this entry, e.g. RL1_ECOLI.
data_class Either ‘STANDARD’ or ‘PRELIMINARY’.
molecule_type Type of molecule, ‘PRT’,
sequence_length Number of residues.
accessions List of the accession numbers, e.g. [‘P00321’]
created A tuple of (date, release).
sequence_update A tuple of (date, release).
annotation_update A tuple of (date, release).
description Free-format description.
- gene_name A list of dictionaries with keys ‘Name’, ‘Synonyms’,
‘OrderedLocusNames’ and ‘ORFNames’.
organism The source of the sequence.
organelle The origin of the sequence.
organism_classification The taxonomy classification. List of strings. (http://www.ncbi.nlm.nih.gov/Taxonomy/)
taxonomy_id A list of NCBI taxonomy id’s.
host_organism A list of names of the hosts of a virus, if any.
host_taxonomy_id A list of NCBI taxonomy id’s of the hosts, if any.
references List of Reference objects.
comments List of strings.
cross_references List of tuples (db, id1[, id2][, id3]). See the docs.
keywords List of the keywords.
features List of tuples (key name, from, to, description). from and to can be either integers for the residue numbers, ‘<’, ‘>’, or ‘?’
protein_existence Numerical value describing the evidence for the existence of the protein.
seqinfo tuple of (length, molecular weight, CRC32 value)
sequence The sequence.
Examples
>>> from Bio import SwissProt >>> example_filename = "SwissProt/P68308.txt" >>> with open(example_filename) as handle: ... records = SwissProt.parse(handle) ... for record in records: ... print(record.entry_name) ... print(record.accessions) ... print(record.keywords) ... print(record.organism) ... print(record.sequence[:20] + "...") ... NU3M_BALPH ['P68308', 'P24973'] ['Electron transport', 'Membrane', 'Mitochondrion', 'Mitochondrion inner membrane', 'NAD', 'Respiratory chain', 'Translocase', 'Transmembrane', 'Transmembrane helix', 'Transport', 'Ubiquinone'] Balaenoptera physalus (Fin whale) (Balaena physalus). MNLLLTLLTNTTLALLLVFI...
- __init__()
Initialize the class.
- class Bio.SwissProt.Reference
Bases:
object
Holds information from one reference in a SwissProt entry.
- Attributes:
number Number of reference in an entry.
evidence Evidence code. List of strings.
positions Describes extent of work. List of strings.
comments Comments. List of (token, text).
references References. List of (dbname, identifier).
authors The authors of the work.
title Title of the work.
location A citation for the work.
- __init__()
Initialize the class.
- class Bio.SwissProt.FeatureTable(location=None, type='', id='<unknown id>', qualifiers=None, sub_features=None)
Bases:
SeqFeature
Stores feature annotations for specific regions of the sequence.
This is a subclass of SeqFeature, defined in Bio.SeqFeature, where the attributes are used as follows:
location
: location of the feature on the canonical or isoform sequence; the location is stored as an instance of SimpleLocation, defined in Bio.SeqFeature, with the ref attribute set to the isoform ID referring to the canonical or isoform sequence on which the feature is definedid
: unique and stable identifier (FTId), only provided for features belonging to the types CARBOHYD, CHAIN, PEPTIDE, PROPEP, VARIANT, or VAR_SEQtype
: indicates the type of feature, as defined by the UniProt Knowledgebase documentation:ACT_SITE: amino acid(s) involved in the activity of an enzyme
BINDING: binding site for any chemical group
CARBOHYD: glycosylation site; an FTId identifier to the GlyConnect database is provided if annotated there
CA_BIND: calcium-binding region
CHAIN: polypeptide chain in the mature protein
COILED: coiled-coil region
COMPBIAS: compositionally biased region
CONFLICT: different sources report differing sequences
CROSSLNK: posttransationally formed amino acid bond
DISULFID: disulfide bond
DNA_BIND: DNA-binding region
DOMAIN: domain, defined as a specific combination of secondary structures organized into a characteristic three-dimensional structure or fold
INIT_MET: initiator methionine
INTRAMEM: region located in a membrane without crossing it
HELIX: alpha-, 3(10)-, or pi-helix secondary structure
LIPID: covalent binding of a lipid moiety
METAL: binding site for a metal ion
MOD_RES: posttranslational modification (PTM) of a residue, annotated by the controlled vocabulary defined by the ptmlist.txt document on the UniProt website
MOTIF: short sequence motif of biological interest
MUTAGEN: site experimentally altered by mutagenesis
NON_CONS: non-consecutive residues
NON_STD: non-standard amino acid
NON_TER: the residue at an extremity of the sequence is not the terminal residue
NP_BIND: nucleotide phosphate-binding region
PEPTIDE: released active mature polypeptide
PROPEP: any processed propeptide
REGION: region of interest in the sequence
REPEAT: internal sequence repetition
SIGNAL: signal sequence (prepeptide)
SITE: amino-acid site of interest not represented by another feature key
STRAND: beta-strand secondary structure; either a hydrogen-bonded extended beta strand or a residue in an isolated beta-bridge
TOPO_DOM: topological domain
TRANSIT: transit peptide (mitochondrion, chloroplast, thylakoid, cyanelle, peroxisome, etc.)
TRANSMEM: transmembrane region
TURN: H-bonded turn (3-, 4-, or 5-turn)
UNSURE: uncertainties in the sequence
VARIANT: sequence variant; an FTId is provided for protein sequence variants of Hominidae (great apes and humans)
VAR_SEQ: sequence variant produced by alternative splicing, alternative promoter usage, alternative initiation, or ribosomal frameshifting
ZN_FING: zinc finger region
qualifiers
: a dictionary of additional information, which may include the feature evidence and free-text notes. While SwissProt includes the feature identifier code (FTId) as a qualifier, it is stored as the attribute ID of the FeatureTable object.
- Bio.SwissProt.parse(source)
Read multiple SwissProt records from file.
Argument source is a file-like object or a path to a file.
Returns a generator object which yields Bio.SwissProt.Record() objects.
- Bio.SwissProt.read(source)
Read one SwissProt record from file.
Argument source is a file-like object or a path to a file.
Returns a Record() object.