Bio.SwissProt package

Submodules

Bio.SwissProt.KeyWList module
- Record
  - Record.__init__()
- parse()

Module contents

Code to work with the sprotXX.dat file from SwissProt.

https://web.expasy.org/docs/userman.html

Classes:

Record Holds SwissProt data.
Reference Holds reference data from a SwissProt record.

Functions:

read Read one SwissProt record
parse Read multiple SwissProt records

exception Bio.SwissProt.SwissProtParserError(*args, line=None)

Bases: ValueError

An error occurred while parsing a SwissProt file.

__init__(*args, line=None): Create a SwissProtParserError object with the offending line.

class Bio.SwissProt.Record

Bases: object

Holds information from a SwissProt record.

Attributes:

entry_name Name of this entry, e.g. RL1_ECOLI.
data_class Either ‘STANDARD’ or ‘PRELIMINARY’.
molecule_type Type of molecule, ‘PRT’,
sequence_length Number of residues.
accessions List of the accession numbers, e.g. [‘P00321’]
created A tuple of (date, release).
sequence_update A tuple of (date, release).
annotation_update A tuple of (date, release).
description Free-format description.
gene_name A list of dictionaries with keys ‘Name’, ‘Synonyms’,
‘OrderedLocusNames’ and ‘ORFNames’.
organism The source of the sequence.
organelle The origin of the sequence.
organism_classification The taxonomy classification. List of strings. (http://www.ncbi.nlm.nih.gov/Taxonomy/)
taxonomy_id A list of NCBI taxonomy id’s.
host_organism A list of names of the hosts of a virus, if any.
host_taxonomy_id A list of NCBI taxonomy id’s of the hosts, if any.
references List of Reference objects.
comments List of strings.
cross_references List of tuples (db, id1[, id2][, id3]). See the docs.
keywords List of the keywords.
features List of tuples (key name, from, to, description). from and to can be either integers for the residue numbers, ‘<’, ‘>’, or ‘?’
protein_existence Numerical value describing the evidence for the existence of the protein.
seqinfo tuple of (length, molecular weight, CRC32 value)
sequence The sequence.

Examples

>>> from Bio import SwissProt
>>> example_filename = "SwissProt/P68308.txt"
>>> with open(example_filename) as handle:
...     records = SwissProt.parse(handle)
...     for record in records:
...         print(record.entry_name)
...         print(record.accessions)
...         print(record.keywords)
...         print(record.organism)
...         print(record.sequence[:20] + "...")
...
NU3M_BALPH
['P68308', 'P24973']
['Electron transport', 'Membrane', 'Mitochondrion', 'Mitochondrion inner membrane', 'NAD', 'Respiratory chain', 'Translocase', 'Transmembrane', 'Transmembrane helix', 'Transport', 'Ubiquinone']
Balaenoptera physalus (Fin whale) (Balaena physalus).
MNLLLTLLTNTTLALLLVFI...

__init__(): Initialize the class.

class Bio.SwissProt.Reference

Bases: object

Holds information from one reference in a SwissProt entry.

Attributes:

number Number of reference in an entry.
evidence Evidence code. List of strings.
positions Describes extent of work. List of strings.
comments Comments. List of (token, text).
references References. List of (dbname, identifier).
authors The authors of the work.
title Title of the work.
location A citation for the work.

__init__(): Initialize the class.

class Bio.SwissProt.FeatureTable(location=None, type='', id='<unknown id>', qualifiers=None, sub_features=None)

Bases: SeqFeature

Stores feature annotations for specific regions of the sequence.

This is a subclass of SeqFeature, defined in Bio.SeqFeature, where the attributes are used as follows:

location: location of the feature on the canonical or isoform sequence; the location is stored as an instance of SimpleLocation, defined in Bio.SeqFeature, with the ref attribute set to the isoform ID referring to the canonical or isoform sequence on which the feature is defined

id: unique and stable identifier (FTId), only provided for features belonging to the types CARBOHYD, CHAIN, PEPTIDE, PROPEP, VARIANT, or VAR_SEQ

type: indicates the type of feature, as defined by the UniProt Knowledgebase documentation:

ACT_SITE: amino acid(s) involved in the activity of an enzyme

BINDING: binding site for any chemical group

CARBOHYD: glycosylation site; an FTId identifier to the GlyConnect database is provided if annotated there

CA_BIND: calcium-binding region

CHAIN: polypeptide chain in the mature protein

COILED: coiled-coil region

COMPBIAS: compositionally biased region

CONFLICT: different sources report differing sequences

CROSSLNK: posttransationally formed amino acid bond

DISULFID: disulfide bond

DNA_BIND: DNA-binding region

DOMAIN: domain, defined as a specific combination of secondary structures organized into a characteristic three-dimensional structure or fold

INIT_MET: initiator methionine

INTRAMEM: region located in a membrane without crossing it

HELIX: alpha-, 3(10)-, or pi-helix secondary structure

LIPID: covalent binding of a lipid moiety

METAL: binding site for a metal ion

MOD_RES: posttranslational modification (PTM) of a residue, annotated by the controlled vocabulary defined by the ptmlist.txt document on the UniProt website

MOTIF: short sequence motif of biological interest

MUTAGEN: site experimentally altered by mutagenesis

NON_CONS: non-consecutive residues

NON_STD: non-standard amino acid

NON_TER: the residue at an extremity of the sequence is not the terminal residue

NP_BIND: nucleotide phosphate-binding region

PEPTIDE: released active mature polypeptide

PROPEP: any processed propeptide

REGION: region of interest in the sequence

REPEAT: internal sequence repetition

SIGNAL: signal sequence (prepeptide)

SITE: amino-acid site of interest not represented by another feature key

STRAND: beta-strand secondary structure; either a hydrogen-bonded extended beta strand or a residue in an isolated beta-bridge

TOPO_DOM: topological domain

TRANSIT: transit peptide (mitochondrion, chloroplast, thylakoid, cyanelle, peroxisome, etc.)

TRANSMEM: transmembrane region

TURN: H-bonded turn (3-, 4-, or 5-turn)

UNSURE: uncertainties in the sequence

VARIANT: sequence variant; an FTId is provided for protein sequence variants of Hominidae (great apes and humans)

VAR_SEQ: sequence variant produced by alternative splicing, alternative promoter usage, alternative initiation, or ribosomal frameshifting

ZN_FING: zinc finger region

qualifiers: a dictionary of additional information, which may include the feature evidence and free-text notes. While SwissProt includes the feature identifier code (FTId) as a qualifier, it is stored as the attribute ID of the FeatureTable object.

__annotations__ = {}

Bio.SwissProt.parse(source)

Read multiple SwissProt records from file.

Argument source is a file-like object or a path to a file.

Returns a generator object which yields Bio.SwissProt.Record() objects.

Bio.SwissProt.read(source)

Read one SwissProt record from file.

Argument source is a file-like object or a path to a file.

Returns a Record() object.