Bio.motifs.jaspar.db module
Provides read access to a JASPAR5 formatted database.
This module requires MySQLdb to be installed.
Example (substitute your database credentials as appropriate):
from Bio.motifs.jaspar.db import JASPAR5
JASPAR_DB_HOST = "hostname.example.org"
JASPAR_DB_NAME = "JASPAR2018"
JASPAR_DB_USER = "guest"
JASPAR_DB_PASS = "guest"
jdb = JASPAR5(
    host=JASPAR_DB_HOST,
    name=JASPAR_DB_NAME,
    user=JASPAR_DB_USER,
    password=JASPAR_DB_PASS
)
ets1 = jdb.fetch_motif_by_id('MA0098')
print(ets1)
TF name ETS1
Matrix ID MA0098.3
Collection CORE
TF class ['Tryptophan cluster factors']
TF family ['Ets-related factors']
Species 9606
Taxonomic group vertebrates
Accession ['P14921']
Data type used HT-SELEX
Medline 20517297
PAZAR ID TF0000070
Comments Data is from Taipale HTSELEX DBD (2013)
Matrix:
0 1 2 3 4 5 6 7 8 9
A: 2683.00 180.00 425.00 0.00 0.00 2683.00 2683.00 1102.00 89.00 803.00
C: 210.00 2683.00 2683.00 21.00 0.00 0.00 9.00 21.00 712.00 401.00
G: 640.00 297.00 7.00 2683.00 2683.00 0.00 31.00 1580.00 124.00 1083.00
T: 241.00 22.00 0.00 0.00 12.00 0.00 909.00 12.00 1970.00 396.00
motifs = jdb.fetch_motifs(
    collection='CORE',
    tax_group=['vertebrates', 'insects'],
    tf_class='Homeo domain factors',
    tf_family=['TALE-type homeo domain factors', 'POU domain factors'],
    min_ic=12
)
for motif in motifs:
    pass  # do something with the motif
- class Bio.motifs.jaspar.db.JASPAR5(host=None, name=None, user=None, password=None)
Bases: object
Class representing a JASPAR5 database.
The methods within are loosely based on the perl TFBS::DB::JASPAR5 module.
Note: We will only implement reading of JASPAR motifs from the DB. Unlike the perl module, we will not attempt to implement any methods to store JASPAR motifs or create a new DB at this time.
- __init__(host=None, name=None, user=None, password=None)
Construct a JASPAR5 instance and connect to specified DB.
- Arguments:
host - host name of the JASPAR DB server
name - name of the JASPAR database
user - user name to connect to the JASPAR DB
password - JASPAR DB password
- __str__()
Return a string representation of the JASPAR5 DB connection.
- fetch_motif_by_id(id)
Fetch a single JASPAR motif from the DB by its JASPAR matrix ID.
Example id ‘MA0001.1’.
- Arguments:
- id - JASPAR matrix ID. This may be a fully specified ID including
the version number (e.g. MA0049.2) or just the base ID (e.g. MA0049). If only a base ID is provided, the latest version is returned.
- Returns:
A Bio.motifs.jaspar.Motif object
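The two accepted ID forms can be told apart by the presence of a version suffix. The following is a minimal sketch of that distinction; the helper name split_matrix_id is hypothetical and is not claimed to be how JASPAR5 parses IDs internally.

```python
def split_matrix_id(matrix_id):
    """Split 'MA0049.2' into ('MA0049', 2); a bare 'MA0049' gives ('MA0049', None).

    Hypothetical helper for illustration only.
    """
    base_id, sep, version = matrix_id.partition(".")
    if not sep:
        # No version suffix: the caller is asking for the latest version.
        return base_id, None
    return base_id, int(version)


for matrix_id in ("MA0049.2", "MA0049"):
    base_id, version = split_matrix_id(matrix_id)
    wanted = "latest version" if version is None else f"version {version}"
    print(f"{matrix_id}: base ID {base_id}, {wanted}")
```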
NOTE: The perl TFBS module allows you to specify the type of matrix to return (PFM, PWM, ICM), but matrices are always stored in JASPAR as PFMs, so this does not really belong here. Once a PFM is fetched, the motif's pwm and pssm attributes give the normalized and log-odds matrices.
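To illustrate what "normalized" and "log-odds" mean for a fetched PFM, here is a minimal pure-Python sketch. It is not Biopython's implementation: the toy counts, the pseudocount of 0.5, and the uniform background of 0.25 are all assumptions chosen for the example.

```python
import math

# Toy position frequency matrix: counts per nucleotide at 3 positions.
# These numbers are made up for illustration, not taken from JASPAR.
pfm = {
    "A": [8, 0, 2],
    "C": [0, 8, 2],
    "G": [0, 0, 2],
    "T": [0, 0, 2],
}
PSEUDOCOUNT = 0.5  # assumed value; keeps zero counts away from log(0)
BACKGROUND = 0.25  # assumed uniform background probability


def normalize(counts, pseudocount):
    """Turn counts into per-position probabilities (each column sums to 1)."""
    length = len(next(iter(counts.values())))
    pwm = {letter: [] for letter in counts}
    for i in range(length):
        total = sum(counts[letter][i] + pseudocount for letter in counts)
        for letter in counts:
            pwm[letter].append((counts[letter][i] + pseudocount) / total)
    return pwm


def log_odds(pwm, background):
    """Convert probabilities into log2 odds against the background."""
    return {
        letter: [math.log2(p / background) for p in probs]
        for letter, probs in pwm.items()
    }


pwm = normalize(pfm, PSEUDOCOUNT)
pssm = log_odds(pwm, BACKGROUND)
for letter in "ACGT":
    print(letter, " ".join(f"{score:6.2f}" for score in pssm[letter]))
```

Positions where a nucleotide is over-represented relative to the background get positive scores, under-represented ones get negative scores, and a position with no preference scores zero.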
- fetch_motifs_by_name(name)
Fetch a list of JASPAR motifs from a JASPAR DB by the given TF name(s).
- Arguments:
name - a single name or list of names
- Returns:
A list of Bio.motifs.jaspar.Motif objects
Notes: Names are not guaranteed to be unique. There may be more than one motif with the same name. Therefore even if name specifies a single name, a list of motifs is returned. This just calls self.fetch_motifs(collection = None, tf_name = name).
This behaviour is different from the TFBS perl module’s get_Matrix_by_name() method which always returns a single matrix, issuing a warning message and returning the first matrix retrieved in the case where multiple matrices have the same name.
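Because a list is always returned, callers that expect exactly one motif have to decide what to do with zero or several matches. The sketch below shows one possible policy (fail loudly on ambiguity instead of silently taking the first match). The helper pick_single_motif and the FakeDB stand-in are hypothetical, the matrix IDs and TF names are placeholders, and the code assumes each motif exposes name and matrix_id attributes.

```python
from types import SimpleNamespace


def pick_single_motif(jdb, name):
    """Return the one motif with this TF name, or None if there is no match.

    Raises ValueError when the name is ambiguous.
    """
    motifs = jdb.fetch_motifs_by_name(name)
    if not motifs:
        return None
    if len(motifs) > 1:
        ids = ", ".join(m.matrix_id for m in motifs)
        raise ValueError(f"{len(motifs)} motifs are named {name!r}: {ids}")
    return motifs[0]


class FakeDB:
    """Stand-in for a JASPAR5 connection so the sketch runs without MySQL."""

    def __init__(self, motifs):
        self._motifs = motifs

    def fetch_motifs_by_name(self, name):
        names = [name] if isinstance(name, str) else list(name)
        return [m for m in self._motifs if m.name in names]


# Placeholder records, not real JASPAR entries.
fake_db = FakeDB([
    SimpleNamespace(matrix_id="XX0001.1", name="TF_A"),
    SimpleNamespace(matrix_id="XX0002.1", name="TF_B"),
    SimpleNamespace(matrix_id="XX0003.1", name="TF_B"),
])

print(pick_single_motif(fake_db, "TF_A").matrix_id)
print(pick_single_motif(fake_db, "TF_missing"))
```

With a real connection, the same helper would be called with the JASPAR5 instance in place of fake_db.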
- fetch_motifs(collection=JASPAR_DFLT_COLLECTION, tf_name=None, tf_class=None, tf_family=None, matrix_id=None, tax_group=None, species=None, pazar_id=None, data_type=None, medline=None, min_ic=0, min_length=0, min_sites=0, all=False, all_versions=False)
Fetch a jaspar.Record (list) of motifs using selection criteria.
Arguments:
Except where obvious, all selection criteria arguments may be specified as a single value or a list of values. Motifs must meet ALL the specified selection criteria to be returned, with the precedence exceptions noted below.
all - Takes precedence over all other selection criteria. Every motif is returned. If 'all_versions' is also specified, all versions of every motif are returned; otherwise just the latest version of every motif is returned.
matrix_id - Takes precedence over all other selection criteria except 'all'. Only motifs with the given JASPAR matrix ID(s) are returned. A matrix ID may be specified as just a base ID or as a full JASPAR ID including the version number. If only a base ID is provided for specific motif(s), then just the latest version of those motif(s) is returned unless 'all_versions' is also specified.
collection - Only motifs from the specified JASPAR collection(s) are returned. NOTE - if not specified, the collection defaults to CORE for all other selection criteria except 'all' and 'matrix_id'. To apply the other selection criteria across all JASPAR collections, explicitly set collection=None.
tf_name - Only motifs with the given name(s) are returned.
tf_class - Only motifs of the given TF class(es) are returned.
tf_family - Only motifs from the given TF families are returned.
tax_group - Only motifs belonging to the given taxonomic supergroups are returned (e.g. 'vertebrates', 'insects', 'nematodes' etc.).
species - Only motifs derived from the given species are returned. Species are specified as taxonomy IDs.
data_type - Only motifs generated with the given data type (e.g. 'ChIP-seq', 'PBM', 'SELEX' etc.) are returned. NOTE - must match exactly as stored in the database.
pazar_id - Only motifs with the given PAZAR TF ID are returned.
medline - Only motifs with the given medline (PubMed) IDs are returned.
min_ic - Only motifs whose profile matrices have at least this information content (specificity) are returned.
min_length - Only motifs whose profiles are of at least this length are returned.
min_sites - Only motifs compiled from at least this many binding sites are returned.
all_versions - Unless specified, just the latest version of the motifs determined by the other selection criteria is returned. Otherwise all versions of the selected motifs are returned.
- Returns:
A Bio.motifs.jaspar.Record (list) of motifs.
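The precedence rules for 'all', 'matrix_id' and 'collection' can be summarised as a small decision function. This is only a model of the behaviour documented above, not the library's code; the function effective_criteria and the constant DEFAULT_COLLECTION are names made up for this sketch.

```python
DEFAULT_COLLECTION = "CORE"  # documented default; the constant name is illustrative


def effective_criteria(all=False, all_versions=False, matrix_id=None,
                       collection=DEFAULT_COLLECTION, **other):
    """Return the selection criteria that actually take effect.

    The parameter name 'all' mirrors the documented signature and
    shadows the builtin only inside this function.
    """
    if all:
        # 'all' overrides every other criterion.
        return {"all": True, "all_versions": all_versions}
    if matrix_id is not None:
        # 'matrix_id' overrides everything except 'all'.
        return {"matrix_id": matrix_id, "all_versions": all_versions}
    criteria = {key: value for key, value in other.items() if value is not None}
    if collection is not None:
        # collection=None means: search across all collections.
        criteria["collection"] = collection
    criteria["all_versions"] = all_versions
    return criteria


print(effective_criteria(all=True, tf_name="ignored"))
print(effective_criteria(matrix_id="MA0049", tf_class="ignored"))
print(effective_criteria(tf_class="Homeo domain factors"))
print(effective_criteria(collection=None, tf_class="Homeo domain factors"))
```

The third call picks up the CORE default, while the fourth drops the collection filter entirely because collection=None was passed explicitly.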