Bio.motifs.jaspar.db module

Provides read access to a JASPAR5 formatted database.

This module requires MySQLdb to be installed.

Example; substitute your database credentials as appropriate:

    from Bio.motifs.jaspar.db import JASPAR5
    JASPAR_DB_HOST = "hostname.example.org"
    JASPAR_DB_NAME = "JASPAR2018"
    JASPAR_DB_USER = "guest"
    JASPAR_DB_PASS = "guest"

    jdb = JASPAR5(
        host=JASPAR_DB_HOST,
        name=JASPAR_DB_NAME,
        user=JASPAR_DB_USER,
        password=JASPAR_DB_PASS
    )
    ets1 = jdb.fetch_motif_by_id('MA0098')
    print(ets1)
TF name ETS1
Matrix ID   MA0098.3
Collection  CORE
TF class    Tryptophan cluster factors
TF family   Ets-related factors
Species 9606
Taxonomic group vertebrates
Accession   ['P14921']
Data type used  HT-SELEX
Medline 20517297
PAZAR ID    TF0000070
Comments    Data is from Taipale HTSELEX DBD (2013)
Matrix:
        0      1      2      3      4      5      6      7      8      9
A: 2683.00 180.00 425.00   0.00   0.00 2683.00 2683.00 1102.00  89.00 803.00
C: 210.00 2683.00 2683.00  21.00   0.00   0.00   9.00  21.00 712.00 401.00
G: 640.00 297.00   7.00 2683.00 2683.00   0.00  31.00 1580.00 124.00 1083.00
T: 241.00  22.00   0.00   0.00  12.00   0.00 909.00  12.00 1970.00 396.00

    motifs = jdb.fetch_motifs(
        collection='CORE',
        tax_group=['vertebrates', 'insects'],
        tf_class='Homeo domain factors',
        tf_family=['TALE-type homeo domain factors', 'POU domain factors'],
        min_ic=12
    )
    for motif in motifs:
        pass  # do something with the motif
class Bio.motifs.jaspar.db.JASPAR5(host=None, name=None, user=None, password=None)

Bases: object

Class representing a JASPAR5 database.

The methods within are loosely based on the Perl TFBS::DB::JASPAR5 module.

Note: Only reading of JASPAR motifs from the DB is implemented. Unlike the Perl module, this class does not attempt to store JASPAR motifs or create a new DB at this time.

__init__(self, host=None, name=None, user=None, password=None)

Construct a JASPAR5 instance and connect to specified DB.

Arguments:

  • host - host name of the JASPAR DB server

  • name - name of the JASPAR database

  • user - user name to connect to the JASPAR DB

  • password - JASPAR DB password

__str__(self)

Return a string representation of the JASPAR5 DB connection.

fetch_motif_by_id(self, id)

Fetch a single JASPAR motif from the DB by its JASPAR matrix ID.

Example id ‘MA0001.1’.

Arguments:

  • id - JASPAR matrix ID. This may be a fully specified ID including the version number (e.g. MA0049.2) or just the base ID (e.g. MA0049). If only a base ID is provided, the latest version is returned.

Returns:
  • A Bio.motifs.jaspar.Motif object
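The distinction between a base ID and a fully specified ID can be sketched in plain Python. This is an illustration only, not the module's own code: split_matrix_id and resolve_version are hypothetical helpers, the list of available versions is made up, and "latest" is assumed to mean the highest version number.

```python
def split_matrix_id(matrix_id):
    """Split 'MA0049.2' into ('MA0049', 2); a bare 'MA0049' gives ('MA0049', None)."""
    base_id, sep, version = matrix_id.partition(".")
    return base_id, (int(version) if sep else None)


def resolve_version(matrix_id, available_versions):
    """Use the requested version, or the highest one when only a base ID is given."""
    base_id, version = split_matrix_id(matrix_id)
    if version is None:
        # Assumption: the "latest" version is the numerically largest one.
        version = max(available_versions)
    return f"{base_id}.{version}"


print(split_matrix_id("MA0049.2"))
print(split_matrix_id("MA0049"))
print(resolve_version("MA0049", [1, 2]))
```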

NOTE: The Perl TFBS module allows you to specify the type of matrix to return (PFM, PWM, ICM), but matrices are always stored in JASPAR as PFMs, so this does not really belong here. Once a PFM is fetched, the pwm() and pssm() methods can be called to return the normalized and log-odds matrices.
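To make the relationship between a PFM and its normalized and log-odds forms concrete, here is a minimal sketch in plain Python. It does not use Biopython and is not how the library computes these matrices; the toy counts, the pseudocount of 0.5, and the uniform background of 0.25 are all assumptions chosen for illustration.

```python
import math

# Toy position frequency matrix (counts per base per column); not real JASPAR data.
pfm = {
    "A": [8, 0, 2],
    "C": [0, 8, 2],
    "G": [0, 0, 2],
    "T": [0, 0, 2],
}


def normalize(pfm, pseudocount=0.0):
    """Convert counts to per-column probabilities (each column sums to 1)."""
    length = len(next(iter(pfm.values())))
    ppm = {base: [] for base in pfm}
    for i in range(length):
        total = sum(pfm[base][i] + pseudocount for base in pfm)
        for base in pfm:
            ppm[base].append((pfm[base][i] + pseudocount) / total)
    return ppm


def log_odds(ppm, background=0.25):
    """Log2 ratio of each probability to an assumed uniform background."""
    return {
        base: [math.log2(p / background) if p > 0 else float("-inf") for p in probs]
        for base, probs in ppm.items()
    }


ppm = normalize(pfm, pseudocount=0.5)
pssm = log_odds(ppm)
print(ppm["A"])
print(pssm["A"])
```

A positive log-odds score marks a base that is more frequent at that position than the background predicts; a score of zero marks a base that occurs at the background rate.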

fetch_motifs_by_name(self, name)

Fetch a list of JASPAR motifs from a JASPAR DB by the given TF name(s).

Arguments:

  • name - a single name or list of names

Returns:
  • A list of Bio.motifs.jaspar.Motif objects

Notes: Names are not guaranteed to be unique. There may be more than one motif with the same name. Therefore even if name specifies a single name, a list of motifs is returned. This just calls self.fetch_motifs(collection = None, tf_name = name).

This behaviour differs from the Perl TFBS module’s get_Matrix_by_name() method, which always returns a single matrix. When multiple matrices have the same name, that method issues a warning message and returns the first matrix retrieved.
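Because names are not unique, callers should always be prepared to handle a list. The sketch below imitates that contract with stand-in data rather than a database: find_by_name is a hypothetical helper, and the matrix IDs and TF names are invented.

```python
# Stand-in (matrix_id, tf_name) rows; the IDs and names are made up.
ROWS = [
    ("XX0001.1", "TF_A"),
    ("XX0002.1", "TF_B"),
    ("XX0003.1", "TF_A"),
]


def find_by_name(rows, name):
    """Return every row whose TF name matches; `name` may be a string or a list."""
    wanted = {name} if isinstance(name, str) else set(name)
    return [row for row in rows if row[1] in wanted]


print(find_by_name(ROWS, "TF_A"))            # two rows share this name
print(find_by_name(ROWS, ["TF_A", "TF_B"]))  # a list of names is also accepted
print(find_by_name(ROWS, "TF_C"))            # no match gives an empty list
```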

fetch_motifs(self, collection='CORE', tf_name=None, tf_class=None, tf_family=None, matrix_id=None, tax_group=None, species=None, pazar_id=None, data_type=None, medline=None, min_ic=0, min_length=0, min_sites=0, all=False, all_versions=False)

Fetch a jaspar.Record (list) of motifs matching the selection criteria.

Arguments:

Except where obvious, all selection criteria arguments may be
specified as a single value or a list of values. Motifs must
meet ALL the specified selection criteria to be returned with
the precedence exceptions noted below.

all         - Takes precedence over all other selection criteria.
              Every motif is returned. If 'all_versions' is also
              specified, all versions of every motif are returned,
              otherwise just the latest version of every motif is
              returned.
matrix_id   - Takes precedence over all other selection criteria
              except 'all'.  Only motifs with the given JASPAR
              matrix ID(s) are returned. A matrix ID may be
              specified as just a base ID or full JASPAR IDs
              including version number. If only a base ID is
              provided for specific motif(s), then just the latest
              version of those motif(s) is returned unless
              'all_versions' is also specified.
collection  - Only motifs from the specified JASPAR collection(s)
              are returned. NOTE - if not specified, the collection
              defaults to CORE for all other selection criteria
              except 'all' and 'matrix_id'. To apply the other
              selection criteria across all JASPAR collections,
              explicitly set collection=None.
tf_name     - Only motifs with the given name(s) are returned.
tf_class    - Only motifs of the given TF class(es) are returned.
tf_family   - Only motifs from the given TF families are returned.
tax_group   - Only motifs belonging to the given taxonomic
              supergroups are returned (e.g. 'vertebrates',
              'insects', 'nematodes' etc.)
species     - Only motifs derived from the given species are
              returned.  Species are specified as taxonomy IDs.
data_type   - Only motifs generated with the given data type (e.g.
              'ChIP-seq', 'PBM', 'SELEX' etc.) are returned.
              NOTE - must match exactly as stored in the database.
pazar_id    - Only motifs with the given PAZAR TF ID are returned.
medline     - Only motifs with the given Medline (PubMed) IDs are
              returned.
min_ic      - Only motifs whose profile matrices have at least this
              information content (specificity) are returned.
min_length  - Only motifs whose profiles are of at least this
              length are returned.
min_sites   - Only motifs compiled from at least this many binding
              sites are returned.
all_versions- Unless specified, just the latest version of motifs
              determined by the other selection criteria are
              returned. Otherwise all versions of the selected
              motifs are returned.
Returns:
  • A Bio.motifs.jaspar.Record (list) of motifs.
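The precedence rules described above can be summarized in a short sketch. This is one reading of the documented behaviour, not the module's implementation: effective_criteria and as_list are hypothetical helpers, and no database is queried. The parameter name `all` mirrors the documented signature and shadows the Python builtin inside the function.

```python
# Thresholds that are taken as single numbers rather than lists (an assumption).
SCALAR_KEYS = {"min_ic", "min_length", "min_sites"}


def as_list(value):
    """Wrap a single value in a list; pass lists and tuples through as lists."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def effective_criteria(collection="CORE", matrix_id=None, all=False,
                       all_versions=False, **filters):
    """Return the criteria that would apply under the documented precedence."""
    if all:
        # 'all' overrides every other criterion; only 'all_versions' still matters.
        return {"all": True, "all_versions": all_versions}
    if matrix_id is not None:
        # 'matrix_id' overrides everything except 'all'; the collection is ignored.
        return {"matrix_id": as_list(matrix_id), "all_versions": all_versions}
    criteria = {"all_versions": all_versions}
    if collection is not None:
        # collection=None means "search across all collections".
        criteria["collection"] = as_list(collection)
    for key, value in filters.items():
        if value is None:
            continue
        criteria[key] = value if key in SCALAR_KEYS else as_list(value)
    return criteria


print(effective_criteria(all=True, tf_class="Homeo domain factors"))
print(effective_criteria(matrix_id="MA0098", tf_class="Homeo domain factors"))
print(effective_criteria(tf_class="Homeo domain factors", min_ic=12))
print(effective_criteria(collection=None, tf_name="ETS1"))
```

The remaining criteria are combined with AND semantics, so each extra key in the returned dictionary narrows the result further.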