Bio.motifs package¶

Subpackages¶

Submodules¶

Module contents¶

Tools for sequence motif analysis.

Bio.motifs contains the core Motif class containing various I/O methods as well as methods for motif comparisons and motif searching in sequences. It also includes functionality for parsing output from the AlignACE, MEME, and MAST programs, as well as files in the TRANSFAC format.

Bio.motifs.create(instances, alphabet='ACGT')¶: Create a Motif object.

Bio.motifs.parse(handle, fmt, strict=True)¶

Parse an output file from a motif finding program.

Currently supported formats (case is ignored):

AlignAce: AlignAce output file format
ClusterBuster: Cluster Buster position frequency matrix format
XMS: XMS matrix format
MEME: MEME output file motif
MINIMAL: MINIMAL MEME output file motif
MAST: MAST output file motif
TRANSFAC: TRANSFAC database file format
pfm-four-columns: Generic position-frequency matrix format with four columns. (cisbp, homer, hocomoco, neph, tiffin)
pfm-four-rows: Generic position-frequency matrix format with four row. (scertf, yetfasco, hdpi, idmmpmm, flyfactor survey)
pfm: JASPAR-style position-frequency matrix
jaspar: JASPAR-style multiple PFM format
sites: JASPAR-style sites file

As files in the pfm and sites formats contain only a single motif, it is easier to use Bio.motifs.read() instead of Bio.motifs.parse() for those.

For example:

>>> from Bio import motifs
>>> with open("motifs/alignace.out") as handle:
...     for m in motifs.parse(handle, "AlignAce"):
...         print(m.consensus)
...
TCTACGATTGAG
CTGCACCTAGCTACGAGTGAG
GTGCCCTAAGCATACTAGGCG
GCCACTAGCAGAGCAGGGGGC
CGACTCAGAGGTT
CCACGCTAAGAGAAGTGCCGGAG
GCACGTCCCTGAGCA
GTCCATCGCAAAGCGTGGGGC
GAGATCAGAGGGCCG
TGGACGCGGGG
GACCAGAGCCTCGCATGGGGG
AGCGCGCGTG
GCCGGTTGCTGTTCATTAGG
ACCGACGGCAGCTAAAAGGG
GACGCCGGGGAT
CGACTCGCGCTTACAAGG

If strict is True (default), the parser will raise a ValueError if the file contents does not strictly comply with the specified file format.

Bio.motifs.read(handle, fmt, strict=True)¶

Read a motif from a handle using the specified file-format.

This supports the same formats as Bio.motifs.parse(), but only for files containing exactly one motif. For example, reading a JASPAR-style pfm file:

>>> from Bio import motifs
>>> with open("motifs/SRF.pfm") as handle:
...     m = motifs.read(handle, "pfm")
>>> m.consensus
Seq('GCCCATATATGG')

Or a single-motif MEME file,

>>> from Bio import motifs
>>> with open("motifs/meme.psp_test.classic.zoops.xml") as handle:
...     m = motifs.read(handle, "meme")
>>> m.consensus
Seq('GCTTATGTAA')

If the handle contains no records, or more than one record, an exception is raised:

>>> from Bio import motifs
>>> with open("motifs/alignace.out") as handle:
...     motif = motifs.read(handle, "AlignAce")
Traceback (most recent call last):
    ...
ValueError: More than one motif found in handle

If however you want the first motif from a file containing multiple motifs this function would raise an exception (as shown in the example above). Instead use:

>>> from Bio import motifs
>>> with open("motifs/alignace.out") as handle:
...     record = motifs.parse(handle, "alignace")
>>> motif = record[0]
>>> motif.consensus
Seq('TCTACGATTGAG')

Use the Bio.motifs.parse(handle, fmt) function if you want to read multiple records from the handle.

If strict is True (default), the parser will raise a ValueError if the file contents does not strictly comply with the specified file format.

class Bio.motifs.Instances(instances=None, alphabet='ACGT')¶

Bases: list

Class containing a list of sequences that made the motifs.

__init__(self, instances=None, alphabet='ACGT')¶: Initialize the class.

__str__(self)¶: Return a string containing the sequences of the motif.

count(self)¶: Count nucleotides in a position.

search(self, sequence)¶

Find positions of motifs in a given sequence.

This is a generator function, returning found positions of motif instances in a given sequence.

reverse_complement(self)¶: Compute reverse complement of sequences.

class Bio.motifs.Motif(alphabet='ACGT', instances=None, counts=None)¶

Bases: object

A class representing sequence motifs.

__init__(self, alphabet='ACGT', instances=None, counts=None)¶: Initialize the class.

property mask¶

property pseudocounts¶

property background¶

property pwm¶: Compute position weight matrices.

property pssm¶: Compute position specific scoring matrices.

__str__(self, masked=False)¶: Return string representation of a motif.

__len__(self)¶

Return the length of a motif.

Please use this method (i.e. invoke len(m)) instead of referring to m.length directly.

reverse_complement(self)¶: Return the reverse complement of the motif as a new motif.

property consensus¶: Return the consensus sequence.

property anticonsensus¶: Return the least probable pattern to be generated from this motif.

property degenerate_consensus¶

Return the degenerate consensus sequence.

Following the rules adapted from D. R. Cavener: “Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates.” Nucleic Acids Research 15(4): 1353-1361. (1987).

The same rules are used by TRANSFAC.

weblogo(self, fname, fmt='PNG', version='2.8.2', **kwds)¶

Download and save a weblogo using the Berkeley weblogo service.

Requires an internet connection.

The parameters from **kwds are passed directly to the weblogo server.

Currently, this method uses WebLogo version 3.3. These are the arguments and their default values passed to WebLogo 3.3; see their website at http://weblogo.threeplusone.com for more information:

'stack_width' : 'medium',
'stacks_per_line' : '40',
'alphabet' : 'alphabet_dna',
'ignore_lower_case' : True,
'unit_name' : "bits",
'first_index' : '1',
'logo_start' : '1',
'logo_end': str(self.length),
'composition' : "comp_auto",
'percentCG' : '',
'scale_width' : True,
'show_errorbars' : True,
'logo_title' : '',
'logo_label' : '',
'show_xaxis': True,
'xaxis_label': '',
'show_yaxis': True,
'yaxis_label': '',
'yaxis_scale': 'auto',
'yaxis_tic_interval' : '1.0',
'show_ends' : True,
'show_fineprint' : True,
'color_scheme': 'color_auto',
'symbols0': '',
'symbols1': '',
'symbols2': '',
'symbols3': '',
'symbols4': '',
'color0': '',
'color1': '',
'color2': '',
'color3': '',
'color4': '',

__format__(self, format_spec)¶

Return a string representation of the Motif in the given format.

Currently supported formats:

clusterbuster: Cluster Buster position frequency matrix format
pfm : JASPAR single Position Frequency Matrix
jaspar : JASPAR multiple Position Frequency Matrix
transfac : TRANSFAC like files

format(self, format_spec)¶

Return a string representation of the Motif in the given format [DEPRECATED].

This method is deprecated; instead of motif.format(format_spec), please use format(motif, format_spec).

Bio.motifs.write(motifs, fmt)¶

Return a string representation of motifs in the given format.

Currently supported formats (case is ignored):

clusterbuster: Cluster Buster position frequency matrix format
pfm : JASPAR simple single Position Frequency Matrix
jaspar : JASPAR multiple PFM format
transfac : TRANSFAC like files