Bio.motifs package

Subpackages

Submodules

Module contents

Tools for sequence motif analysis.

Bio.motifs contains the core Motif class containing various I/O methods as well as methods for motif comparisons and motif searching in sequences. It also includes functionality for parsing output from the AlignACE, MEME, and MAST programs, as well as files in the TRANSFAC format.

Bio.motifs.create(instances, alphabet='ACGT')

Create a Motif object.

Bio.motifs.parse(handle, fmt, strict=True)

Parse an output file from a motif finding program.

Currently supported formats (case is ignored):
  • AlignAce: AlignAce output file format

  • ClusterBuster: Cluster Buster position frequency matrix format

  • XMS: XMS matrix format

  • MEME: MEME output file motif

  • MINIMAL: MINIMAL MEME output file motif

  • MAST: MAST output file motif

  • TRANSFAC: TRANSFAC database file format

  • pfm-four-columns: Generic position-frequency matrix format with four columns. (CIS-BP, HOMER, HOCOMOCO, Neph, Tiffin)

  • pfm-four-rows: Generic position-frequency matrix format with four row. (ScerTF, YeTFaSCo, hDPI, iDMMPMM, FlyFactorSurvey, Cys2His2 Zinc Finger Proteins PWM Predictor)

  • pfm: JASPAR-style position-frequency matrix

  • jaspar: JASPAR-style multiple PFM format

  • sites: JASPAR-style sites file

As files in the pfm and sites formats contain only a single motif, it is easier to use Bio.motifs.read() instead of Bio.motifs.parse() for those.

For example:

>>> from Bio import motifs
>>> with open("motifs/alignace.out") as handle:
...     for m in motifs.parse(handle, "AlignAce"):
...         print(m.consensus)
...
TCTACGATTGAG
CTGCACCTAGCTACGAGTGAG
GTGCCCTAAGCATACTAGGCG
GCCACTAGCAGAGCAGGGGGC
CGACTCAGAGGTT
CCACGCTAAGAGAAGTGCCGGAG
GCACGTCCCTGAGCA
GTCCATCGCAAAGCGTGGGGC
GAGATCAGAGGGCCG
TGGACGCGGGG
GACCAGAGCCTCGCATGGGGG
AGCGCGCGTG
GCCGGTTGCTGTTCATTAGG
ACCGACGGCAGCTAAAAGGG
GACGCCGGGGAT
CGACTCGCGCTTACAAGG

If strict is True (default), the parser will raise a ValueError if the file contents does not strictly comply with the specified file format.

Bio.motifs.read(handle, fmt, strict=True)

Read a motif from a handle using the specified file-format.

This supports the same formats as Bio.motifs.parse(), but only for files containing exactly one motif. For example, reading a JASPAR-style pfm file:

>>> from Bio import motifs
>>> with open("motifs/SRF.pfm") as handle:
...     m = motifs.read(handle, "pfm")
>>> m.consensus
Seq('GCCCATATATGG')

Or a single-motif MEME file,

>>> from Bio import motifs
>>> with open("motifs/meme.psp_test.classic.zoops.xml") as handle:
...     m = motifs.read(handle, "meme")
>>> m.consensus
Seq('GCTTATGTAA')

If the handle contains no records, or more than one record, an exception is raised:

>>> from Bio import motifs
>>> with open("motifs/alignace.out") as handle:
...     motif = motifs.read(handle, "AlignAce")
Traceback (most recent call last):
    ...
ValueError: More than one motif found in handle

If however you want the first motif from a file containing multiple motifs this function would raise an exception (as shown in the example above). Instead use:

>>> from Bio import motifs
>>> with open("motifs/alignace.out") as handle:
...     record = motifs.parse(handle, "alignace")
>>> motif = record[0]
>>> motif.consensus
Seq('TCTACGATTGAG')

Use the Bio.motifs.parse(handle, fmt) function if you want to read multiple records from the handle.

If strict is True (default), the parser will raise a ValueError if the file contents does not strictly comply with the specified file format.

class Bio.motifs.Instances(instances=None, alphabet='ACGT')

Bases: list

Class containing a list of sequences that made the motifs.

__init__(instances=None, alphabet='ACGT')

Initialize the class.

__str__()

Return a string containing the sequences of the motif.

count()

Count nucleotides in a position.

search(sequence)

Find positions of motifs in a given sequence.

This is a generator function, returning found positions of motif instances in a given sequence.

reverse_complement()

Compute reverse complement of sequences.

class Bio.motifs.Motif(alphabet='ACGT', alignment=None, counts=None, instances=None)

Bases: object

A class representing sequence motifs.

__init__(alphabet='ACGT', alignment=None, counts=None, instances=None)

Initialize the class.

property mask
property pseudocounts
property background
__getitem__(key)

Return a new Motif object for the positions included in key.

>>> from Bio import motifs
>>> motif = motifs.create(["AACGCCA", "ACCGCCC", "AACTCCG"])
>>> print(motif)
AACGCCA
ACCGCCC
AACTCCG
>>> print(motif[:-1])
AACGCC
ACCGCC
AACTCC
property pwm

Calculate and return the position weight matrix for this motif.

property pssm

Calculate and return the position specific scoring matrix for this motif.

property instances

Return the sequences from which the motif was built.

__str__(masked=False)

Return string representation of a motif.

__len__()

Return the length of a motif.

Please use this method (i.e. invoke len(m)) instead of referring to m.length directly.

reverse_complement()

Return the reverse complement of the motif as a new motif.

property consensus

Return the consensus sequence.

property anticonsensus

Return the least probable pattern to be generated from this motif.

property degenerate_consensus

Return the degenerate consensus sequence.

Following the rules adapted from D. R. Cavener: “Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates.” Nucleic Acids Research 15(4): 1353-1361. (1987).

The same rules are used by TRANSFAC.

property relative_entropy

Return an array with the relative entropy for each column of the motif.

Download and save a weblogo using the Berkeley weblogo service.

Requires an internet connection.

The version parameter is deprecated and has no effect.

The parameters from **kwds are passed directly to the weblogo server.

Currently, this method uses WebLogo version 3.3. These are the arguments and their default values passed to WebLogo 3.3; see their website at http://weblogo.threeplusone.com for more information:

'stack_width' : 'medium',
'stacks_per_line' : '40',
'alphabet' : 'alphabet_dna',
'ignore_lower_case' : True,
'unit_name' : "bits",
'first_index' : '1',
'logo_start' : '1',
'logo_end': str(self.length),
'composition' : "comp_auto",
'percentCG' : '',
'scale_width' : True,
'show_errorbars' : True,
'logo_title' : '',
'logo_label' : '',
'show_xaxis': True,
'xaxis_label': '',
'show_yaxis': True,
'yaxis_label': '',
'yaxis_scale': 'auto',
'yaxis_tic_interval' : '1.0',
'show_ends' : True,
'show_fineprint' : True,
'color_scheme': 'color_auto',
'symbols0': '',
'symbols1': '',
'symbols2': '',
'symbols3': '',
'symbols4': '',
'color0': '',
'color1': '',
'color2': '',
'color3': '',
'color4': '',
__format__(format_spec, **kwargs)

Return a string representation of the Motif in the given format.

Currently supported formats:
  • clusterbuster: Cluster Buster position frequency matrix format

  • pfm : JASPAR single Position Frequency Matrix

  • jaspar : JASPAR multiple Position Frequency Matrix

  • transfac : TRANSFAC like files

format(format_spec)

Return a string representation of the Motif in the given format.

Currently supported formats:
  • clusterbuster: Cluster Buster position frequency matrix format

  • pfm : JASPAR single Position Frequency Matrix

  • jaspar : JASPAR multiple Position Frequency Matrix

  • transfac : TRANSFAC like files

Bio.motifs.write(motifs, fmt, **kwargs)

Return a string representation of motifs in the given format.

Currently supported formats (case is ignored):
  • clusterbuster: Cluster Buster position frequency matrix format

  • pfm : JASPAR simple single Position Frequency Matrix

  • jaspar : JASPAR multiple PFM format

  • transfac : TRANSFAC like files