Bio.motifs.matrix module

Support for various forms of sequence motif matrices.

Implementation of frequency (count) matrices, position-weight matrices, and position-specific scoring matrices.

class Bio.motifs.matrix.GenericPositionMatrix(alphabet, values)

Bases: dict

Base class for the support of position matrix operations.

__init__(alphabet, values)

Initialize the class.

__str__()

Return a string containing nucleotides and counts of the alphabet in the Matrix.

__getitem__(key)

Return the position matrix of index key.

property consensus

Return the consensus sequence.

property anticonsensus

Return the anticonsensus sequence.

property degenerate_consensus

Return the degenerate consensus sequence.

property gc_content

Compute the fraction GC content.

reverse_complement()

Compute reverse complement.

class Bio.motifs.matrix.FrequencyPositionMatrix(alphabet, values)

Bases: Bio.motifs.matrix.GenericPositionMatrix

Class for the support of frequency calculations on the Position Matrix.

normalize(pseudocounts=None)

Create and return a position-weight matrix by normalizing the counts matrix.

If pseudocounts is None (default), no pseudocounts are added to the counts.

If pseudocounts is a number, it is added to the counts before calculating the position-weight matrix.

Alternatively, the pseudocounts can be a dictionary with a key for each letter in the alphabet associated with the motif.

class Bio.motifs.matrix.PositionWeightMatrix(alphabet, counts)

Bases: Bio.motifs.matrix.GenericPositionMatrix

Class for the support of weight calculations on the Position Matrix.

__init__(alphabet, counts)

Initialize the class.

log_odds(background=None)

Return the Position-Specific Scoring Matrix.

The Position-Specific Scoring Matrix (PSSM) contains the log-odds scores computed from the probability matrix and the background probabilities. If the background is None, a uniform background distribution is assumed.

class Bio.motifs.matrix.PositionSpecificScoringMatrix(alphabet, values)

Bases: Bio.motifs.matrix.GenericPositionMatrix

Class for the support of Position Specific Scoring Matrix calculations.

calculate(sequence)

Return the PWM score for a given sequence for all positions.

Notes:
  • the sequence can only be a DNA sequence

  • the search is performed only on one strand

  • if the sequence and the motif have the same length, a single number is returned

  • otherwise, the result is a one-dimensional numpy array

search(sequence, threshold=0.0, both=True, chunksize=10 ** 6)

Find hits with PWM score above given threshold.

A generator function, returning found hits in the given sequence with the pwm score higher than the threshold.

property max

Maximal possible score for this motif.

returns the score computed for the consensus sequence.

property min

Minimal possible score for this motif.

returns the score computed for the anticonsensus sequence.

property gc_content

Compute the GC-ratio.

mean(background=None)

Return expected value of the score of a motif.

std(background=None)

Return standard deviation of the score of a motif.

dist_pearson(other)

Return the similarity score based on pearson correlation for the given motif against self.

We use the Pearson’s correlation of the respective probabilities.

dist_pearson_at(other, offset)

Return the similarity score based on pearson correlation at the given offset.

distribution(background=None, precision=10 ** 3)

Calculate the distribution of the scores at the given precision.