Bio.motifs.matrix module¶
Support for various forms of sequence motif matrices.
Implementation of frequency (count) matrices, position-weight matrices, and position-specific scoring matrices.
-
class
Bio.motifs.matrix.
GenericPositionMatrix
(alphabet, values)¶ Bases:
dict
Base class for the support of position matrix operations.
-
__init__
(self, alphabet, values)¶ Initialize the class.
-
__str__
(self)¶ Return a string containing nucleotides and counts of the alphabet in the Matrix.
-
__getitem__
(self, key)¶ Return the position matrix of index key.
-
property
consensus
¶ Return the consensus sequence.
-
property
anticonsensus
¶ Return the anticonsensus sequence.
-
property
degenerate_consensus
¶ Return the degenerate consensus sequence.
-
property
gc_content
¶ Compute the fraction GC content.
-
reverse_complement
(self)¶ Compute reverse complement.
-
-
class
Bio.motifs.matrix.
FrequencyPositionMatrix
(alphabet, values)¶ Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of frequency calculations on the Position Matrix.
-
normalize
(self, pseudocounts=None)¶ Create and return a position-weight matrix by normalizing the counts matrix.
If pseudocounts is None (default), no pseudocounts are added to the counts.
If pseudocounts is a number, it is added to the counts before calculating the position-weight matrix.
Alternatively, the pseudocounts can be a dictionary with a key for each letter in the alphabet associated with the motif.
-
-
class
Bio.motifs.matrix.
PositionWeightMatrix
(alphabet, counts)¶ Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of weight calculations on the Position Matrix.
-
__init__
(self, alphabet, counts)¶ Initialize the class.
-
log_odds
(self, background=None)¶ Return the Position-Specific Scoring Matrix.
The Position-Specific Scoring Matrix (PSSM) contains the log-odds scores computed from the probability matrix and the background probabilities. If the background is None, a uniform background distribution is assumed.
-
-
class
Bio.motifs.matrix.
PositionSpecificScoringMatrix
(alphabet, values)¶ Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of Position Specific Scoring Matrix calculations.
-
calculate
(self, sequence)¶ Return the PWM score for a given sequence for all positions.
- Notes:
the sequence can only be a DNA sequence
the search is performed only on one strand
if the sequence and the motif have the same length, a single number is returned
otherwise, the result is a one-dimensional list or numpy array
-
search
(self, sequence, threshold=0.0, both=True, chunksize=1000000)¶ Find hits with PWM score above given threshold.
A generator function, returning found hits in the given sequence with the pwm score higher than the threshold.
-
property
max
¶ Maximal possible score for this motif.
returns the score computed for the consensus sequence.
-
property
min
¶ Minimal possible score for this motif.
returns the score computed for the anticonsensus sequence.
-
property
gc_content
¶ Compute the GC-ratio.
-
mean
(self, background=None)¶ Return expected value of the score of a motif.
-
std
(self, background=None)¶ Return standard deviation of the score of a motif.
-
dist_pearson
(self, other)¶ Return the similarity score based on pearson correlation for the given motif against self.
We use the Pearson’s correlation of the respective probabilities.
-
dist_pearson_at
(self, other, offset)¶ Return the similarity score based on pearson correlation at the given offset.
-
distribution
(self, background=None, precision=1000)¶ Calculate the distribution of the scores at the given precision.
-