Bio.motifs.matrix module¶
Support for various forms of sequence motif matrices.
Implementation of frequency (count) matrices, position-weight matrices, and position-specific scoring matrices.
- class Bio.motifs.matrix.GenericPositionMatrix(alphabet, values)¶
Bases:
dict
Base class for the support of position matrix operations.
- __init__(self, alphabet, values)¶
Initialize the class.
- __str__(self)¶
Return a string containing nucleotides and counts of the alphabet in the Matrix.
- __getitem__(self, key)¶
Return the position matrix of index key.
- property consensus¶
Return the consensus sequence.
- property anticonsensus¶
Return the anticonsensus sequence.
- property degenerate_consensus¶
Return the degenerate consensus sequence.
- property gc_content¶
Compute the fraction GC content.
- reverse_complement(self)¶
Compute reverse complement.
- class Bio.motifs.matrix.FrequencyPositionMatrix(alphabet, values)¶
Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of frequency calculations on the Position Matrix.
- normalize(self, pseudocounts=None)¶
Create and return a position-weight matrix by normalizing the counts matrix.
If pseudocounts is None (default), no pseudocounts are added to the counts.
If pseudocounts is a number, it is added to the counts before calculating the position-weight matrix.
Alternatively, the pseudocounts can be a dictionary with a key for each letter in the alphabet associated with the motif.
- class Bio.motifs.matrix.PositionWeightMatrix(alphabet, counts)¶
Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of weight calculations on the Position Matrix.
- __init__(self, alphabet, counts)¶
Initialize the class.
- log_odds(self, background=None)¶
Return the Position-Specific Scoring Matrix.
The Position-Specific Scoring Matrix (PSSM) contains the log-odds scores computed from the probability matrix and the background probabilities. If the background is None, a uniform background distribution is assumed.
- class Bio.motifs.matrix.PositionSpecificScoringMatrix(alphabet, values)¶
Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of Position Specific Scoring Matrix calculations.
- calculate(self, sequence)¶
Return the PWM score for a given sequence for all positions.
- Notes:
the sequence can only be a DNA sequence
the search is performed only on one strand
if the sequence and the motif have the same length, a single number is returned
otherwise, the result is a one-dimensional numpy array
- search(self, sequence, threshold=0.0, both=True, chunksize=10 ** 6)¶
Find hits with PWM score above given threshold.
A generator function, returning found hits in the given sequence with the pwm score higher than the threshold.
- property max¶
Maximal possible score for this motif.
returns the score computed for the consensus sequence.
- property min¶
Minimal possible score for this motif.
returns the score computed for the anticonsensus sequence.
- property gc_content¶
Compute the GC-ratio.
- mean(self, background=None)¶
Return expected value of the score of a motif.
- std(self, background=None)¶
Return standard deviation of the score of a motif.
- dist_pearson(self, other)¶
Return the similarity score based on pearson correlation for the given motif against self.
We use the Pearson’s correlation of the respective probabilities.
- dist_pearson_at(self, other, offset)¶
Return the similarity score based on pearson correlation at the given offset.
- distribution(self, background=None, precision=10 ** 3)¶
Calculate the distribution of the scores at the given precision.