Bio.motifs.matrix module
Support for various forms of sequence motif matrices.
Implementation of frequency (count) matrices, position-weight matrices, and position-specific scoring matrices.
- class Bio.motifs.matrix.GenericPositionMatrix(alphabet, values)
Bases:
dict
Base class for the support of position matrix operations.
- __init__(alphabet, values)
Initialize the class.
- __str__()
Return a string containing nucleotides and counts of the alphabet in the Matrix.
- __getitem__(key)
Return the position matrix of index key.
- property consensus
Return the consensus sequence.
- property anticonsensus
Return the anticonsensus sequence.
- property degenerate_consensus
Return the degenerate consensus sequence.
- property gc_content
Compute the fraction GC content.
- reverse_complement()
Compute reverse complement.
- class Bio.motifs.matrix.FrequencyPositionMatrix(alphabet, values)
Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of frequency calculations on the Position Matrix.
- normalize(pseudocounts=None)
Create and return a position-weight matrix by normalizing the counts matrix.
If pseudocounts is None (default), no pseudocounts are added to the counts.
If pseudocounts is a number, it is added to the counts before calculating the position-weight matrix.
Alternatively, the pseudocounts can be a dictionary with a key for each letter in the alphabet associated with the motif.
- class Bio.motifs.matrix.PositionWeightMatrix(alphabet, counts)
Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of weight calculations on the Position Matrix.
- __init__(alphabet, counts)
Initialize the class.
- log_odds(background=None)
Return the Position-Specific Scoring Matrix.
The Position-Specific Scoring Matrix (PSSM) contains the log-odds scores computed from the probability matrix and the background probabilities. If the background is None, a uniform background distribution is assumed.
- class Bio.motifs.matrix.PositionSpecificScoringMatrix(alphabet, values)
Bases:
Bio.motifs.matrix.GenericPositionMatrix
Class for the support of Position Specific Scoring Matrix calculations.
- calculate(sequence)
Return the PWM score for a given sequence for all positions.
- Notes:
the sequence can only be a DNA sequence
the search is performed only on one strand
if the sequence and the motif have the same length, a single number is returned
otherwise, the result is a one-dimensional numpy array
- search(sequence, threshold=0.0, both=True, chunksize=10 ** 6)
Find hits with PWM score above given threshold.
A generator function, returning found hits in the given sequence with the pwm score higher than the threshold.
- property max
Maximal possible score for this motif.
returns the score computed for the consensus sequence.
- property min
Minimal possible score for this motif.
returns the score computed for the anticonsensus sequence.
- property gc_content
Compute the GC-ratio.
- mean(background=None)
Return expected value of the score of a motif.
- std(background=None)
Return standard deviation of the score of a motif.
- dist_pearson(other)
Return the similarity score based on pearson correlation for the given motif against self.
We use the Pearson’s correlation of the respective probabilities.
- dist_pearson_at(other, offset)
Return the similarity score based on pearson correlation at the given offset.
- distribution(background=None, precision=10 ** 3)
Calculate the distribution of the scores at the given precision.