Bio.MarkovModel module

A state-emitting MarkovModel.

Note that the terminology used here is similar to that of Manning and Schutze.

Functions:
train_bw        Train a Markov model using the Baum-Welch algorithm.
train_visible   Train a visible Markov model using MLE.
find_states     Find a state sequence that explains some observations.

load            Load a MarkovModel.
save            Save a MarkovModel.

Classes:
MarkovModel     Holds the description of a Markov model.

Bio.MarkovModel.itemindex(values): Return a dictionary of values with their sequence offset as keys.

class Bio.MarkovModel.MarkovModel(states, alphabet, p_initial=None, p_transition=None, p_emission=None)

Bases: object

Create a state-emitting MarkovModel object.

__init__(states, alphabet, p_initial=None, p_transition=None, p_emission=None): Initialize the class.

__str__(): Create a string representation of the MarkovModel object.

Bio.MarkovModel.load(handle): Parse a file handle into a MarkovModel object.

Bio.MarkovModel.save(mm, handle): Save MarkovModel object into handle.

Bio.MarkovModel.train_bw(states, alphabet, training_data, pseudo_initial=None, pseudo_transition=None, pseudo_emission=None, update_fn=None)

Train a MarkovModel using the Baum-Welch algorithm.

states is a list of strings naming each state. alphabet is a list of objects that indicate the allowed outputs. training_data is a list of observations, where each observation is a list of objects from alphabet.

pseudo_initial, pseudo_transition, and pseudo_emission are optional parameters that you can use to assign pseudo-counts to the initial, transition, and emission matrices. Each should be a matrix of the appropriate size containing numbers to add to the corresponding parameter matrix before normalization.

update_fn is an optional callback that takes parameters (iteration, log_likelihood). It is called once per iteration.
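To make the procedure concrete, here is a minimal pure-Python sketch of Baum-Welch for a discrete, state-emitting model. It is not Biopython's implementation: baum_welch, its single scalar pseudo argument (standing in for the three pseudo_* matrices), max_iter, tol, seed, and the stopping rule are all assumptions. It runs scaled forward and backward passes, accumulates expected counts on top of the pseudo-counts, normalizes them into new parameters, and calls update_fn(iteration, log_likelihood) once per iteration.

```python
import math
import random


def _normalize(row):
    """Scale non-negative counts into a probability distribution."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)  # fall back to uniform
    return [x / total for x in row]


def baum_welch(states, alphabet, training_data, pseudo=0.0,
               max_iter=100, tol=1e-6, update_fn=None, seed=0):
    """Hypothetical Baum-Welch sketch; returns (p_initial, p_transition, p_emission)."""
    n, m = len(states), len(alphabet)
    sym = {a: k for k, a in enumerate(alphabet)}
    rng = random.Random(seed)
    # Random starting point; Baum-Welch only finds a local optimum.
    pi = _normalize([rng.random() for _ in range(n)])
    A = [_normalize([rng.random() for _ in range(n)]) for _ in range(n)]
    B = [_normalize([rng.random() for _ in range(m)]) for _ in range(n)]
    prev_ll = None
    for iteration in range(max_iter):
        # E-step: expected counts, seeded with pseudo-counts.
        c_pi = [pseudo] * n
        c_A = [[pseudo] * n for _ in range(n)]
        c_B = [[pseudo] * m for _ in range(n)]
        ll = 0.0
        for seq in training_data:
            o = [sym[x] for x in seq]
            T = len(o)
            # Scaled forward pass: each alpha[t] sums to 1, scale[t] keeps the factor.
            alpha, scale = [], []
            for t in range(T):
                if t == 0:
                    a = [pi[i] * B[i][o[0]] for i in range(n)]
                else:
                    a = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][o[t]]
                         for j in range(n)]
                s = sum(a)
                alpha.append([x / s for x in a])
                scale.append(s)
            ll += sum(math.log(c) for c in scale)  # log P(seq) = sum of log scales
            # Scaled backward pass reusing the forward scale factors.
            beta = [[1.0] * n for _ in range(T)]
            for t in range(T - 2, -1, -1):
                beta[t] = [sum(A[i][j] * B[j][o[t + 1]] * beta[t + 1][j] for j in range(n))
                           / scale[t + 1] for i in range(n)]
            for t in range(T):
                for i in range(n):
                    gamma = alpha[t][i] * beta[t][i]  # P(state i at time t | seq)
                    if t == 0:
                        c_pi[i] += gamma
                    c_B[i][o[t]] += gamma
                    if t < T - 1:
                        for j in range(n):  # expected i -> j transitions at time t
                            c_A[i][j] += (alpha[t][i] * A[i][j] * B[j][o[t + 1]]
                                          * beta[t + 1][j] / scale[t + 1])
        # M-step: normalize expected counts into new parameters.
        pi = _normalize(c_pi)
        A = [_normalize(row) for row in c_A]
        B = [_normalize(row) for row in c_B]
        if update_fn is not None:
            update_fn(iteration, ll)  # log-likelihood under the pre-update model
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, A, B
```

Because the starting parameters are random, different seeds can converge to different local optima; with pseudo=0 the reported log-likelihood should never decrease from one iteration to the next.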

Bio.MarkovModel.train_visible(states, alphabet, training_data, pseudo_initial=None, pseudo_transition=None, pseudo_emission=None)

Train a visible MarkovModel using maximum likelihood estimates for each of the parameters.

states is a list of strings naming each state. alphabet is a list of objects that indicate the allowed outputs. training_data is a list of (outputs, observed states) pairs, where outputs is a list of emissions from alphabet and observed states is a list of states drawn from states.

pseudo_initial, pseudo_transition, and pseudo_emission are optional parameters that you can use to assign pseudo-counts to the initial, transition, and emission matrices. Each should be a matrix of the appropriate size containing numbers to add to the corresponding parameter matrix.
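When the states are observed, maximum likelihood estimation reduces to counting and normalizing. A minimal sketch, assuming the pseudo-counts are added to the raw counts before each row is normalized; train_visible_sketch is a hypothetical name using plain lists, not Biopython's code.

```python
def _normalize(row):
    """Divide counts by their total (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def train_visible_sketch(states, alphabet, training_data, pseudo_initial=None,
                         pseudo_transition=None, pseudo_emission=None):
    """Hypothetical MLE trainer for a fully observed Markov model.

    training_data is a list of (outputs, observed_states) pairs.
    Returns (p_initial, p_transition, p_emission) as nested lists.
    """
    n, m = len(states), len(alphabet)
    s_idx = {s: i for i, s in enumerate(states)}
    a_idx = {a: k for k, a in enumerate(alphabet)}
    # Counts start at the pseudo-counts (or zero); observations are added on top.
    c_init = list(pseudo_initial) if pseudo_initial is not None else [0.0] * n
    c_trans = ([list(r) for r in pseudo_transition] if pseudo_transition is not None
               else [[0.0] * n for _ in range(n)])
    c_emit = ([list(r) for r in pseudo_emission] if pseudo_emission is not None
              else [[0.0] * m for _ in range(n)])
    for outputs, observed in training_data:
        path = [s_idx[s] for s in observed]
        if path:
            c_init[path[0]] += 1  # state each sequence starts in
        for i, j in zip(path, path[1:]):
            c_trans[i][j] += 1  # observed state-to-state transitions
        for i, symbol in zip(path, outputs):
            c_emit[i][a_idx[symbol]] += 1  # symbol emitted by each state
    # MLE: each row of counts divided by its total.
    return (_normalize(c_init),
            [_normalize(r) for r in c_trans],
            [_normalize(r) for r in c_emit])
```

Pseudo-counts matter most for small training sets: without them, any transition or emission that never appears in the data gets probability 0.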

Bio.MarkovModel.find_states(markov_model, output)

Find the state sequence in markov_model that best explains output, a sequence of symbols from the model's alphabet.

Returns a list of (states, score) tuples.
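A standard way to find the best-scoring state sequence is the Viterbi algorithm. The sketch below assumes that technique and, as an assumption, reports score as the joint probability of the path and the output. viterbi_sketch is a hypothetical name taking plain-list parameters rather than a MarkovModel object, and it returns only a single best path (ties are broken toward the lower state index), wrapped in a list to mirror the documented return shape.

```python
import math


def viterbi_sketch(states, alphabet, p_initial, p_transition, p_emission, output):
    """Hypothetical Viterbi sketch returning [(best_state_path, score)]."""
    if not output:
        raise ValueError("output must contain at least one symbol")
    n = len(states)
    sym = {a: k for k, a in enumerate(alphabet)}
    o = [sym[x] for x in output]

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[i]: best log-probability of any path that ends in state i.
    delta = [log(p_initial[i]) + log(p_emission[i][o[0]]) for i in range(n)]
    backptrs = []
    for k in o[1:]:
        ptr, new = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best_i = max(range(n), key=lambda i: delta[i] + log(p_transition[i][j]))
            ptr.append(best_i)
            new.append(delta[best_i] + log(p_transition[best_i][j]) + log(p_emission[j][k]))
        backptrs.append(ptr)
        delta = new
    # Trace back from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [([states[i] for i in path], math.exp(delta[last]))]


print(viterbi_sketch(["H", "L"], ["x", "y"], [0.6, 0.4],
                     [[0.7, 0.3], [0.4, 0.6]],
                     [[0.9, 0.1], [0.2, 0.8]], ["x", "y"]))
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long outputs.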