Bio.HMM.MarkovModel module

Deal with representations of Markov Models.

class Bio.HMM.MarkovModel.MarkovModelBuilder(state_alphabet, emission_alphabet)

Bases: object

Interface to build up a Markov Model.

This class separates the task of specifying a Markov Model from the model itself, in the hope of keeping the actual Markov Model classes smaller.

Use this builder class to create Markov models instead of instantiating a Markov Model class directly.

DEFAULT_PSEUDO = 1
__init__(self, state_alphabet, emission_alphabet)

Initialize a builder to create Markov Models.

Arguments:
  • state_alphabet – An iterable (e.g., tuple or list) containing all of the letters that can appear in the states.

  • emission_alphabet – An iterable (e.g., tuple or list) containing all of the letters that can be emitted by the HMM.

get_markov_model(self)

Return the Markov model corresponding to the current parameters.

Each Markov model returned by a call to this function is unique (i.e., the returned models do not influence each other).

set_initial_probabilities(self, initial_prob)

Set initial state probabilities.

initial_prob is a dictionary mapping states to probabilities. Suppose, for example, that the state alphabet is ('A', 'B'). Call set_initial_probabilities({'A': 1}) to guarantee that the initial state will be 'A'. Call set_initial_probabilities({'A': 0.5, 'B': 0.5}) to make each initial state equally probable.

This method must be called before the Markov model can be used: the calculation of initial probabilities changed incompatibly because the previous calculation was incorrect.

If initial probabilities are set for all states, they should add up to 1. Otherwise, their sum should be <= 1, and the residual probability is divided evenly among the states whose initial probability has not been set. For example, calling set_initial_probabilities({}) results in P('A') = 0.5 and P('B') = 0.5 for the example above.
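The residual rule described above can be sketched in plain Python. This is an illustration of the arithmetic only, not the library's implementation; the helper name fill_initial_probabilities is hypothetical and is not part of Bio.HMM.

```python
def fill_initial_probabilities(state_alphabet, initial_prob):
    """Spread the unassigned probability evenly over the unset states.

    Hypothetical helper for illustration only; not part of Bio.HMM.
    """
    assigned = sum(initial_prob.values())
    if assigned > 1 + 1e-9:
        raise ValueError("initial probabilities sum to more than 1")

    # States that were not given an explicit initial probability.
    unset = [state for state in state_alphabet if state not in initial_prob]

    result = dict(initial_prob)
    if unset:
        share = (1.0 - assigned) / len(unset)
        for state in unset:
            result[state] = share
    return result


states = ("A", "B")
print(fill_initial_probabilities(states, {}))        # both states share the residual
print(fill_initial_probabilities(states, {"A": 1}))  # nothing is left over for 'B'
```

With an empty dictionary the whole probability mass is residual, so each of the two states receives 0.5.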

set_equal_probabilities(self)

Reset all probabilities to be an average value.

Resets the values of all initial probabilities, all allowed transitions, and all allowed emissions to 1 divided by the number of possible elements.

This is useful if you just want to initialize a Markov Model to starting values (i.e., if you have no prior notions of what the probabilities should be, or if you are just feeling too lazy to calculate them :-).

Warning 1 – This will reset all currently set probabilities.

Warning 2 – This sets the probabilities so that all transitions together total 1 and all emissions together total 1. It does not ensure that the transitions (or emissions) from any single state add up to 1.
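The second warning is easier to see with numbers. The sketch below assumes the transitions are stored in a dictionary keyed by (from_state, to_state) pairs, as described for get_blank_transitions; the three allowed transitions are made up for illustration and nothing here calls Bio.HMM.

```python
from collections import defaultdict

# Allowed transitions keyed by (from_state, to_state); values are placeholders.
transition_prob = {("A", "A"): 0.0, ("A", "B"): 0.0, ("B", "A"): 0.0}

# Give every allowed transition the same share of a single total of 1.
equal_share = 1.0 / len(transition_prob)
for key in transition_prob:
    transition_prob[key] = equal_share

# Sum the outgoing probabilities separately for each source state.
outgoing = defaultdict(float)
for (from_state, _to_state), prob in transition_prob.items():
    outgoing[from_state] += prob

print(sum(transition_prob.values()))  # the grand total is 1 (up to rounding)
print(dict(outgoing))                 # but the per-state totals are not
```

Here state 'A' has two outgoing transitions and 'B' has one, so their outgoing totals are about 2/3 and 1/3 rather than 1 each.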

set_random_initial_probabilities(self)

Set all initial state probabilities to a randomly generated distribution.

Returns the dictionary containing the initial probabilities.

set_random_transition_probabilities(self)

Set all allowed transition probabilities to a randomly generated distribution.

Returns the dictionary containing the transition probabilities.

set_random_emission_probabilities(self)

Set all allowed emission probabilities to a randomly generated distribution.

Returns the dictionary containing the emission probabilities.

set_random_probabilities(self)

Set all probabilities to randomly generated numbers.

Resets probabilities of all initial states, transitions, and emissions to random values.
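The documentation does not say how the random distributions are generated. One common approach, shown here purely as an assumption, is to draw a uniform random weight for each key and normalize the weights so that they sum to 1. The helper name random_distribution is hypothetical.

```python
import random


def random_distribution(keys, rng=random):
    """Map each key to a random weight, normalized so the values sum to 1.

    Hypothetical helper for illustration only; not part of Bio.HMM.
    """
    # 1.0 - rng.random() lies in (0, 1], so the total can never be zero.
    weights = {key: 1.0 - rng.random() for key in keys}
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


# A seeded generator makes the example repeatable.
initial_prob = random_distribution(("A", "B", "C"), random.Random(0))
print(initial_prob)
```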

allow_all_transitions(self)

Create transitions between all states.

By default all transitions within the alphabet are disallowed; this is a convenience function to change this to allow all possible transitions.

allow_transition(self, from_state, to_state, probability=None, pseudocount=None)

Set a transition as being possible between the two states.

probability and pseudocount are optional arguments specifying the probability and pseudocount for the transition. If they are not supplied, the default values are used.

Raises: KeyError if the two states already have an allowed transition.

destroy_transition(self, from_state, to_state)

Restrict transitions between the two states.

Raises: KeyError if the transition is not currently allowed.

set_transition_score(self, from_state, to_state, probability)

Set the probability of a transition between two states.

Raises: KeyError if the transition is not allowed.
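The KeyError behaviour of allow_transition, destroy_transition, and set_transition_score can be mimicked with a plain dictionary. The class below is a hypothetical stand-in written for illustration, not the builder itself, and the default probability of 0.0 is an assumption.

```python
class TransitionTable:
    """Hypothetical stand-in for the builder's transition bookkeeping."""

    def __init__(self):
        # Allowed transitions keyed by (from_state, to_state).
        self.transition_prob = {}

    def allow_transition(self, from_state, to_state, probability=0.0):
        key = (from_state, to_state)
        if key in self.transition_prob:
            raise KeyError(f"Transition {key} is already allowed.")
        self.transition_prob[key] = probability

    def destroy_transition(self, from_state, to_state):
        # Deleting a missing key raises KeyError, i.e. "not currently allowed".
        del self.transition_prob[(from_state, to_state)]

    def set_transition_score(self, from_state, to_state, probability):
        key = (from_state, to_state)
        if key not in self.transition_prob:
            raise KeyError(f"Transition {key} is not allowed.")
        self.transition_prob[key] = probability


table = TransitionTable()
table.allow_transition("A", "B")
table.set_transition_score("A", "B", 0.3)
table.destroy_transition("A", "B")
try:
    table.set_transition_score("A", "B", 0.5)
except KeyError as error:
    print("KeyError:", error)
```

The order matters: a transition has to be allowed before its probability can be set, and it cannot be scored again once it has been destroyed.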

set_transition_pseudocount(self, from_state, to_state, count)

Set the default pseudocount for a transition.

To avoid computational problems, it is helpful to be able to set a 'default' pseudocount to start with for estimating transition and emission probabilities (see p. 62 in Durbin et al. for more discussion). By default, all transitions have a pseudocount of 1.

Raises: KeyError if the transition is not allowed.
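To show why a pseudocount helps, the sketch below estimates transition probabilities from observed counts that start at the pseudocount instead of zero, then normalizes per source state. It is a standalone illustration with made-up counts; the function name estimate_transition_probs is hypothetical and this is not presented as the library's training code.

```python
def estimate_transition_probs(observed_counts, pseudocounts):
    """Turn transition counts into probabilities, seeded with pseudocounts.

    Hypothetical helper for illustration only; not part of Bio.HMM.
    """
    # Start every allowed transition at its pseudocount.
    totals = dict(pseudocounts)
    for key, count in observed_counts.items():
        if key not in totals:
            raise KeyError(f"Transition {key} is not allowed.")
        totals[key] += count

    # Normalize so the transitions leaving each source state sum to 1.
    per_source = {}
    for (from_state, _to_state), value in totals.items():
        per_source[from_state] = per_source.get(from_state, 0) + value
    return {key: value / per_source[key[0]] for key, value in totals.items()}


pseudocounts = {("A", "A"): 1, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 1}
observed = {("A", "A"): 3, ("B", "A"): 2}  # A->B and B->B were never seen
probs = estimate_transition_probs(observed, pseudocounts)
print(probs)
```

Without the pseudocounts, the unseen transitions A->B and B->B would be estimated as exactly 0; with a pseudocount of 1 they keep a small non-zero probability (0.2 and 0.25 in this example).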

set_emission_score(self, seq_state, emission_state, probability)

Set the probability of an emission from a particular state.

Raises: KeyError if the emission from the given state is not allowed.

set_emission_pseudocount(self, seq_state, emission_state, count)

Set the default pseudocount for an emission.

To avoid computational problems, it is helpful to be able to set a 'default' pseudocount to start with for estimating transition and emission probabilities (see p. 62 in Durbin et al. for more discussion). By default, all emissions have a pseudocount of 1.

Raises: KeyError if the emission from the given state is not allowed.

class Bio.HMM.MarkovModel.HiddenMarkovModel(state_alphabet, emission_alphabet, initial_prob, transition_prob, emission_prob, transition_pseudo, emission_pseudo)

Bases: object

Represent a hidden Markov model that can be used for state estimation.

__init__(self, state_alphabet, emission_alphabet, initial_prob, transition_prob, emission_prob, transition_pseudo, emission_pseudo)

Initialize a Markov Model.

Note: You should use the MarkovModelBuilder class instead of instantiating this class directly.

Arguments:
  • state_alphabet – A tuple containing all of the letters that can appear in the states.

  • emission_alphabet – A tuple containing all of the letters that can be emitted by the HMM.

  • initial_prob – A dictionary of initial probabilities for all states.

  • transition_prob – A dictionary of transition probabilities for all possible transitions in the sequence.

  • emission_prob – A dictionary of emission probabilities for all possible emissions from the sequence states.

  • transition_pseudo – Pseudo-counts to be used for the transitions, when counting for purposes of estimating transition probabilities.

  • emission_pseudo – Pseudo-counts to be used for the emissions, when counting for purposes of estimating emission probabilities.

get_blank_transitions(self)

Get the default transitions for the model.

Returns a dictionary of all of the default transitions between any two letters in the sequence alphabet. The dictionary is structured with keys as (letter1, letter2) and values as the starting number of transitions.

get_blank_emissions(self)

Get the starting default emissions for each sequence.

This returns a dictionary of the default emissions for each letter. The dictionary is structured with keys as (seq_letter, emission_letter) and values as the starting number of emissions.

transitions_from(self, state_letter)

Get all destination states which can transition from source state_letter.

This returns all letters which the given state_letter can transition to, i.e. all the destination states reachable from state_letter.

An empty list is returned if state_letter has no outgoing transitions.

transitions_to(self, state_letter)

Get all source states which can transition to destination state_letter.

This returns all letters which the given state_letter is reachable from, i.e. all the source states which can reach state_letter.

An empty list is returned if state_letter is unreachable.
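Both lookups can be sketched as simple filters over a transition dictionary. The sketch assumes transitions are keyed by (from_state, to_state) pairs; the functions are standalone illustrations that share the method names but take the dictionary as an explicit argument, and they do not call Bio.HMM.

```python
def transitions_from(transition_prob, state_letter):
    """Return all destination states reachable from state_letter."""
    return [to for (frm, to) in transition_prob if frm == state_letter]


def transitions_to(transition_prob, state_letter):
    """Return all source states that can transition to state_letter."""
    return [frm for (frm, to) in transition_prob if to == state_letter]


# Made-up transition table: 'C' can be reached but has no outgoing transitions.
transition_prob = {("A", "A"): 0.5, ("A", "B"): 0.5, ("B", "C"): 1.0}

print(transitions_from(transition_prob, "A"))
print(transitions_to(transition_prob, "C"))
print(transitions_from(transition_prob, "C"))  # empty: no outgoing transitions
```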

viterbi(self, sequence, state_alphabet)

Calculate the most probable state path using the Viterbi algorithm.

This implements the Viterbi algorithm (see pp. 55-57 in Durbin et al. for a full explanation; this is where I took my implementation ideas from) to allow decoding of the state path, given a sequence of emissions.

Arguments:
  • sequence – A Seq object with the emission sequence that we want to decode.

  • state_alphabet – An iterable (e.g., tuple or list) containing all of the letters that can appear in the states.
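For readers who want to see the decoding step itself, here is a minimal, standalone sketch of the Viterbi algorithm in log space. It is not the code behind the viterbi method and its signature differs: the function name viterbi_sketch, the dictionary layouts, and the two-state example model are all assumptions made for illustration.

```python
import math


def viterbi_sketch(observations, states, initial_prob, transition_prob, emission_prob):
    """Return (most probable state path, log probability of that path).

    transition_prob is keyed by (from_state, to_state) and emission_prob by
    (state, emitted_letter); missing keys are treated as probability 0.
    """
    if not observations:
        raise ValueError("observations must not be empty")

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[i][s]: best log probability of any path ending in state s at position i.
    scores = [{
        s: log(initial_prob.get(s, 0)) + log(emission_prob.get((s, observations[0]), 0))
        for s in states
    }]
    backpointers = []

    for letter in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the score of arriving in s.
            best_prev = max(
                states, key=lambda p: prev[p] + log(transition_prob.get((p, s), 0))
            )
            current[s] = (
                prev[best_prev]
                + log(transition_prob.get((best_prev, s), 0))
                + log(emission_prob.get((s, letter), 0))
            )
            pointers[s] = best_prev
        scores.append(current)
        backpointers.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[-1][last]


states = ("A", "B")
initial = {"A": 0.5, "B": 0.5}
transitions = {("A", "A"): 0.9, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.9}
emissions = {("A", "x"): 0.9, ("A", "y"): 0.1, ("B", "x"): 0.1, ("B", "y"): 0.9}

path, log_prob = viterbi_sketch("xxyy", states, initial, transitions, emissions)
print(path, log_prob)
```

Working in log space replaces products of small probabilities with sums, which avoids numerical underflow on long sequences. In this example the decoded path is A, A, B, B: state 'A' mostly emits 'x', state 'B' mostly emits 'y', and state changes are rare.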