Package SubsMat
Substitution matrices, log odds matrices, and operations on them.
General:
This module provides a class and a few routines for generating
substitution matrices, similar to BLOSUM or PAM matrices, but based on
user-provided data.
The class used for these matrices is SeqMat.
Matrices are implemented as a dictionary. Each key is a 2-tuple holding
the two residue/nucleotide types replaced. The value differs according to
the matrix's purpose: e.g., in a log-odds frequency matrix, the
value would be log(Pij/(Pi*Pj)) where:
Pij: frequency of substitution of letter (residue/nucleotide) i by j
Pi, Pj: expected frequencies of i and j, respectively.
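As a worked instance of the formula above, using illustrative numbers that are not taken from any real data set: if Pij = 0.02 and Pi = Pj = 0.1, the ratio is 2, so the base-2 log-odds value is 1. The symbol S_ij is introduced here only for the example. A positive value means the pair is observed more often than expected by chance, a negative value less often.

```latex
S_{ij} = \log_2 \frac{P_{ij}}{P_i P_j}
       = \log_2 \frac{0.02}{0.1 \times 0.1}
       = \log_2 2
       = 1
```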
Usage:
The following section is laid out in the order in which most people
generate a log-odds matrix. Interim matrices can also be generated and
inspected, but most users only need the final log-odds matrix.
Generating an Accepted Replacement Matrix:
Initially, you should generate an accepted replacement matrix (ARM)
from your data. The values in ARM are the _counted_ number of
replacements according to your data. The data could be a set of pairs
or multiple alignments. So for instance if Alanine was replaced by
Cysteine 10 times, and Cysteine by Alanine 12 times, the corresponding
ARM entries would be:
('A','C'): 10,
('C','A'): 12
As order doesn't matter, the user can provide only one entry:
('A','C'): 22
A SeqMat instance may be initialized with either a full (first
method of counting: 10, 12) or half (the latter method, 22) matrix. A
full protein alphabet matrix would be of size 20x20 = 400. A half
matrix of that alphabet would have 20x20/2 + 20/2 = 210 entries. That is
because same-letter entries (the matrix diagonal) are not duplicated when
the two orderings are merged. Given an alphabet size of N:
Full matrix size: N*N
Half matrix size: N(N+1)/2
If you provide a full matrix, the constructor will create a half matrix
automatically.
If you provide a half matrix, make sure the keys are in (low, high) sorted
order: there should only be a ('A','C'), not a ('C','A').
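The full-to-half folding described above can be sketched with plain dictionaries. This is a minimal illustration, not the SeqMat constructor itself; the helper names to_half_matrix and half_matrix_size are hypothetical.

```python
def to_half_matrix(full_counts):
    """Fold a full count matrix into a half matrix with (low, high) sorted keys."""
    half = {}
    for (a, b), count in full_counts.items():
        key = (a, b) if a <= b else (b, a)  # enforce (low, high) order
        half[key] = half.get(key, 0) + count
    return half


def half_matrix_size(n):
    """Number of entries in a half matrix over an alphabet of n letters."""
    return n * (n + 1) // 2


full_arm = {("A", "C"): 10, ("C", "A"): 12, ("A", "A"): 50}
half_arm = to_half_matrix(full_arm)
print(half_arm)              # {('A', 'C'): 22, ('A', 'A'): 50}
print(half_matrix_size(20))  # 210
```

The two off-diagonal counts (10 and 12) merge into a single entry of 22, while the diagonal entry is left as it was.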
Internal functions:
Generating the observed frequency matrix (OFM):
Use: OFM = _build_obs_freq_mat(ARM)
The OFM is generated from the ARM, only instead of replacement counts, it
contains replacement frequencies.
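A minimal sketch of the count-to-frequency step, assuming the frequencies are simply the counts divided by their total. The function name observed_frequencies and the sample counts are hypothetical, and the library's own routine may differ in detail.

```python
def observed_frequencies(arm):
    """Convert replacement counts into replacement frequencies that sum to 1."""
    total = float(sum(arm.values()))
    return {pair: count / total for pair, count in arm.items()}


# Illustrative half matrix of counts over a two-letter alphabet.
arm = {("A", "A"): 50, ("A", "C"): 22, ("C", "C"): 28}
ofm = observed_frequencies(arm)
print(ofm[("A", "C")])  # 0.22
```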
Generating an expected frequency matrix (EFM):
Use: EFM = _build_exp_freq_mat(exp_freq_table)
exp_freq_table: should be a FreqTable instance. See FreqTable.py for
detailed information. Briefly, the expected frequency table holds the
frequency of appearance of each member of the alphabet.
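The step from a table of letter frequencies to an expected frequency matrix can be sketched as follows. The sketch assumes a half matrix, in which an off-diagonal entry stands for both orderings of the pair, so the product Pi*Pj is doubled there; that convention, the function name expected_frequencies, and the sample table are assumptions made for illustration.

```python
def expected_frequencies(freq_table):
    """Build a half matrix of expected pair frequencies from letter frequencies."""
    letters = sorted(freq_table)
    efm = {}
    for idx, a in enumerate(letters):
        for b in letters[idx:]:  # only (low, high) pairs
            if a == b:
                efm[(a, b)] = freq_table[a] ** 2
            else:
                # One half-matrix entry covers both (a, b) and (b, a).
                efm[(a, b)] = 2.0 * freq_table[a] * freq_table[b]
    return efm


freq_table = {"A": 0.6, "C": 0.4}
efm = expected_frequencies(freq_table)
for pair in sorted(efm):
    print(pair, efm[pair])
```

With the doubling on off-diagonal entries, the expected frequencies of a half matrix sum to 1, just as the observed frequencies do.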
Generating a substitution frequency matrix (SFM):
Use: SFM = _build_subs_mat(OFM,EFM)
Accepts an OFM and an EFM. Returns, for each key, the observed value
divided by the corresponding expected value.
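A minimal sketch of the division step, assuming both matrices share the same keys and that every expected frequency is non-zero. The name substitution_ratios and the numbers are hypothetical.

```python
def substitution_ratios(ofm, efm):
    """Divide each observed frequency by the matching expected frequency."""
    return {pair: ofm[pair] / efm[pair] for pair in ofm}


ofm = {("A", "A"): 0.50, ("A", "C"): 0.22, ("C", "C"): 0.28}
efm = {("A", "A"): 0.36, ("A", "C"): 0.48, ("C", "C"): 0.16}
sfm = substitution_ratios(ofm, efm)
for pair in sorted(sfm):
    print(pair, round(sfm[pair], 3))
```

A ratio above 1 marks a pair seen more often than expected, and a ratio below 1 marks a pair seen less often.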
Generating a log-odds matrix (LOM):
Use: LOM = _build_log_odds_mat(SFM[, logbase=2, factor=10.0, round_digit=0, keep_nd=0])
Accepts an SFM. logbase: base of the logarithm used to generate the
log-odds values. factor: factor used to multiply the log-odds values.
round_digit: number of digits after the decimal point to round to.
Each entry is generated by log(SFM[key])*factor
and then rounded.
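The log, scale, and round sequence can be sketched as below. The parameter names follow the _build_log_odds_mat signature listed on this page, but the body is an illustrative assumption, not the library's code; it also assumes every ratio is positive, since the logarithm of zero is undefined.

```python
import math


def log_odds(sfm, logbase=2, factor=10.0, round_digit=0):
    """Take the log of each ratio, scale it by factor, then round it."""
    lom = {}
    for pair, ratio in sfm.items():
        lom[pair] = round(factor * math.log(ratio, logbase), round_digit)
    return lom


sfm = {("A", "A"): 2.0, ("A", "C"): 0.5, ("C", "C"): 1.0}
lom = log_odds(sfm)
print(lom)  # {('A', 'A'): 10.0, ('A', 'C'): -10.0, ('C', 'C'): 0.0}
```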
External:
In most cases, users will want to generate a log-odds matrix only, without
explicitly calling the OFM -> EFM -> SFM stages. The function
make_log_odds_matrix does that. The user provides an ARM and an expected
frequency table. The function returns the log-odds matrix.
Methods for subtraction, addition and multiplication of matrices:
- Generation of an expected frequency table from an observed frequency
  matrix.
- Calculation of linear correlation coefficient between two matrices.
- Calculation of relative entropy is now done using the
  _make_relative_entropy method and is stored in the member
  self.relative_entropy.
- Calculation of entropy is now done using the _make_entropy method and
  is stored in the member self.entropy.
- Jensen-Shannon distance between the distributions from which the
  matrices are derived. This is a distance function based on the
  distributions' entropies.
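The entropy-based quantities in the list above can be sketched over plain frequency dictionaries. These are the textbook definitions, written as an assumption about what the methods compute; the function names are hypothetical, and the weights pi_1 and pi_2 mirror the parameters of two_mat_DJS listed on this page. The relative entropy sketch also assumes q is non-zero wherever p is non-zero.

```python
import math


def entropy(dist, logbase=2):
    """Shannon entropy of a frequency dictionary."""
    return -sum(p * math.log(p, logbase) for p in dist.values() if p > 0)


def relative_entropy(p, q, logbase=2):
    """Kullback-Leibler divergence of p from q."""
    return sum(p[k] * math.log(p[k] / q[k], logbase) for k in p if p[k] > 0)


def jensen_shannon(p, q, pi_1=0.5, pi_2=0.5, logbase=2):
    """Entropy of the weighted mixture minus the weighted entropies."""
    keys = set(p) | set(q)
    mix = {k: pi_1 * p.get(k, 0.0) + pi_2 * q.get(k, 0.0) for k in keys}
    return (entropy(mix, logbase)
            - pi_1 * entropy(p, logbase)
            - pi_2 * entropy(q, logbase))


same = {("A", "A"): 0.5, ("C", "C"): 0.5}
only_a = {("A", "A"): 1.0}
only_c = {("C", "C"): 1.0}
print(jensen_shannon(same, same))      # identical distributions
print(jensen_shannon(only_a, only_c))  # non-overlapping distributions
```

With equal weights and base-2 logarithms the value lies between 0 for identical distributions and 1 for distributions with no overlap.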

_build_obs_freq_mat(acc_rep_mat)
Build the observed frequency matrix from an accepted replacements matrix.
The acc_rep_mat matrix should be generated by the user.



_exp_freq_table_from_obs_freq(obs_freq_mat) 



_build_exp_freq_mat(exp_freq_table)
Build an expected frequency matrix
exp_freq_table: should be a FreqTable instance 



_build_subs_mat(obs_freq_mat,
exp_freq_mat)
Build the substitution matrix 



_build_log_odds_mat(subs_mat,
logbase=2,
factor=10.0,
round_digit=0,
keep_nd=0)
Build a log-odds matrix.
logbase: base of the logarithm used to build the matrix (default 2)
factor: a factor by which each matrix entry is multiplied (default 10.0)
round_digit: number of digits after the decimal point to round to (default 0)
keep_nd: if true, keeps the -999 value for non-determined values (for which there
are no substitutions in the frequency substitutions matrix).



make_log_odds_matrix(acc_rep_mat,
exp_freq_table=None,
logbase=2,
factor=1.0,
round_digit=9,
keep_nd=0) 



observed_frequency_to_substitution_matrix(obs_freq_mat) 





two_mat_relative_entropy(mat_1,
mat_2,
logbase=2,
diag=3) 





two_mat_DJS(mat_1,
mat_2,
pi_1=0.5,
pi_2=0.5) 



NOTYPE = 0


ACCREP = 1


OBSFREQ = 2


SUBS = 3


EXPFREQ = 4


LO = 5


EPSILON = 1e-14


diagNO = 1


diagONLY = 2


diagALL = 3


__package__ = 'Bio.SubsMat'

_build_log_odds_mat(subs_mat,
logbase=2,
factor=10.0,
round_digit=0,
keep_nd=0)

Build a log-odds matrix.
logbase: base of the logarithm used to build the matrix (default 2)
factor: a factor by which each matrix entry is multiplied (default 10.0)
round_digit: number of digits after the decimal point to round to (default 0)
keep_nd: if true, keeps the -999 value for non-determined values (for which there
are no substitutions in the frequency substitutions matrix). If false, replaces
each entry containing -999 with the minimum log-odds value of the matrix.
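The keep_nd behaviour can be sketched as a post-processing pass over a finished log-odds dictionary. The sketch assumes the placeholder for non-determined entries is -999; the names ND and fill_non_determined are hypothetical, and the library may apply this step at a different point.

```python
ND = -999  # assumed placeholder for non-determined entries


def fill_non_determined(lom, keep_nd=False, nd_value=ND):
    """Keep placeholders, or replace them with the lowest determined value."""
    if keep_nd:
        return dict(lom)
    determined = [v for v in lom.values() if v != nd_value]
    if not determined:  # nothing to take a minimum from
        return dict(lom)
    floor = min(determined)
    return {k: (floor if v == nd_value else v) for k, v in lom.items()}


lom = {("A", "A"): 4.0, ("A", "C"): -3.0, ("C", "C"): ND}
print(fill_non_determined(lom))                # ('C', 'C') becomes -3.0
print(fill_non_determined(lom, keep_nd=True))  # ('C', 'C') stays -999
```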
