
# Package SubsMat


Substitution matrices, log odds matrices, and operations on them.

# General:

This module provides a class and a few routines for generating substitution matrices, similar to BLOSUM or PAM matrices, but based on user-provided data. The class used for these matrices is SeqMat.

Matrices are implemented as a dictionary. Each key is a 2-tuple holding the two residue/nucleotide types replaced. The value differs according to the matrix's purpose: e.g., in a log-odds frequency matrix, the value would be log(Pij/(Pi*Pj)), where Pij is the frequency of substitution of letter (residue/nucleotide) i by j, and Pi and Pj are the expected frequencies of i and j, respectively.
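As a minimal illustration of this layout (not the library's own code), a matrix can be pictured as a plain dictionary keyed by residue pairs. The frequencies below are made-up values chosen only to show the log-odds formula.

```python
import math

# Hypothetical frequencies, for illustration only.
p_ij = 0.05          # observed frequency of the pair (i, j)
p_i, p_j = 0.5, 0.5  # expected frequencies of i and j

# The matrix maps a 2-tuple of letters to a value; here the value is
# the log-odds score log(Pij / (Pi * Pj)).
log_odds = {("A", "C"): math.log(p_ij / (p_i * p_j))}
```

A negative value means the pair is observed less often than expected by chance.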

# Usage:

The following section follows the order in which a log-odds matrix is usually generated. Interim matrices can be generated and inspected along the way, but most users only need the final log-odds matrix.

# Generating an Accepted Replacement Matrix:

Initially, you should generate an accepted replacement matrix (ARM) from your data. The values in the ARM are the _counted_ number of replacements according to your data. The data could be a set of pairs or multiple alignments. So, for instance, if Alanine was replaced by Cysteine 10 times, and Cysteine by Alanine 12 times, the corresponding ARM entries would be: ('A','C'): 10, ('C','A'): 12. As order doesn't matter, the user can instead provide only one entry: ('A','C'): 22.

A SeqMat instance may be initialized with either a full matrix (the first method of counting: 10, 12) or a half matrix (the latter method: 22). A full protein alphabet matrix would be of size 20x20 = 400. A half matrix of that alphabet would be 20x20/2 + 20/2 = 210, because the same-letter entries (the matrix diagonal) have no mirror entry and are kept in full. Given an alphabet of size N: full matrix size: N*N; half matrix size: N(N+1)/2.

If you provide a full matrix, the constructor will create a half-matrix automatically. If you provide a half-matrix, make sure of a (low, high) sorted order in the keys: there should only be a ('A','C') not a ('C','A').
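The relation between the two layouts can be sketched in plain Python. The helper names `fold_to_half` and `half_size` are hypothetical and are not part of this module. The sketch only shows the idea of summing mirrored entries under a sorted key.

```python
from collections import defaultdict


def fold_to_half(full_counts):
    """Sum the (i, j) and (j, i) counts into one entry with a sorted key."""
    half = defaultdict(int)
    for (i, j), count in full_counts.items():
        key = (i, j) if i <= j else (j, i)
        half[key] += count
    return dict(half)


def half_size(n):
    """Number of entries in a half matrix over an alphabet of n letters."""
    return n * (n + 1) // 2


full = {("A", "C"): 10, ("C", "A"): 12, ("A", "A"): 30}
half = fold_to_half(full)
```

With these inputs the two off-diagonal counts collapse into a single ('A','C') entry, while the diagonal entry is carried over unchanged.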

Internal functions:

# Generating the observed frequency matrix (OFM):

Use: `OFM = _build_obs_freq_mat(ARM)`

The OFM is generated from the ARM; instead of replacement counts, it contains replacement frequencies.
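A minimal sketch of the counts-to-frequencies step, assuming each count is simply divided by the total number of counted replacements. The function name `counts_to_frequencies` is hypothetical, and this is not the module's own implementation.

```python
def counts_to_frequencies(arm):
    """Turn replacement counts into frequencies that sum to 1."""
    total = float(sum(arm.values()))
    return {pair: count / total for pair, count in arm.items()}


# Illustrative half matrix over a two-letter alphabet.
arm = {("A", "A"): 30, ("A", "C"): 22, ("C", "C"): 48}
ofm = counts_to_frequencies(arm)
```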

# Generating an expected frequency matrix (EFM):

Use: `EFM = _build_exp_freq_mat(exp_freq_table)`

exp_freq_table should be a FreqTable instance. See FreqTable.py for detailed information. Briefly, the expected frequency table holds the frequency of appearance of each member of the alphabet.
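One way to picture this step is sketched below, using a plain dictionary in place of a FreqTable. It assumes a half matrix in which each off-diagonal entry pools both orders of a pair, hence the factor of 2. Both that assumption and the name `expected_pair_frequencies` are illustrative and not taken from the module.

```python
from itertools import combinations_with_replacement


def expected_pair_frequencies(freq_table):
    """Expected frequency of each unordered letter pair under independence."""
    efm = {}
    for i, j in combinations_with_replacement(sorted(freq_table), 2):
        product = freq_table[i] * freq_table[j]
        # Off-diagonal entries stand for both (i, j) and (j, i).
        efm[(i, j)] = product if i == j else 2 * product
    return efm


freq_table = {"A": 0.4, "C": 0.6}
efm = expected_pair_frequencies(freq_table)
```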

# Generating a substitution frequency matrix (SFM):

Use: `SFM = _build_subs_mat(OFM, EFM)`

Accepts an OFM and an EFM. Returns a matrix in which each entry is the OFM value divided by the corresponding EFM value.
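The division can be sketched as an entry-by-entry ratio over matching keys. The name `odds_ratios` and the input values are illustrative assumptions, and the sketch presumes that every expected frequency is nonzero.

```python
def odds_ratios(ofm, efm):
    """Divide each observed frequency by the matching expected frequency."""
    return {pair: ofm[pair] / efm[pair] for pair in ofm}


ofm = {("A", "A"): 0.30, ("A", "C"): 0.22, ("C", "C"): 0.48}
efm = {("A", "A"): 0.16, ("A", "C"): 0.48, ("C", "C"): 0.36}
sfm = odds_ratios(ofm, efm)
```

A ratio above 1 marks a pair seen more often than expected; a ratio below 1 marks one seen less often.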

# Generating a log-odds matrix (LOM):

Use: `LOM = _build_log_odds_mat(SFM[, logbase=2, factor=10.0, round_digit=0, keep_nd=0])`

Accepts an SFM. logbase: base of the logarithm used to generate the log-odds values. factor: factor by which the log-odds values are multiplied. round_digit: number of digits after the decimal point to round to. Each entry is generated by log(SFM[key])*factor and then rounded.
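A minimal sketch of the scaling and rounding, with parameter names borrowed from the signature listed under Functions. The function `log_odds_sketch` is hypothetical, and it leaves out the handling of zero entries that the real routine performs through keep_nd.

```python
import math


def log_odds_sketch(sfm, logbase=2, factor=10.0, round_digit=0):
    """Return round(factor * log_base(value), round_digit) for each entry."""
    return {
        pair: round(factor * math.log(value, logbase), round_digit)
        for pair, value in sfm.items()
    }


# Ratios chosen so that the base-2 logarithms are whole numbers.
sfm = {("A", "A"): 2.0, ("A", "C"): 0.5}
lom = log_odds_sketch(sfm)
```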

# External:

In most cases, users will want to generate a log-odds matrix only, without explicitly calling the OFM --> EFM --> SFM stages. The function make_log_odds_matrix does that. The user provides an ARM and, optionally, an expected frequency table. The function returns the log-odds matrix.
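The whole chain can be sketched in one hypothetical helper, `make_log_odds_sketch`, written in plain Python rather than with this module. It assumes a half-matrix ARM with no zero counts and derives the expected letter frequencies from the observed pairs by giving a diagonal entry fully to its letter and splitting an off-diagonal entry evenly between its two letters. That marginalisation rule is an assumption of the sketch.

```python
import math


def make_log_odds_sketch(arm, logbase=2, factor=1.0, round_digit=9):
    """Illustrative ARM -> OFM -> EFM -> SFM -> LOM pipeline."""
    total = float(sum(arm.values()))
    ofm = {pair: count / total for pair, count in arm.items()}

    # Assumed marginalisation of pair frequencies into letter frequencies.
    table = {}
    for (i, j), freq in ofm.items():
        if i == j:
            table[i] = table.get(i, 0.0) + freq
        else:
            table[i] = table.get(i, 0.0) + freq / 2
            table[j] = table.get(j, 0.0) + freq / 2

    lom = {}
    for (i, j), freq in ofm.items():
        # Off-diagonal entries of a half matrix cover both pair orders.
        expected = table[i] * table[j] * (1 if i == j else 2)
        ratio = freq / expected
        lom[(i, j)] = round(factor * math.log(ratio, logbase), round_digit)
    return table, lom


arm = {("A", "A"): 30, ("A", "C"): 22, ("C", "C"): 48}
table, lom = make_log_odds_sketch(arm)
```

In this toy input the identical pairs score positive and the mixed pair scores negative, which is the pattern expected when substitutions are rarer than chance.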

# Methods for subtraction, addition and multiplication of matrices:

• Generation of an expected frequency table from an observed frequency matrix.
• Calculation of linear correlation coefficient between two matrices.
• Calculation of relative entropy is now done using the _make_relative_entropy method and is stored in the member self.relative_entropy
• Calculation of entropy is now done using the _make_entropy method and is stored in the member self.entropy.
• Jensen-Shannon distance between the distributions from which the matrices are derived. This is a distance function based on the distribution's entropies.
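The entropy-based quantities in this list can be illustrated with the textbook definitions. The sketch below uses Shannon entropy and the weighted Jensen-Shannon divergence H(pi_1*P + pi_2*Q) - pi_1*H(P) - pi_2*H(Q). Whether the module's functions match these definitions term for term is an assumption, and the function names here are hypothetical.

```python
import math


def entropy(dist, logbase=2):
    """Shannon entropy of a {key: probability} mapping."""
    return -sum(p * math.log(p, logbase) for p in dist.values() if p > 0)


def jensen_shannon(dist_1, dist_2, pi_1=0.5, pi_2=0.5, logbase=2):
    """Weighted Jensen-Shannon divergence between two distributions."""
    keys = set(dist_1) | set(dist_2)
    mix = {
        k: pi_1 * dist_1.get(k, 0.0) + pi_2 * dist_2.get(k, 0.0)
        for k in keys
    }
    return (
        entropy(mix, logbase)
        - pi_1 * entropy(dist_1, logbase)
        - pi_2 * entropy(dist_2, logbase)
    )


same = {("A", "A"): 0.5, ("C", "C"): 0.5}
only_aa = {("A", "A"): 1.0}
only_cc = {("C", "C"): 1.0}
```

With equal weights and base 2, identical distributions give a divergence of 0 and fully disjoint ones give 1.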
 Submodules

 Classes
SeqMat: A generic sequence matrix class.
AcceptedReplacementsMatrix: Accepted replacements matrix.
ObservedFrequencyMatrix: Observed frequency matrix.
ExpectedFrequencyMatrix: Expected frequency matrix.
SubstitutionMatrix: Substitution matrix.
LogOddsMatrix: Log odds matrix.
 Functions

 _build_obs_freq_mat(acc_rep_mat) Build observed frequency matrix (PRIVATE).

 _exp_freq_table_from_obs_freq(obs_freq_mat) Build expected frequency table from observed frequencies (PRIVATE).

 _build_exp_freq_mat(exp_freq_table) Build an expected frequency matrix (PRIVATE).

 _build_subs_mat(obs_freq_mat, exp_freq_mat) Build the substitution matrix (PRIVATE).

 _build_log_odds_mat(subs_mat, logbase=2, factor=10.0, round_digit=0, keep_nd=0) Build a log-odds matrix (PRIVATE).

 make_log_odds_matrix(acc_rep_mat, exp_freq_table=None, logbase=2, factor=1.0, round_digit=9, keep_nd=0) Make log-odds matrix.

 observed_frequency_to_substitution_matrix(obs_freq_mat) Convert observed frequency table into substitution matrix.

 two_mat_relative_entropy(mat_1, mat_2, logbase=2, diag=3) Return relative entropy of two matrices.

 two_mat_correlation(mat_1, mat_2) Return linear correlation coefficient between two matrices.

 two_mat_DJS(mat_1, mat_2, pi_1=0.5, pi_2=0.5) Return Jensen-Shannon Distance between two observed frequency matrices.
 Variables
NOTYPE = `0`
ACCREP = `1`
OBSFREQ = `2`
SUBS = `3`
EXPFREQ = `4`
LO = `5`
EPSILON = `1e-14`
diagNO = `1`
diagONLY = `2`
diagALL = `3`
__package__ = `'Bio.SubsMat'`
 Function Details

### _build_obs_freq_mat(acc_rep_mat)


Build observed frequency matrix (PRIVATE).

Build the observed frequency matrix from an accepted replacements matrix. The acc_rep_mat matrix should be generated by the user.

### _build_exp_freq_mat(exp_freq_table)


Build an expected frequency matrix (PRIVATE).

exp_freq_table: should be a FreqTable instance

### _build_log_odds_mat(subs_mat, logbase=2, factor=10.0, round_digit=0, keep_nd=0)


Build a log-odds matrix (PRIVATE).

• logbase=2: base of the logarithm used to build the matrix (default 2)
• factor=10.: a factor by which each matrix entry is multiplied
• round_digit: number of digits after the decimal point to round to
• keep_nd: if true, keeps the -999 value for non-determined values (for which there are no substitutions in the frequency substitutions matrix). If false, plants the minimum log-odds value of the matrix in entries containing -999.
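The keep_nd behaviour described above can be pictured with a small sketch. The helper `fill_not_determined` is hypothetical and only mirrors the description: when keep_nd is false, entries holding the -999 placeholder take the smallest determined value in the matrix.

```python
ND = -999  # placeholder for non-determined entries, as described above


def fill_not_determined(lom, keep_nd=False):
    """Replace ND entries with the minimum determined value unless keep_nd."""
    determined = [value for value in lom.values() if value != ND]
    if keep_nd or not determined:
        return dict(lom)
    floor = min(determined)
    return {pair: (floor if value == ND else value) for pair, value in lom.items()}


lom = {("A", "A"): 4.0, ("A", "C"): -2.0, ("C", "C"): ND}
filled = fill_not_determined(lom)
kept = fill_not_determined(lom, keep_nd=True)
```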

 Generated by Epydoc 3.0.1 on Fri Jun 22 16:34:27 2018 http://epydoc.sourceforge.net