Bio.Phylo.TreeConstruction module

Classes and methods for tree construction.

class Bio.Phylo.TreeConstruction.DistanceMatrix(names, matrix=None)

Bases: Bio.Phylo.TreeConstruction._Matrix

Distance matrix class that can be used for distance based tree algorithms.

All diagonal elements will be zero no matter what the users provide.

__init__(self, names, matrix=None)

Initialize the class.

__setitem__(self, item, value)

Set Matrix’s items to values.

format_phylip(self, handle)

Write data in Phylip format to a given file-like object or handle.

The output stream is the input distance matrix format used with Phylip programs (e.g. ‘neighbor’). See: http://evolution.genetics.washington.edu/phylip/doc/neighbor.html

Parameters
handlefile or file-like object

A writeable text mode file handle or other object supporting the ‘write’ method, such as StringIO or sys.stdout.

class Bio.Phylo.TreeConstruction.DistanceCalculator(model='identity', skip_letters=None)

Bases: object

Class to calculate the distance matrix from a DNA or Protein.

Multiple Sequence Alignment(MSA) and the given name of the substitution model.

Currently only scoring matrices are used.

Parameters
modelstr

Name of the model matrix to be used to calculate distance. The attribute dna_models contains the available model names for DNA sequences and protein_models for protein sequences.

Examples

Loading a small PHYLIP alignment from which to compute distances:

from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio import AlignIO
aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip')
print(aln)

Output:

Alignment with 5 rows and 13 columns
AACGTGGCCACAT Alpha
AAGGTCGCCACAC Beta
CAGTTCGCCACAA Gamma
GAGATTTCCGCCT Delta
GAGATCTCCGCCC Epsilon

DNA calculator with ‘identity’ model:

calculator = DistanceCalculator('identity')
dm = calculator.get_distance(aln)
print(dm)

Output:

Alpha   0
Beta    0.23076923076923073     0
Gamma   0.3846153846153846      0.23076923076923073     0
Delta   0.5384615384615384      0.5384615384615384      0.5384615384615384      0
Epsilon 0.6153846153846154      0.3846153846153846      0.46153846153846156     0.15384615384615385     0
    Alpha       Beta    Gamma   Delta   Epsilon

Protein calculator with ‘blosum62’ model:

calculator = DistanceCalculator('blosum62')
dm = calculator.get_distance(aln)
print(dm)

Output:

Alpha   0
Beta    0.36904761904761907     0
Gamma   0.49397590361445787     0.25    0
Delta   0.5853658536585367      0.5476190476190477      0.5662650602409638      0
Epsilon 0.7     0.3555555555555555      0.48888888888888893     0.2222222222222222      0
    Alpha       Beta    Gamma   Delta   Epsilon
dna_models = ['benner22', 'benner6', 'benner74', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans']
protein_models = ['blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']
models = ['identity', 'benner22', 'benner6', 'benner74', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans', 'blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']
__init__(self, model='identity', skip_letters=None)

Initialize with a distance model.

get_distance(self, msa)

Return a DistanceMatrix for MSA object.

Parameters
msaMultipleSeqAlignment

DNA or Protein multiple sequence alignment.

class Bio.Phylo.TreeConstruction.TreeConstructor

Bases: object

Base class for all tree constructor.

build_tree(self, msa)

Caller to built the tree from a MultipleSeqAlignment object.

This should be implemented in subclass.

class Bio.Phylo.TreeConstruction.DistanceTreeConstructor(distance_calculator=None, method='nj')

Bases: Bio.Phylo.TreeConstruction.TreeConstructor

Distance based tree constructor.

Parameters
methodstr

Distance tree construction method, ‘nj’(default) or ‘upgma’.

distance_calculatorDistanceCalculator

The distance matrix calculator for multiple sequence alignment. It must be provided if build_tree will be called.

Examples

Loading a small PHYLIP alignment from which to compute distances, and then build a upgma Tree:

from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
from Bio.Phylo.TreeConstruction import DistanceCalculator
from Bio import AlignIO
aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip')
constructor = DistanceTreeConstructor()
calculator = DistanceCalculator('identity')
dm = calculator.get_distance(aln)
upgmatree = constructor.upgma(dm)
print(upgmatree)

Output:

Tree(rooted=True)
    Clade(branch_length=0, name='Inner4')
        Clade(branch_length=0.18749999999999994, name='Inner1')
            Clade(branch_length=0.07692307692307693, name='Epsilon')
            Clade(branch_length=0.07692307692307693, name='Delta')
        Clade(branch_length=0.11057692307692304, name='Inner3')
            Clade(branch_length=0.038461538461538464, name='Inner2')
                Clade(branch_length=0.11538461538461536, name='Gamma')
                Clade(branch_length=0.11538461538461536, name='Beta')
            Clade(branch_length=0.15384615384615383, name='Alpha')

Build a NJ Tree:

njtree = constructor.nj(dm)
print(njtree)

Output:

Tree(rooted=False)
    Clade(branch_length=0, name='Inner3')
        Clade(branch_length=0.18269230769230765, name='Alpha')
        Clade(branch_length=0.04807692307692307, name='Beta')
        Clade(branch_length=0.04807692307692307, name='Inner2')
            Clade(branch_length=0.27884615384615385, name='Inner1')
                Clade(branch_length=0.051282051282051266, name='Epsilon')
                Clade(branch_length=0.10256410256410259, name='Delta')
            Clade(branch_length=0.14423076923076922, name='Gamma')
methods = ['nj', 'upgma']
__init__(self, distance_calculator=None, method='nj')

Initialize the class.

build_tree(self, msa)

Construct and return a Tree, Neighbor Joining or UPGMA.

upgma(self, distance_matrix)

Construct and return an UPGMA tree.

Constructs and returns an Unweighted Pair Group Method with Arithmetic mean (UPGMA) tree.

Parameters
distance_matrixDistanceMatrix

The distance matrix for tree construction.

nj(self, distance_matrix)

Construct and return a Neighbor Joining tree.

Parameters
distance_matrixDistanceMatrix

The distance matrix for tree construction.

class Bio.Phylo.TreeConstruction.Scorer

Bases: object

Base class for all tree scoring methods.

get_score(self, tree, alignment)

Caller to get the score of a tree for the given alignment.

This should be implemented in subclass.

class Bio.Phylo.TreeConstruction.TreeSearcher

Bases: object

Base class for all tree searching methods.

search(self, starting_tree, alignment)

Caller to search the best tree with a starting tree.

This should be implemented in subclass.

class Bio.Phylo.TreeConstruction.NNITreeSearcher(scorer)

Bases: Bio.Phylo.TreeConstruction.TreeSearcher

Tree searching with Nearest Neighbor Interchanges (NNI) algorithm.

Parameters
scorerParsimonyScorer

parsimony scorer to calculate the parsimony score of different trees during NNI algorithm.

__init__(self, scorer)

Initialize the class.

search(self, starting_tree, alignment)

Implement the TreeSearcher.search method.

Parameters
starting_treeTree

starting tree of NNI method.

alignmentMultipleSeqAlignment

multiple sequence alignment used to calculate parsimony score of different NNI trees.

class Bio.Phylo.TreeConstruction.ParsimonyScorer(matrix=None)

Bases: Bio.Phylo.TreeConstruction.Scorer

Parsimony scorer with a scoring matrix.

This is a combination of Fitch algorithm and Sankoff algorithm. See ParsimonyTreeConstructor for usage.

Parameters
matrix_Matrix

scoring matrix used in parsimony score calculation.

__init__(self, matrix=None)

Initialize the class.

get_score(self, tree, alignment)

Calculate parsimony score using the Fitch algorithm.

Calculate and return the parsimony score given a tree and the MSA using either the Fitch algorithm (without a penalty matrix) or the Sankoff algorithm (with a matrix).

class Bio.Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree=None)

Bases: Bio.Phylo.TreeConstruction.TreeConstructor

Parsimony tree constructor.

Parameters
searcherTreeSearcher

tree searcher to search the best parsimony tree.

starting_treeTree

starting tree provided to the searcher.

Examples

We will load an alignment, and then load various trees which have already been computed from it:

from Bio import AlignIO, Phylo
aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip')
print(aln)

Output:

Alignment with 5 rows and 13 columns
AACGTGGCCACAT Alpha
AAGGTCGCCACAC Beta
CAGTTCGCCACAA Gamma
GAGATTTCCGCCT Delta
GAGATCTCCGCCC Epsilon

Load a starting tree:

starting_tree = Phylo.read('TreeConstruction/nj.tre', 'newick')
print(starting_tree)

Output:

Tree(rooted=False, weight=1.0)
    Clade(branch_length=0.0, name='Inner3')
        Clade(branch_length=0.01421, name='Inner2')
            Clade(branch_length=0.23927, name='Inner1')
                Clade(branch_length=0.08531, name='Epsilon')
                Clade(branch_length=0.13691, name='Delta')
            Clade(branch_length=0.2923, name='Alpha')
        Clade(branch_length=0.07477, name='Beta')
        Clade(branch_length=0.17523, name='Gamma')

Build the Parsimony tree from the starting tree:

scorer = Phylo.TreeConstruction.ParsimonyScorer()
searcher = Phylo.TreeConstruction.NNITreeSearcher(scorer)
constructor = Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree)
pars_tree = constructor.build_tree(aln)
print(pars_tree)

Output:

Tree(rooted=True, weight=1.0)
    Clade(branch_length=0.0)
        Clade(branch_length=0.19732999999999998, name='Inner1')
            Clade(branch_length=0.13691, name='Delta')
            Clade(branch_length=0.08531, name='Epsilon')
        Clade(branch_length=0.04194000000000003, name='Inner2')
            Clade(branch_length=0.01421, name='Inner3')
                Clade(branch_length=0.17523, name='Gamma')
                Clade(branch_length=0.07477, name='Beta')
            Clade(branch_length=0.2923, name='Alpha')
__init__(self, searcher, starting_tree=None)

Initialize the class.

build_tree(self, alignment)

Build the tree.

Parameters
alignmentMultipleSeqAlignment

multiple sequence alignment to calculate parsimony tree.