Bio.Phylo.TreeConstruction module

Classes and methods for tree construction.

class Bio.Phylo.TreeConstruction.DistanceMatrix(names, matrix=None)

Bases: Bio.Phylo.TreeConstruction._Matrix

Distance matrix class that can be used for distance based tree algorithms.

All diagonal elements will be zero no matter what the users provide.

__init__(names, matrix=None)

Initialize the class.

__setitem__(item, value)

Set Matrix’s items to values.

format_phylip(handle)

Write data in Phylip format to a given file-like object or handle.

The output stream is the input distance matrix format used with Phylip programs (e.g. ‘neighbor’). See: http://evolution.genetics.washington.edu/phylip/doc/neighbor.html

Parameters
handlefile or file-like object

A writeable text mode file handle or other object supporting the ‘write’ method, such as StringIO or sys.stdout.

class Bio.Phylo.TreeConstruction.DistanceCalculator(model='identity', skip_letters=None)

Bases: object

Calculates the distance matrix from a DNA or protein sequence alignment.

This class calculates the distance matrix from a multiple sequence alignment of DNA or protein sequences, and the given name of the substitution model.

Currently only scoring matrices are used.

Parameters
modelstr

Name of the model matrix to be used to calculate distance. The attribute dna_models contains the available model names for DNA sequences and protein_models for protein sequences.

Examples

Loading a small PHYLIP alignment from which to compute distances:

>>> from Bio.Phylo.TreeConstruction import DistanceCalculator
>>> from Bio import AlignIO
>>> aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip')
>>> print(aln)  
Alignment with 5 rows and 13 columns
AACGTGGCCACAT Alpha
AAGGTCGCCACAC Beta
CAGTTCGCCACAA Gamma
GAGATTTCCGCCT Delta
GAGATCTCCGCCC Epsilon

DNA calculator with ‘identity’ model:

>>> calculator = DistanceCalculator('identity')
>>> dm = calculator.get_distance(aln)
>>> print(dm)  
  Alpha   0.000000
  Beta    0.230769    0.000000
  Gamma   0.384615    0.230769    0.000000
  Delta   0.538462    0.538462    0.538462    0.000000
  Epsilon 0.615385    0.384615    0.461538    0.153846    0.000000
      Alpha   Beta    Gamma   Delta   Epsilon

Protein calculator with ‘blosum62’ model:

>>> calculator = DistanceCalculator('blosum62')
>>> dm = calculator.get_distance(aln)
>>> print(dm)  
Alpha   0.000000
Beta    0.369048    0.000000
Gamma   0.493976    0.250000    0.000000
Delta   0.585366    0.547619    0.566265    0.000000
Epsilon 0.700000    0.355556    0.488889    0.222222    0.000000
    Alpha   Beta    Gamma   Delta   Epsilon

Same calculation, using the new Alignment object:

>>> from Bio.Phylo.TreeConstruction import DistanceCalculator
>>> from Bio import Align
>>> aln = Align.read('TreeConstruction/msa.phy', 'phylip')
>>> print(aln)  
Alpha             0 AACGTGGCCACAT 13
Beta              0 AAGGTCGCCACAC 13
Gamma             0 CAGTTCGCCACAA 13
Delta             0 GAGATTTCCGCCT 13
Epsilon           0 GAGATCTCCGCCC 13

DNA calculator with ‘identity’ model:

>>> calculator = DistanceCalculator('identity')
>>> dm = calculator.get_distance(aln)
>>> print(dm)  
Alpha   0.000000
Beta    0.230769    0.000000
Gamma   0.384615    0.230769    0.000000
Delta   0.538462    0.538462    0.538462    0.000000
Epsilon 0.615385    0.384615    0.461538    0.153846    0.000000
    Alpha   Beta    Gamma   Delta   Epsilon

Protein calculator with ‘blosum62’ model:

>>> calculator = DistanceCalculator('blosum62')
>>> dm = calculator.get_distance(aln)
>>> print(dm)  
Alpha   0.000000
Beta    0.369048    0.000000
Gamma   0.493976    0.250000    0.000000
Delta   0.585366    0.547619    0.566265    0.000000
Epsilon 0.700000    0.355556    0.488889    0.222222    0.000000
    Alpha   Beta    Gamma   Delta   Epsilon
dna_models = ['benner22', 'benner6', 'benner74', 'blastn', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'megablast', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans']
protein_models = ['blastp', 'blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']
models = ['identity', 'benner22', 'benner6', 'benner74', 'blastn', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'megablast', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans', 'blastp', 'blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']
__init__(model='identity', skip_letters=None)

Initialize with a distance model.

get_distance(msa)

Return a DistanceMatrix for an Alignment or MultipleSeqAlignment object.

Parameters
msaAlignment or MultipleSeqAlignment object representing a

DNA or protein multiple sequence alignment.

class Bio.Phylo.TreeConstruction.TreeConstructor

Bases: object

Base class for all tree constructor.

build_tree(msa)

Caller to build the tree from an Alignment or MultipleSeqAlignment object.

This should be implemented in subclass.

class Bio.Phylo.TreeConstruction.DistanceTreeConstructor(distance_calculator=None, method='nj')

Bases: Bio.Phylo.TreeConstruction.TreeConstructor

Distance based tree constructor.

Parameters
methodstr

Distance tree construction method, ‘nj’(default) or ‘upgma’.

distance_calculatorDistanceCalculator

The distance matrix calculator for multiple sequence alignment. It must be provided if build_tree will be called.

Examples

Loading a small PHYLIP alignment from which to compute distances, and then build a upgma Tree:

>>> from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
>>> from Bio.Phylo.TreeConstruction import DistanceCalculator
>>> from Bio import AlignIO
>>> aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip')
>>> constructor = DistanceTreeConstructor()
>>> calculator = DistanceCalculator('identity')
>>> dm = calculator.get_distance(aln)
>>> upgmatree = constructor.upgma(dm)
>>> print(upgmatree)
Tree(rooted=True)
    Clade(branch_length=0, name='Inner4')
        Clade(branch_length=0.18749999999999994, name='Inner1')
            Clade(branch_length=0.07692307692307693, name='Epsilon')
            Clade(branch_length=0.07692307692307693, name='Delta')
        Clade(branch_length=0.11057692307692304, name='Inner3')
            Clade(branch_length=0.038461538461538464, name='Inner2')
                Clade(branch_length=0.11538461538461536, name='Gamma')
                Clade(branch_length=0.11538461538461536, name='Beta')
            Clade(branch_length=0.15384615384615383, name='Alpha')

Build a NJ Tree:

>>> njtree = constructor.nj(dm)
>>> print(njtree)
Tree(rooted=False)
    Clade(branch_length=0, name='Inner3')
        Clade(branch_length=0.18269230769230765, name='Alpha')
        Clade(branch_length=0.04807692307692307, name='Beta')
        Clade(branch_length=0.04807692307692307, name='Inner2')
            Clade(branch_length=0.27884615384615385, name='Inner1')
                Clade(branch_length=0.051282051282051266, name='Epsilon')
                Clade(branch_length=0.10256410256410259, name='Delta')
            Clade(branch_length=0.14423076923076922, name='Gamma')

Same example, using the new Alignment class:

>>> from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
>>> from Bio.Phylo.TreeConstruction import DistanceCalculator
>>> from Bio import Align
>>> aln = Align.read(open('TreeConstruction/msa.phy'), 'phylip')
>>> constructor = DistanceTreeConstructor()
>>> calculator = DistanceCalculator('identity')
>>> dm = calculator.get_distance(aln)
>>> upgmatree = constructor.upgma(dm)
>>> print(upgmatree)
Tree(rooted=True)
    Clade(branch_length=0, name='Inner4')
        Clade(branch_length=0.18749999999999994, name='Inner1')
            Clade(branch_length=0.07692307692307693, name='Epsilon')
            Clade(branch_length=0.07692307692307693, name='Delta')
        Clade(branch_length=0.11057692307692304, name='Inner3')
            Clade(branch_length=0.038461538461538464, name='Inner2')
                Clade(branch_length=0.11538461538461536, name='Gamma')
                Clade(branch_length=0.11538461538461536, name='Beta')
            Clade(branch_length=0.15384615384615383, name='Alpha')

Build a NJ Tree:

>>> njtree = constructor.nj(dm)
>>> print(njtree)
Tree(rooted=False)
    Clade(branch_length=0, name='Inner3')
        Clade(branch_length=0.18269230769230765, name='Alpha')
        Clade(branch_length=0.04807692307692307, name='Beta')
        Clade(branch_length=0.04807692307692307, name='Inner2')
            Clade(branch_length=0.27884615384615385, name='Inner1')
                Clade(branch_length=0.051282051282051266, name='Epsilon')
                Clade(branch_length=0.10256410256410259, name='Delta')
            Clade(branch_length=0.14423076923076922, name='Gamma')
methods = ['nj', 'upgma']
__init__(distance_calculator=None, method='nj')

Initialize the class.

build_tree(msa)

Construct and return a Tree, Neighbor Joining or UPGMA.

upgma(distance_matrix)

Construct and return an UPGMA tree.

Constructs and returns an Unweighted Pair Group Method with Arithmetic mean (UPGMA) tree.

Parameters
distance_matrixDistanceMatrix

The distance matrix for tree construction.

nj(distance_matrix)

Construct and return a Neighbor Joining tree.

Parameters
distance_matrixDistanceMatrix

The distance matrix for tree construction.

class Bio.Phylo.TreeConstruction.Scorer

Bases: object

Base class for all tree scoring methods.

get_score(tree, alignment)

Caller to get the score of a tree for the given alignment.

This should be implemented in subclass.

class Bio.Phylo.TreeConstruction.TreeSearcher

Bases: object

Base class for all tree searching methods.

search(starting_tree, alignment)

Caller to search the best tree with a starting tree.

This should be implemented in subclass.

class Bio.Phylo.TreeConstruction.NNITreeSearcher(scorer)

Bases: Bio.Phylo.TreeConstruction.TreeSearcher

Tree searching with Nearest Neighbor Interchanges (NNI) algorithm.

Parameters
scorerParsimonyScorer

parsimony scorer to calculate the parsimony score of different trees during NNI algorithm.

__init__(scorer)

Initialize the class.

search(starting_tree, alignment)

Implement the TreeSearcher.search method.

Parameters
starting_treeTree

starting tree of NNI method.

alignmentAlignment or MultipleSeqAlignment object

multiple sequence alignment used to calculate parsimony score of different NNI trees.

class Bio.Phylo.TreeConstruction.ParsimonyScorer(matrix=None)

Bases: Bio.Phylo.TreeConstruction.Scorer

Parsimony scorer with a scoring matrix.

This is a combination of Fitch algorithm and Sankoff algorithm. See ParsimonyTreeConstructor for usage.

Parameters
matrix_Matrix

scoring matrix used in parsimony score calculation.

__init__(matrix=None)

Initialize the class.

get_score(tree, alignment)

Calculate parsimony score using the Fitch algorithm.

Calculate and return the parsimony score given a tree and the MSA using either the Fitch algorithm (without a penalty matrix) or the Sankoff algorithm (with a matrix).

class Bio.Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree=None)

Bases: Bio.Phylo.TreeConstruction.TreeConstructor

Parsimony tree constructor.

Parameters
searcherTreeSearcher

tree searcher to search the best parsimony tree.

starting_treeTree

starting tree provided to the searcher.

Examples

We will load an alignment, and then load various trees which have already been computed from it:

>>> from Bio import AlignIO, Phylo
>>> aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip')
>>> print(aln)
Alignment with 5 rows and 13 columns
AACGTGGCCACAT Alpha
AAGGTCGCCACAC Beta
CAGTTCGCCACAA Gamma
GAGATTTCCGCCT Delta
GAGATCTCCGCCC Epsilon

Load a starting tree:

>>> starting_tree = Phylo.read('TreeConstruction/nj.tre', 'newick')
>>> print(starting_tree)
Tree(rooted=False, weight=1.0)
    Clade(branch_length=0.0, name='Inner3')
        Clade(branch_length=0.01421, name='Inner2')
            Clade(branch_length=0.23927, name='Inner1')
                Clade(branch_length=0.08531, name='Epsilon')
                Clade(branch_length=0.13691, name='Delta')
            Clade(branch_length=0.2923, name='Alpha')
        Clade(branch_length=0.07477, name='Beta')
        Clade(branch_length=0.17523, name='Gamma')

Build the Parsimony tree from the starting tree:

>>> scorer = Phylo.TreeConstruction.ParsimonyScorer()
>>> searcher = Phylo.TreeConstruction.NNITreeSearcher(scorer)
>>> constructor = Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree)
>>> pars_tree = constructor.build_tree(aln)
>>> print(pars_tree)
Tree(rooted=True, weight=1.0)
    Clade(branch_length=0.0)
        Clade(branch_length=0.19732999999999998, name='Inner1')
            Clade(branch_length=0.13691, name='Delta')
            Clade(branch_length=0.08531, name='Epsilon')
        Clade(branch_length=0.04194000000000003, name='Inner2')
            Clade(branch_length=0.01421, name='Inner3')
                Clade(branch_length=0.17523, name='Gamma')
                Clade(branch_length=0.07477, name='Beta')
            Clade(branch_length=0.2923, name='Alpha')

Same example, using the new Alignment class:

>>> from Bio import Align, Phylo
>>> alignment = Align.read(open('TreeConstruction/msa.phy'), 'phylip')
>>> print(alignment)
Alpha             0 AACGTGGCCACAT 13
Beta              0 AAGGTCGCCACAC 13
Gamma             0 CAGTTCGCCACAA 13
Delta             0 GAGATTTCCGCCT 13
Epsilon           0 GAGATCTCCGCCC 13

Load a starting tree:

>>> starting_tree = Phylo.read('TreeConstruction/nj.tre', 'newick')
>>> print(starting_tree)
Tree(rooted=False, weight=1.0)
    Clade(branch_length=0.0, name='Inner3')
        Clade(branch_length=0.01421, name='Inner2')
            Clade(branch_length=0.23927, name='Inner1')
                Clade(branch_length=0.08531, name='Epsilon')
                Clade(branch_length=0.13691, name='Delta')
            Clade(branch_length=0.2923, name='Alpha')
        Clade(branch_length=0.07477, name='Beta')
        Clade(branch_length=0.17523, name='Gamma')

Build the Parsimony tree from the starting tree:

>>> scorer = Phylo.TreeConstruction.ParsimonyScorer()
>>> searcher = Phylo.TreeConstruction.NNITreeSearcher(scorer)
>>> constructor = Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree)
>>> pars_tree = constructor.build_tree(alignment)
>>> print(pars_tree)
Tree(rooted=True, weight=1.0)
    Clade(branch_length=0.0)
        Clade(branch_length=0.19732999999999998, name='Inner1')
            Clade(branch_length=0.13691, name='Delta')
            Clade(branch_length=0.08531, name='Epsilon')
        Clade(branch_length=0.04194000000000003, name='Inner2')
            Clade(branch_length=0.01421, name='Inner3')
                Clade(branch_length=0.17523, name='Gamma')
                Clade(branch_length=0.07477, name='Beta')
            Clade(branch_length=0.2923, name='Alpha')
__init__(searcher, starting_tree=None)

Initialize the class.

build_tree(alignment)

Build the tree.

Parameters
alignmentMultipleSeqAlignment

multiple sequence alignment to calculate parsimony tree.