Bio.Phylo.TreeConstruction module¶
Classes and methods for tree construction.
- class Bio.Phylo.TreeConstruction.DistanceMatrix(names, matrix=None)¶
Bases: Bio.Phylo.TreeConstruction._Matrix
Distance matrix class that can be used for distance based tree algorithms.
All diagonal elements are set to zero, regardless of the values supplied.
- __init__(self, names, matrix=None)¶
Initialize the class.
- __setitem__(self, item, value)¶
Set Matrix’s items to values.
- format_phylip(self, handle)¶
Write data in Phylip format to a given file-like object or handle.
The output is in the distance matrix input format used by PHYLIP programs (e.g. ‘neighbor’). See: http://evolution.genetics.washington.edu/phylip/doc/neighbor.html
- Parameters
- handle : file or file-like object
A writable text-mode file handle or other object supporting the ‘write’ method, such as StringIO or sys.stdout.
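To make the idea of a zero-diagonal, lower-triangular matrix written to a handle more concrete, here is a minimal pure-Python sketch. It does not use Biopython; the function name `write_phylip_matrix`, the 10-character name field, and the four-decimal formatting are illustrative assumptions rather than the exact layout produced by `format_phylip`.

```python
import io


def write_phylip_matrix(names, lower_triangle, handle):
    """Write a square distance matrix in a PHYLIP-style layout (hypothetical helper).

    lower_triangle[i] lists the distances from names[i] to names[0..i],
    so its last entry is the diagonal element.
    """
    n = len(names)
    handle.write(f"{n:5d}\n")  # header line: number of taxa
    for i, name in enumerate(names):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)  # the diagonal is always zero
            elif j < i:
                row.append(lower_triangle[i][j])
            else:
                row.append(lower_triangle[j][i])  # mirror the lower triangle
        values = "  ".join(f"{value:.4f}" for value in row)
        handle.write(f"{name:<10}{values}\n")


names = ["Alpha", "Beta", "Gamma"]
lower = [[0], [3 / 13, 0], [5 / 13, 3 / 13, 0]]

buffer = io.StringIO()  # any object with a write() method works
write_phylip_matrix(names, lower, buffer)
text = buffer.getvalue()
print(text)
```

Passing `sys.stdout` instead of the `StringIO` buffer would print the matrix directly.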
- class Bio.Phylo.TreeConstruction.DistanceCalculator(model='identity', skip_letters=None)¶
Bases: object
Class to calculate the distance matrix from a DNA or protein multiple sequence alignment (MSA) and the given name of the substitution model.
Currently only scoring matrices are used.
- Parameters
- model : str
Name of the model matrix to be used to calculate distance. The attribute dna_models lists the model names available for DNA sequences, and protein_models lists those available for protein sequences.
Examples
Loading a small PHYLIP alignment from which to compute distances:
    from Bio.Phylo.TreeConstruction import DistanceCalculator
    from Bio import AlignIO
    # Passing the path lets AlignIO open and close the file itself.
    aln = AlignIO.read('TreeConstruction/msa.phy', 'phylip')
    print(aln)
Output:
    Alignment with 5 rows and 13 columns
    AACGTGGCCACAT Alpha
    AAGGTCGCCACAC Beta
    CAGTTCGCCACAA Gamma
    GAGATTTCCGCCT Delta
    GAGATCTCCGCCC Epsilon
DNA calculator with ‘identity’ model:
    calculator = DistanceCalculator('identity')
    dm = calculator.get_distance(aln)
    print(dm)
Output:
    Alpha    0
    Beta     0.23076923076923073  0
    Gamma    0.3846153846153846   0.23076923076923073  0
    Delta    0.5384615384615384   0.5384615384615384   0.5384615384615384   0
    Epsilon  0.6153846153846154   0.3846153846153846   0.46153846153846156  0.15384615384615385  0
             Alpha                Beta                 Gamma                Delta                Epsilon
Protein calculator with ‘blosum62’ model:
    calculator = DistanceCalculator('blosum62')
    dm = calculator.get_distance(aln)
    print(dm)
Output:
    Alpha    0
    Beta     0.36904761904761907  0
    Gamma    0.49397590361445787  0.25                 0
    Delta    0.5853658536585367   0.5476190476190477   0.5662650602409638   0
    Epsilon  0.7                  0.3555555555555555   0.48888888888888893  0.2222222222222222   0
             Alpha                Beta                 Gamma                Delta                Epsilon
- dna_models = ['benner22', 'benner6', 'benner74', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans']¶
- protein_models = ['blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']¶
- models = ['identity', 'benner22', 'benner6', 'benner74', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans', 'blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']¶
- __init__(self, model='identity', skip_letters=None)¶
Initialize with a distance model.
- get_distance(self, msa)¶
Return a DistanceMatrix for the given MSA object.
- Parameters
- msa : MultipleSeqAlignment
DNA or protein multiple sequence alignment.
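The ‘identity’ distances in the example output above can be reproduced by hand as the fraction of alignment columns in which two sequences differ. The following is a minimal pure-Python sketch of that calculation, not the library's implementation; the helper name `identity_distance` and its treatment of skipped letters (never counted as matches) are assumptions made for illustration.

```python
from itertools import combinations

# The five aligned sequences from the example alignment above.
SEQUENCES = {
    "Alpha": "AACGTGGCCACAT",
    "Beta": "AAGGTCGCCACAC",
    "Gamma": "CAGTTCGCCACAA",
    "Delta": "GAGATTTCCGCCT",
    "Epsilon": "GAGATCTCCGCCC",
}


def identity_distance(seq1, seq2, skip_letters="-*"):
    """Return 1 minus the fraction of columns where both sequences agree.

    Assumption: a column holding a skipped letter is never counted as a match.
    """
    matches = sum(
        1 for a, b in zip(seq1, seq2) if a == b and a not in skip_letters
    )
    return 1 - matches / len(seq1)


# Print every pairwise distance, one pair per line.
for name1, name2 in combinations(SEQUENCES, 2):
    distance = identity_distance(SEQUENCES[name1], SEQUENCES[name2])
    print(f"{name1:8s}{name2:8s}{distance:.4f}")
```

Alpha and Beta differ in 3 of 13 columns, which gives 3/13 and agrees with the 0.2307... entry in the matrix printed above.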
- class Bio.Phylo.TreeConstruction.TreeConstructor¶
Bases: object
Base class for all tree constructors.
- build_tree(self, msa)¶
Caller to build the tree from a MultipleSeqAlignment object.
This should be implemented in a subclass.
- class Bio.Phylo.TreeConstruction.DistanceTreeConstructor(distance_calculator=None, method='nj')¶
Bases: Bio.Phylo.TreeConstruction.TreeConstructor
Distance based tree constructor.
- Parameters
- method : str
Distance tree construction method, ‘nj’ (default) or ‘upgma’.
- distance_calculator : DistanceCalculator
The distance matrix calculator for multiple sequence alignment. It must be provided if build_tree will be called.
Examples
Loading a small PHYLIP alignment from which to compute distances, and then building a UPGMA tree:
    from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
    from Bio.Phylo.TreeConstruction import DistanceCalculator
    from Bio import AlignIO
    # Passing the path lets AlignIO open and close the file itself.
    aln = AlignIO.read('TreeConstruction/msa.phy', 'phylip')
    constructor = DistanceTreeConstructor()
    calculator = DistanceCalculator('identity')
    dm = calculator.get_distance(aln)
    upgmatree = constructor.upgma(dm)
    print(upgmatree)
Output:
    Tree(rooted=True)
        Clade(branch_length=0, name='Inner4')
            Clade(branch_length=0.18749999999999994, name='Inner1')
                Clade(branch_length=0.07692307692307693, name='Epsilon')
                Clade(branch_length=0.07692307692307693, name='Delta')
            Clade(branch_length=0.11057692307692304, name='Inner3')
                Clade(branch_length=0.038461538461538464, name='Inner2')
                    Clade(branch_length=0.11538461538461536, name='Gamma')
                    Clade(branch_length=0.11538461538461536, name='Beta')
                Clade(branch_length=0.15384615384615383, name='Alpha')
Build a NJ Tree:
    njtree = constructor.nj(dm)
    print(njtree)
Output:
    Tree(rooted=False)
        Clade(branch_length=0, name='Inner3')
            Clade(branch_length=0.18269230769230765, name='Alpha')
            Clade(branch_length=0.04807692307692307, name='Beta')
            Clade(branch_length=0.04807692307692307, name='Inner2')
                Clade(branch_length=0.27884615384615385, name='Inner1')
                    Clade(branch_length=0.051282051282051266, name='Epsilon')
                    Clade(branch_length=0.10256410256410259, name='Delta')
                Clade(branch_length=0.14423076923076922, name='Gamma')
- methods = ['nj', 'upgma']¶
- __init__(self, distance_calculator=None, method='nj')¶
Initialize the class.
- build_tree(self, msa)¶
Construct and return a tree using Neighbor Joining or UPGMA, depending on the configured method.
- upgma(self, distance_matrix)¶
Construct and return a UPGMA tree.
Constructs and returns an Unweighted Pair Group Method with Arithmetic mean (UPGMA) tree.
- Parameters
- distance_matrix : DistanceMatrix
The distance matrix for tree construction.
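For readers unfamiliar with UPGMA, the sketch below shows the textbook procedure in pure Python: repeatedly merge the two closest clusters, place the new node at half their distance, and average distances weighted by cluster size. It is an illustration under stated assumptions, not the code behind `upgma`. The name `upgma_merges`, the string cluster labels, and the tie-breaking order are hypothetical, and because averaging and tie-breaking conventions differ between implementations, its node heights need not match the output shown in the examples above.

```python
from itertools import combinations

NAMES = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"]

# Mismatch counts over 13 columns; dividing by 13 gives the 'identity'
# distances printed in the example output above.
MISMATCHES = {
    ("Alpha", "Beta"): 3, ("Alpha", "Gamma"): 5, ("Alpha", "Delta"): 7,
    ("Alpha", "Epsilon"): 8, ("Beta", "Gamma"): 3, ("Beta", "Delta"): 7,
    ("Beta", "Epsilon"): 5, ("Gamma", "Delta"): 7, ("Gamma", "Epsilon"): 6,
    ("Delta", "Epsilon"): 2,
}
DIST = {frozenset(pair): count / 13 for pair, count in MISMATCHES.items()}


def upgma_merges(names, dist):
    """Return the merges as (cluster_a, cluster_b, node_height) tuples."""
    dist = dict(dist)  # work on a copy; new clusters get new entries
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters (first one wins on ties).
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        d_ab = dist[frozenset((a, b))]
        merged = f"({a},{b})"
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted mean of the distances from a and b to 'other'.
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, d_ab / 2))  # new node sits at half the distance
    return merges


merges = upgma_merges(NAMES, DIST)
for a, b, height in merges:
    print(f"join {a} + {b} at height {height:.4f}")
```

The first join is Delta with Epsilon at height 1/13, which corresponds to the 0.0769... branch lengths of those two leaves in the UPGMA output above.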
- nj(self, distance_matrix)¶
Construct and return a Neighbor Joining tree.
- Parameters
- distance_matrix : DistanceMatrix
The distance matrix for tree construction.
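As a rough illustration of what a single Neighbor Joining step involves, the following pure-Python sketch applies the standard Q-criterion to pick the first pair to join and computes the two new branch lengths. It is a sketch of the textbook formulas under stated assumptions, not the library's code; `nj_first_join` is a hypothetical helper name.

```python
from itertools import combinations

NAMES = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"]

# Mismatch counts over 13 columns; dividing by 13 gives the 'identity'
# distances printed in the example output above.
MISMATCHES = {
    ("Alpha", "Beta"): 3, ("Alpha", "Gamma"): 5, ("Alpha", "Delta"): 7,
    ("Alpha", "Epsilon"): 8, ("Beta", "Gamma"): 3, ("Beta", "Delta"): 7,
    ("Beta", "Epsilon"): 5, ("Gamma", "Delta"): 7, ("Gamma", "Epsilon"): 6,
    ("Delta", "Epsilon"): 2,
}
DIST = {frozenset(pair): count / 13 for pair, count in MISMATCHES.items()}


def nj_first_join(names, dist):
    """Return (a, b, branch_a, branch_b) for the first pair NJ would join."""
    n = len(names)
    # Row sums: total distance from each taxon to all the others.
    totals = {
        a: sum(dist[frozenset((a, b))] for b in names if b != a) for a in names
    }

    def q(pair):
        # Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b); the smallest Q is joined.
        a, b = pair
        return (n - 2) * dist[frozenset(pair)] - totals[a] - totals[b]

    a, b = min(combinations(names, 2), key=q)
    d_ab = dist[frozenset((a, b))]
    # Branch lengths from a and b to the newly created internal node.
    branch_a = d_ab / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
    branch_b = d_ab - branch_a
    return a, b, branch_a, branch_b


a, b, branch_a, branch_b = nj_first_join(NAMES, DIST)
print(a, round(branch_a, 6), b, round(branch_b, 6))
```

For this matrix the first join is Delta with Epsilon, with branch lengths 4/39 and 2/39, which agrees with the 0.1025... and 0.0512... values in the NJ output above. A full implementation would replace the pair with the new node, update the matrix, and repeat until three nodes remain.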
- class Bio.Phylo.TreeConstruction.Scorer¶
Bases: object
Base class for all tree scoring methods.
- get_score(self, tree, alignment)¶
Caller to get the score of a tree for the given alignment.
This should be implemented in a subclass.
- class Bio.Phylo.TreeConstruction.TreeSearcher¶
Bases: object
Base class for all tree searching methods.
- search(self, starting_tree, alignment)¶
Caller to search for the best tree, beginning from a starting tree.
This should be implemented in a subclass.
- class Bio.Phylo.TreeConstruction.NNITreeSearcher(scorer)¶
Bases: Bio.Phylo.TreeConstruction.TreeSearcher
Tree searching with the Nearest Neighbor Interchange (NNI) algorithm.
- Parameters
- scorer : ParsimonyScorer
Parsimony scorer used to calculate the parsimony score of the different trees examined during the NNI algorithm.
- __init__(self, scorer)¶
Initialize the class.
- search(self, starting_tree, alignment)¶
Implement the TreeSearcher.search method.
- Parameters
- starting_tree : Tree
Starting tree of the NNI method.
- alignment : MultipleSeqAlignment
Multiple sequence alignment used to calculate the parsimony score of the different NNI trees.
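The rearrangement at the heart of NNI can be sketched on a tiny tree written as nested tuples. Around an internal branch there are exactly two alternative topologies, obtained by swapping the sibling subtree with one of the two child subtrees. The nested-tuple representation and the helper name `nni_neighbors` are assumptions for illustration only; the searcher described above works on Tree objects.

```python
def nni_neighbors(node):
    """Return the two NNI rearrangements of a node shaped ((a, b), c).

    The internal branch is the one joining the pair (a, b) to its parent.
    Swapping the sibling c with a, or with b, gives the two neighbours.
    """
    (a, b), c = node
    return [((c, b), a), ((a, c), b)]


start = (("Gamma", "Beta"), "Alpha")
neighbors = nni_neighbors(start)
for tree in neighbors:
    print(tree)
```

A search procedure would score each neighbour, keep the best one if it improves on the current tree, and repeat over all internal branches until no rearrangement helps.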
- class Bio.Phylo.TreeConstruction.ParsimonyScorer(matrix=None)¶
Bases: Bio.Phylo.TreeConstruction.Scorer
Parsimony scorer with a scoring matrix.
This is a combination of the Fitch and Sankoff algorithms. See ParsimonyTreeConstructor for usage.
- Parameters
- matrix : _Matrix
Scoring matrix used in the parsimony score calculation.
- __init__(self, matrix=None)¶
Initialize the class.
- get_score(self, tree, alignment)¶
Calculate parsimony score using the Fitch algorithm.
Calculate and return the parsimony score given a tree and the MSA using either the Fitch algorithm (without a penalty matrix) or the Sankoff algorithm (with a matrix).
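The Fitch case (no penalty matrix) can be illustrated with a short pure-Python sketch that counts the minimum number of state changes per alignment column on a rooted binary tree. This is a textbook version under stated assumptions, not the library's implementation; the nested-tuple tree and the helper names `fitch_column` and `fitch_score` are hypothetical.

```python
# The five aligned sequences from the example alignment in this module.
SEQUENCES = {
    "Alpha": "AACGTGGCCACAT",
    "Beta": "AAGGTCGCCACAC",
    "Gamma": "CAGTTCGCCACAA",
    "Delta": "GAGATTTCCGCCT",
    "Epsilon": "GAGATCTCCGCCC",
}


def fitch_column(node, column):
    """Return (candidate_states, changes) for one column below 'node'."""
    if isinstance(node, str):  # leaf: its observed state, no changes yet
        return {column[node]}, 0
    (left, left_cost), (right, right_cost) = (
        fitch_column(child, column) for child in node
    )
    shared = left & right
    if shared:  # children can agree on a state: no extra change needed
        return shared, left_cost + right_cost
    # Otherwise keep the union and pay for one change.
    return left | right, left_cost + right_cost + 1


def fitch_score(tree, sequences):
    """Sum the per-column Fitch counts over the whole alignment."""
    length = len(next(iter(sequences.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in sequences.items()}
        total += fitch_column(tree, column)[1]
    return total


# Rooted binary tree as nested pairs of leaf names.
tree = (("Delta", "Epsilon"), (("Gamma", "Beta"), "Alpha"))
score = fitch_score(tree, SEQUENCES)
print(score)
```

With a penalty matrix, the Sankoff approach would instead track a cost per state at each node, weighting each change by its matrix entry rather than counting it as 1.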
- class Bio.Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree=None)¶
Bases: Bio.Phylo.TreeConstruction.TreeConstructor
Parsimony tree constructor.
- Parameters
- searcher : TreeSearcher
Tree searcher used to find the best parsimony tree.
- starting_tree : Tree
Starting tree provided to the searcher.
Examples
We will load an alignment, and then load various trees which have already been computed from it:
    from Bio import AlignIO, Phylo
    # Passing the path lets AlignIO open and close the file itself.
    aln = AlignIO.read('TreeConstruction/msa.phy', 'phylip')
    print(aln)
Output:
    Alignment with 5 rows and 13 columns
    AACGTGGCCACAT Alpha
    AAGGTCGCCACAC Beta
    CAGTTCGCCACAA Gamma
    GAGATTTCCGCCT Delta
    GAGATCTCCGCCC Epsilon
Load a starting tree:
    starting_tree = Phylo.read('TreeConstruction/nj.tre', 'newick')
    print(starting_tree)
Output:
    Tree(rooted=False, weight=1.0)
        Clade(branch_length=0.0, name='Inner3')
            Clade(branch_length=0.01421, name='Inner2')
                Clade(branch_length=0.23927, name='Inner1')
                    Clade(branch_length=0.08531, name='Epsilon')
                    Clade(branch_length=0.13691, name='Delta')
                Clade(branch_length=0.2923, name='Alpha')
            Clade(branch_length=0.07477, name='Beta')
            Clade(branch_length=0.17523, name='Gamma')
Build the Parsimony tree from the starting tree:
    # Import the classes explicitly rather than relying on Phylo.TreeConstruction being loaded.
    from Bio.Phylo.TreeConstruction import NNITreeSearcher, ParsimonyScorer, ParsimonyTreeConstructor
    scorer = ParsimonyScorer()
    searcher = NNITreeSearcher(scorer)
    constructor = ParsimonyTreeConstructor(searcher, starting_tree)
    pars_tree = constructor.build_tree(aln)
    print(pars_tree)
Output:
    Tree(rooted=True, weight=1.0)
        Clade(branch_length=0.0)
            Clade(branch_length=0.19732999999999998, name='Inner1')
                Clade(branch_length=0.13691, name='Delta')
                Clade(branch_length=0.08531, name='Epsilon')
            Clade(branch_length=0.04194000000000003, name='Inner2')
                Clade(branch_length=0.01421, name='Inner3')
                    Clade(branch_length=0.17523, name='Gamma')
                    Clade(branch_length=0.07477, name='Beta')
                Clade(branch_length=0.2923, name='Alpha')
- __init__(self, searcher, starting_tree=None)¶
Initialize the class.
- build_tree(self, alignment)¶
Build the tree.
- Parameters
- alignment : MultipleSeqAlignment
Multiple sequence alignment used to calculate the parsimony tree.