Bio.Phylo.TreeConstruction module
Classes and methods for tree construction.
- class Bio.Phylo.TreeConstruction.DistanceMatrix(names, matrix=None)
Bases:
_Matrix
Distance matrix class that can be used for distance based tree algorithms.
All diagonal elements will be zero no matter what the users provide.
- __init__(names, matrix=None)
Initialize the class.
- __setitem__(item, value)
Set Matrix’s items to values.
- format_phylip(handle)
Write data in Phylip format to a given file-like object or handle.
The output stream is the input distance matrix format used with Phylip programs (e.g. ‘neighbor’). See: http://evolution.genetics.washington.edu/phylip/doc/neighbor.html
- Parameters:
- handlefile or file-like object
A writeable text mode file handle or other object supporting the ‘write’ method, such as StringIO or sys.stdout.
- class Bio.Phylo.TreeConstruction.DistanceCalculator(model='identity', skip_letters=None)
Bases:
object
Calculates the distance matrix from a DNA or protein sequence alignment.
This class calculates the distance matrix from a multiple sequence alignment of DNA or protein sequences, and the given name of the substitution model.
Currently only scoring matrices are used.
- Parameters:
- modelstr
Name of the model matrix to be used to calculate distance. The attribute
dna_models
contains the available model names for DNA sequences andprotein_models
for protein sequences.
Examples
Loading a small PHYLIP alignment from which to compute distances:
>>> from Bio.Phylo.TreeConstruction import DistanceCalculator >>> from Bio import AlignIO >>> aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip') >>> print(aln) Alignment with 5 rows and 13 columns AACGTGGCCACAT Alpha AAGGTCGCCACAC Beta CAGTTCGCCACAA Gamma GAGATTTCCGCCT Delta GAGATCTCCGCCC Epsilon
DNA calculator with ‘identity’ model:
>>> calculator = DistanceCalculator('identity') >>> dm = calculator.get_distance(aln) >>> print(dm) Alpha 0.000000 Beta 0.230769 0.000000 Gamma 0.384615 0.230769 0.000000 Delta 0.538462 0.538462 0.538462 0.000000 Epsilon 0.615385 0.384615 0.461538 0.153846 0.000000 Alpha Beta Gamma Delta Epsilon
Protein calculator with ‘blosum62’ model:
>>> calculator = DistanceCalculator('blosum62') >>> dm = calculator.get_distance(aln) >>> print(dm) Alpha 0.000000 Beta 0.369048 0.000000 Gamma 0.493976 0.250000 0.000000 Delta 0.585366 0.547619 0.566265 0.000000 Epsilon 0.700000 0.355556 0.488889 0.222222 0.000000 Alpha Beta Gamma Delta Epsilon
Same calculation, using the new Alignment object:
>>> from Bio.Phylo.TreeConstruction import DistanceCalculator >>> from Bio import Align >>> aln = Align.read('TreeConstruction/msa.phy', 'phylip') >>> print(aln) Alpha 0 AACGTGGCCACAT 13 Beta 0 AAGGTCGCCACAC 13 Gamma 0 CAGTTCGCCACAA 13 Delta 0 GAGATTTCCGCCT 13 Epsilon 0 GAGATCTCCGCCC 13
DNA calculator with ‘identity’ model:
>>> calculator = DistanceCalculator('identity') >>> dm = calculator.get_distance(aln) >>> print(dm) Alpha 0.000000 Beta 0.230769 0.000000 Gamma 0.384615 0.230769 0.000000 Delta 0.538462 0.538462 0.538462 0.000000 Epsilon 0.615385 0.384615 0.461538 0.153846 0.000000 Alpha Beta Gamma Delta Epsilon
Protein calculator with ‘blosum62’ model:
>>> calculator = DistanceCalculator('blosum62') >>> dm = calculator.get_distance(aln) >>> print(dm) Alpha 0.000000 Beta 0.369048 0.000000 Gamma 0.493976 0.250000 0.000000 Delta 0.585366 0.547619 0.566265 0.000000 Epsilon 0.700000 0.355556 0.488889 0.222222 0.000000 Alpha Beta Gamma Delta Epsilon
- dna_models = ['benner22', 'benner6', 'benner74', 'blastn', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'megablast', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans']
- protein_models = ['blastp', 'blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']
- models = ['identity', 'benner22', 'benner6', 'benner74', 'blastn', 'dayhoff', 'feng', 'genetic', 'gonnet1992', 'hoxd70', 'johnson', 'jones', 'levin', 'mclachlan', 'mdm78', 'megablast', 'blastn', 'rao', 'risler', 'schneider', 'str', 'trans', 'blastp', 'blosum45', 'blosum50', 'blosum62', 'blosum80', 'blosum90', 'pam250', 'pam30', 'pam70']
- __init__(model='identity', skip_letters=None)
Initialize with a distance model.
- get_distance(msa)
Return a DistanceMatrix for an Alignment or MultipleSeqAlignment object.
- Parameters:
- msaAlignment or MultipleSeqAlignment object representing a
DNA or protein multiple sequence alignment.
- class Bio.Phylo.TreeConstruction.TreeConstructor
Bases:
object
Base class for all tree constructor.
- build_tree(msa)
Caller to build the tree from an Alignment or MultipleSeqAlignment object.
This should be implemented in subclass.
- class Bio.Phylo.TreeConstruction.DistanceTreeConstructor(distance_calculator=None, method='nj')
Bases:
TreeConstructor
Distance based tree constructor.
- Parameters:
- methodstr
Distance tree construction method, ‘nj’(default) or ‘upgma’.
- distance_calculatorDistanceCalculator
The distance matrix calculator for multiple sequence alignment. It must be provided if
build_tree
will be called.
Examples
Loading a small PHYLIP alignment from which to compute distances, and then build a upgma Tree:
>>> from Bio.Phylo.TreeConstruction import DistanceTreeConstructor >>> from Bio.Phylo.TreeConstruction import DistanceCalculator >>> from Bio import AlignIO >>> aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip') >>> constructor = DistanceTreeConstructor() >>> calculator = DistanceCalculator('identity') >>> dm = calculator.get_distance(aln) >>> upgmatree = constructor.upgma(dm) >>> print(upgmatree) Tree(rooted=True) Clade(branch_length=0, name='Inner4') Clade(branch_length=0.18749999999999994, name='Inner1') Clade(branch_length=0.07692307692307693, name='Epsilon') Clade(branch_length=0.07692307692307693, name='Delta') Clade(branch_length=0.11057692307692304, name='Inner3') Clade(branch_length=0.038461538461538464, name='Inner2') Clade(branch_length=0.11538461538461536, name='Gamma') Clade(branch_length=0.11538461538461536, name='Beta') Clade(branch_length=0.15384615384615383, name='Alpha')
Build a NJ Tree:
>>> njtree = constructor.nj(dm) >>> print(njtree) Tree(rooted=False) Clade(branch_length=0, name='Inner3') Clade(branch_length=0.18269230769230765, name='Alpha') Clade(branch_length=0.04807692307692307, name='Beta') Clade(branch_length=0.04807692307692307, name='Inner2') Clade(branch_length=0.27884615384615385, name='Inner1') Clade(branch_length=0.051282051282051266, name='Epsilon') Clade(branch_length=0.10256410256410259, name='Delta') Clade(branch_length=0.14423076923076922, name='Gamma')
Same example, using the new Alignment class:
>>> from Bio.Phylo.TreeConstruction import DistanceTreeConstructor >>> from Bio.Phylo.TreeConstruction import DistanceCalculator >>> from Bio import Align >>> aln = Align.read(open('TreeConstruction/msa.phy'), 'phylip') >>> constructor = DistanceTreeConstructor() >>> calculator = DistanceCalculator('identity') >>> dm = calculator.get_distance(aln) >>> upgmatree = constructor.upgma(dm) >>> print(upgmatree) Tree(rooted=True) Clade(branch_length=0, name='Inner4') Clade(branch_length=0.18749999999999994, name='Inner1') Clade(branch_length=0.07692307692307693, name='Epsilon') Clade(branch_length=0.07692307692307693, name='Delta') Clade(branch_length=0.11057692307692304, name='Inner3') Clade(branch_length=0.038461538461538464, name='Inner2') Clade(branch_length=0.11538461538461536, name='Gamma') Clade(branch_length=0.11538461538461536, name='Beta') Clade(branch_length=0.15384615384615383, name='Alpha')
Build a NJ Tree:
>>> njtree = constructor.nj(dm) >>> print(njtree) Tree(rooted=False) Clade(branch_length=0, name='Inner3') Clade(branch_length=0.18269230769230765, name='Alpha') Clade(branch_length=0.04807692307692307, name='Beta') Clade(branch_length=0.04807692307692307, name='Inner2') Clade(branch_length=0.27884615384615385, name='Inner1') Clade(branch_length=0.051282051282051266, name='Epsilon') Clade(branch_length=0.10256410256410259, name='Delta') Clade(branch_length=0.14423076923076922, name='Gamma')
- methods = ['nj', 'upgma']
- __init__(distance_calculator=None, method='nj')
Initialize the class.
- build_tree(msa)
Construct and return a Tree, Neighbor Joining or UPGMA.
- upgma(distance_matrix)
Construct and return an UPGMA tree.
Constructs and returns an Unweighted Pair Group Method with Arithmetic mean (UPGMA) tree.
- Parameters:
- distance_matrixDistanceMatrix
The distance matrix for tree construction.
- nj(distance_matrix)
Construct and return a Neighbor Joining tree.
- Parameters:
- distance_matrixDistanceMatrix
The distance matrix for tree construction.
- class Bio.Phylo.TreeConstruction.Scorer
Bases:
object
Base class for all tree scoring methods.
- get_score(tree, alignment)
Caller to get the score of a tree for the given alignment.
This should be implemented in subclass.
- class Bio.Phylo.TreeConstruction.TreeSearcher
Bases:
object
Base class for all tree searching methods.
- search(starting_tree, alignment)
Caller to search the best tree with a starting tree.
This should be implemented in subclass.
- class Bio.Phylo.TreeConstruction.NNITreeSearcher(scorer)
Bases:
TreeSearcher
Tree searching with Nearest Neighbor Interchanges (NNI) algorithm.
- Parameters:
- scorerParsimonyScorer
parsimony scorer to calculate the parsimony score of different trees during NNI algorithm.
- __init__(scorer)
Initialize the class.
- search(starting_tree, alignment)
Implement the TreeSearcher.search method.
- Parameters:
- starting_treeTree
starting tree of NNI method.
- alignmentAlignment or MultipleSeqAlignment object
multiple sequence alignment used to calculate parsimony score of different NNI trees.
- class Bio.Phylo.TreeConstruction.ParsimonyScorer(matrix=None)
Bases:
Scorer
Parsimony scorer with a scoring matrix.
This is a combination of Fitch algorithm and Sankoff algorithm. See ParsimonyTreeConstructor for usage.
- Parameters:
- matrix_Matrix
scoring matrix used in parsimony score calculation.
- __init__(matrix=None)
Initialize the class.
- get_score(tree, alignment)
Calculate parsimony score using the Fitch algorithm.
Calculate and return the parsimony score given a tree and the MSA using either the Fitch algorithm (without a penalty matrix) or the Sankoff algorithm (with a matrix).
- class Bio.Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree=None)
Bases:
TreeConstructor
Parsimony tree constructor.
- Parameters:
- searcherTreeSearcher
tree searcher to search the best parsimony tree.
- starting_treeTree
starting tree provided to the searcher.
Examples
We will load an alignment, and then load various trees which have already been computed from it:
>>> from Bio import AlignIO, Phylo >>> aln = AlignIO.read(open('TreeConstruction/msa.phy'), 'phylip') >>> print(aln) Alignment with 5 rows and 13 columns AACGTGGCCACAT Alpha AAGGTCGCCACAC Beta CAGTTCGCCACAA Gamma GAGATTTCCGCCT Delta GAGATCTCCGCCC Epsilon
Load a starting tree:
>>> starting_tree = Phylo.read('TreeConstruction/nj.tre', 'newick') >>> print(starting_tree) Tree(rooted=False, weight=1.0) Clade(branch_length=0.0, name='Inner3') Clade(branch_length=0.01421, name='Inner2') Clade(branch_length=0.23927, name='Inner1') Clade(branch_length=0.08531, name='Epsilon') Clade(branch_length=0.13691, name='Delta') Clade(branch_length=0.2923, name='Alpha') Clade(branch_length=0.07477, name='Beta') Clade(branch_length=0.17523, name='Gamma')
Build the Parsimony tree from the starting tree:
>>> scorer = Phylo.TreeConstruction.ParsimonyScorer() >>> searcher = Phylo.TreeConstruction.NNITreeSearcher(scorer) >>> constructor = Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree) >>> pars_tree = constructor.build_tree(aln) >>> print(pars_tree) Tree(rooted=True, weight=1.0) Clade(branch_length=0.0) Clade(branch_length=0.19732999999999998, name='Inner1') Clade(branch_length=0.13691, name='Delta') Clade(branch_length=0.08531, name='Epsilon') Clade(branch_length=0.04194000000000003, name='Inner2') Clade(branch_length=0.01421, name='Inner3') Clade(branch_length=0.17523, name='Gamma') Clade(branch_length=0.07477, name='Beta') Clade(branch_length=0.2923, name='Alpha')
Same example, using the new Alignment class:
>>> from Bio import Align, Phylo >>> alignment = Align.read(open('TreeConstruction/msa.phy'), 'phylip') >>> print(alignment) Alpha 0 AACGTGGCCACAT 13 Beta 0 AAGGTCGCCACAC 13 Gamma 0 CAGTTCGCCACAA 13 Delta 0 GAGATTTCCGCCT 13 Epsilon 0 GAGATCTCCGCCC 13
Load a starting tree:
>>> starting_tree = Phylo.read('TreeConstruction/nj.tre', 'newick') >>> print(starting_tree) Tree(rooted=False, weight=1.0) Clade(branch_length=0.0, name='Inner3') Clade(branch_length=0.01421, name='Inner2') Clade(branch_length=0.23927, name='Inner1') Clade(branch_length=0.08531, name='Epsilon') Clade(branch_length=0.13691, name='Delta') Clade(branch_length=0.2923, name='Alpha') Clade(branch_length=0.07477, name='Beta') Clade(branch_length=0.17523, name='Gamma')
Build the Parsimony tree from the starting tree:
>>> scorer = Phylo.TreeConstruction.ParsimonyScorer() >>> searcher = Phylo.TreeConstruction.NNITreeSearcher(scorer) >>> constructor = Phylo.TreeConstruction.ParsimonyTreeConstructor(searcher, starting_tree) >>> pars_tree = constructor.build_tree(alignment) >>> print(pars_tree) Tree(rooted=True, weight=1.0) Clade(branch_length=0.0) Clade(branch_length=0.19732999999999998, name='Inner1') Clade(branch_length=0.13691, name='Delta') Clade(branch_length=0.08531, name='Epsilon') Clade(branch_length=0.04194000000000003, name='Inner2') Clade(branch_length=0.01421, name='Inner3') Clade(branch_length=0.17523, name='Gamma') Clade(branch_length=0.07477, name='Beta') Clade(branch_length=0.2923, name='Alpha')
- __init__(searcher, starting_tree=None)
Initialize the class.
- build_tree(alignment)
Build the tree.
- Parameters:
- alignmentMultipleSeqAlignment
multiple sequence alignment to calculate parsimony tree.