Bio.Phylo.BaseTree module

Base classes for Bio.Phylo objects.

All object representations for phylogenetic trees should derive from these base classes in order to use the common methods defined on them.

class Bio.Phylo.BaseTree.TreeElement

Bases: object

Base class for all Bio.Phylo classes.

__repr__(self)

Show this object’s constructor with its primitive arguments.

__str__(self)

Show this object’s constructor with its primitive arguments.

class Bio.Phylo.BaseTree.TreeMixin

Bases: object

Methods for Tree- and Clade-based classes.

This lets Tree and Clade support the same traversal and searching operations without requiring Clade to inherit from Tree, so Clade isn’t required to have all of Tree’s attributes – just root (a Clade instance) and is_terminal.

find_any(self, *args, **kwargs)

Return the first element found by find_elements(), or None.

This is also useful for checking whether any matching element exists in the tree, and can be used in a conditional expression.

find_elements(self, target=None, terminal=None, order='preorder', **kwargs)

Find all tree elements matching the given attributes.

The arbitrary keyword arguments indicate the attribute name of the sub-element and the value to match: string, integer or boolean. Strings are evaluated as regular expression matches; integers are compared directly for equality, and booleans evaluate the attribute’s truth value (True or False) before comparing. To handle nonzero floats, search with a boolean argument, then filter the result manually.

If no keyword arguments are given, then just the class type is used for matching.

The result is an iterable through all matching objects, by depth-first search. (Not necessarily the same order as the elements appear in the source file!)

Parameters
targetTreeElement instance, type, dict, or callable

Specifies the characteristics to search for. (The default, TreeElement, matches any standard Bio.Phylo type.)

terminalbool

A boolean value to select for or against terminal nodes (a.k.a. leaf nodes). True searches for only terminal nodes, False excludes terminal nodes, and the default, None, searches both terminal and non-terminal nodes, as well as any tree elements lacking the is_terminal method.

order{‘preorder’, ‘postorder’, ‘level’}

Tree traversal order: ‘preorder’ (default) is depth-first search, ‘postorder’ is DFS with child nodes preceding parents, and ‘level’ is breadth-first search.

Examples

>>> from Bio import Phylo
>>> phx = Phylo.PhyloXMLIO.read('PhyloXML/phyloxml_examples.xml')
>>> matches = phx.phylogenies[5].find_elements(code='OCTVU')
>>> next(matches)
Taxonomy(code='OCTVU', scientific_name='Octopus vulgaris')
find_clades(self, target=None, terminal=None, order='preorder', **kwargs)

Find each clade containing a matching element.

That is, find each element as with find_elements(), but return the corresponding clade object. (This is usually what you want.)

Returns

an iterable through all matching objects, searching depth-first (preorder) by default.

get_path(self, target=None, **kwargs)

List the clades directly between this root and the given target.

Returns

list of all clade objects along this path, ending with the given target, but excluding the root clade.

get_nonterminals(self, order='preorder')

Get a list of all of this tree’s nonterminal (internal) nodes.

get_terminals(self, order='preorder')

Get a list of all of this tree’s terminal (leaf) nodes.

trace(self, start, finish)

List of all clade object between two targets in this tree.

Excluding start, including finish.

common_ancestor(self, targets, *more_targets)

Most recent common ancestor (clade) of all the given targets.

Edge cases:
  • If no target is given, returns self.root

  • If 1 target is given, returns the target

  • If any target is not found in this tree, raises a ValueError

count_terminals(self)

Count the number of terminal (leaf) nodes within this tree.

depths(self, unit_branch_lengths=False)

Create a mapping of tree clades to depths (by branch length).

Parameters
unit_branch_lengthsbool

If True, count only the number of branches (levels in the tree). By default the distance is the cumulative branch length leading to the clade.

Returns

dict of {clade: depth}, where keys are all of the Clade instances in the tree, and values are the distance from the root to each clade (including terminals).

distance(self, target1, target2=None)

Calculate the sum of the branch lengths between two targets.

If only one target is specified, the other is the root of this tree.

is_bifurcating(self)

Return True if tree downstream of node is strictly bifurcating.

I.e., all nodes have either 2 or 0 children (internal or external, respectively). The root may have 3 descendents and still be considered part of a bifurcating tree, because it has no ancestor.

is_monophyletic(self, terminals, *more_terminals)

MRCA of terminals if they comprise a complete subclade, or False.

I.e., there exists a clade such that its terminals are the same set as the given targets.

The given targets must be terminals of the tree.

To match both Bio.Nexus.Trees and the other multi-target methods in Bio.Phylo, arguments to this method can be specified either of two ways: (i) as a single list of targets, or (ii) separately specified targets, e.g. is_monophyletic(t1, t2, t3) – but not both.

For convenience, this method returns the common ancestor (MCRA) of the targets if they are monophyletic (instead of the value True), and False otherwise.

Returns

common ancestor if terminals are monophyletic, otherwise False.

is_parent_of(self, target=None, **kwargs)

Check if target is a descendent of this tree.

Not required to be a direct descendent.

To check only direct descendents of a clade, simply use list membership testing: if subclade in clade: ...

is_preterminal(self)

Check if all direct descendents are terminal.

total_branch_length(self)

Calculate the sum of all the branch lengths in this tree.

collapse(self, target=None, **kwargs)

Delete target from the tree, relinking its children to its parent.

Returns

the parent clade.

collapse_all(self, target=None, **kwargs)

Collapse all the descendents of this tree, leaving only terminals.

Total branch lengths are preserved, i.e. the distance to each terminal stays the same.

For example, this will safely collapse nodes with poor bootstrap support:

>>> from Bio import Phylo
>>> tree = Phylo.read('PhyloXML/apaf.xml', 'phyloxml')
>>> print("Total branch length %0.2f" % tree.total_branch_length())
Total branch length 20.44
>>> tree.collapse_all(lambda c: c.confidence is not None and c.confidence < 70)
>>> print("Total branch length %0.2f" % tree.total_branch_length())
Total branch length 21.37

This implementation avoids strange side-effects by using level-order traversal and testing all clade properties (versus the target specification) up front. In particular, if a clade meets the target specification in the original tree, it will be collapsed. For example, if the condition is:

>>> from Bio import Phylo
>>> tree = Phylo.read('PhyloXML/apaf.xml', 'phyloxml')
>>> print("Total branch length %0.2f" % tree.total_branch_length())
Total branch length 20.44
>>> tree.collapse_all(lambda c: c.branch_length < 0.1)
>>> print("Total branch length %0.2f" % tree.total_branch_length())
Total branch length 21.13

Collapsing a clade’s parent node adds the parent’s branch length to the child, so during the execution of collapse_all, a clade’s branch_length may increase. In this implementation, clades are collapsed according to their properties in the original tree, not the properties when tree traversal reaches the clade. (It’s easier to debug.) If you want the other behavior (incremental testing), modifying the source code of this function is straightforward.

ladderize(self, reverse=False)

Sort clades in-place according to the number of terminal nodes.

Deepest clades are last by default. Use reverse=True to sort clades deepest-to-shallowest.

prune(self, target=None, **kwargs)

Prunes a terminal clade from the tree.

If taxon is from a bifurcation, the connecting node will be collapsed and its branch length added to remaining terminal node. This might be no longer be a meaningful value.

Returns

parent clade of the pruned target

split(self, n=2, branch_length=1.0)

Generate n (default 2) new descendants.

In a species tree, this is a speciation event.

New clades have the given branch_length and the same name as this clade’s root plus an integer suffix (counting from 0). For example, splitting a clade named “A” produces sub-clades named “A0” and “A1”. If the clade has no name, the prefix “n” is used for child nodes, e.g. “n0” and “n1”.

class Bio.Phylo.BaseTree.Tree(root=None, rooted=True, id=None, name=None)

Bases: Bio.Phylo.BaseTree.TreeElement, Bio.Phylo.BaseTree.TreeMixin

A phylogenetic tree, containing global info for the phylogeny.

The structure and node-specific data is accessible through the ‘root’ clade attached to the Tree instance.

Parameters
rootClade

The starting node of the tree. If the tree is rooted, this will usually be the root node.

rootedbool

Whether or not the tree is rooted. By default, a tree is assumed to be rooted.

idstr

The identifier of the tree, if there is one.

namestr

The name of the tree, in essence a label.

__init__(self, root=None, rooted=True, id=None, name=None)

Initialize parameter for phylogenetic tree.

classmethod from_clade(clade, **kwargs)

Create a new Tree object given a clade.

Keyword arguments are the usual Tree constructor parameters.

classmethod randomized(taxa, branch_length=1.0, branch_stdev=None)

Create a randomized bifurcating tree given a list of taxa.

Parameters

taxa – Either an integer specifying the number of taxa to create (automatically named taxon#), or an iterable of taxon names, as strings.

Returns

a tree of the same type as this class.

property clade

Return first clade in this tree (not itself).

as_phyloxml(self, **kwargs)

Convert this tree to a PhyloXML-compatible Phylogeny.

This lets you use the additional annotation types PhyloXML defines, and save this information when you write this tree as ‘phyloxml’.

root_with_outgroup(self, outgroup_targets, *more_targets, **kwargs)

Reroot this tree with the outgroup clade containing outgroup_targets.

Operates in-place.

Edge cases:
  • If outgroup == self.root, no change

  • If outgroup is terminal, create new bifurcating root node with a 0-length branch to the outgroup

  • If outgroup is internal, use the given outgroup node as the new trifurcating root, keeping branches the same

  • If the original root was bifurcating, drop it from the tree, preserving total branch lengths

Parameters

outgroup_branch_length

length of the branch leading to the outgroup after rerooting. If not specified (None), then:

  • If the outgroup is an internal node (not a single terminal taxon), then use that node as the new root.

  • Otherwise, create a new root node as the parent of the outgroup.

root_at_midpoint(self)

Root the tree at the midpoint of the two most distant taxa.

This operates in-place, leaving a bifurcating root. The topology of the tree is otherwise retained, though no guarantees are made about the stability of clade/node/taxon ordering.

is_terminal(self)

Check if the root of this tree is terminal.

__format__(self, format_spec)

Serialize the tree as a string in the specified file format.

This method supports the format built-in function added in Python 2.6/3.0.

Parameters

format_spec – a lower-case string supported by Bio.Phylo.write as an output file format.

format(self, format)

Serialize the tree as a string in the specified file format.

This duplicates the __format__ magic method for pre-2.6 Pythons.

__str__(self)

Return a string representation of the entire tree.

Serialize each sub-clade recursively using repr to create a summary of the object structure.

class Bio.Phylo.BaseTree.Clade(branch_length=None, name=None, clades=None, confidence=None, color=None, width=None)

Bases: Bio.Phylo.BaseTree.TreeElement, Bio.Phylo.BaseTree.TreeMixin

A recursively defined sub-tree.

Parameters
branch_lengthstr

The length of the branch leading to the root node of this clade.

namestr

The clade’s name (a label).

cladeslist

Sub-trees rooted directly under this tree’s root.

confidencenumber

Support.

colorBranchColor

The display color of the branch and descendents.

widthnumber

The display width of the branch and descendents.

__init__(self, branch_length=None, name=None, clades=None, confidence=None, color=None, width=None)

Define parameters for the Clade tree.

property root

Allow TreeMixin methods to traverse clades properly.

is_terminal(self)

Check if this is a terminal (leaf) node.

__getitem__(self, index)

Get clades by index (integer or slice).

__iter__(self)

Iterate through this tree’s direct descendent clades (sub-trees).

__len__(self)

Return the number of clades directy under the root.

__bool__(self)

Boolean value of an instance of this class (True).

NB: If this method is not defined, but __len__ is, then the object is considered true if the result of __len__() is nonzero. We want Clade instances to always be considered True.

__nonzero__(self)

Boolean value of an instance of this class (True).

NB: If this method is not defined, but __len__ is, then the object is considered true if the result of __len__() is nonzero. We want Clade instances to always be considered True.

__str__(self)

Return name of the class instance.

property color

Branch color.

class Bio.Phylo.BaseTree.BranchColor(red, green, blue)

Bases: object

Indicates the color of a clade when rendered graphically.

The color should be interpreted by client code (e.g. visualization programs) as applying to the whole clade, unless overwritten by the color(s) of sub-clades.

Color values must be integers from 0 to 255.

color_names = {'aqua': (0, 255, 255), 'b': (0, 0, 255), 'black': (0, 0, 0), 'blue': (0, 0, 255), 'brown': (165, 42, 42), 'c': (0, 255, 255), 'cyan': (0, 255, 255), 'fuchsia': (255, 0, 255), 'g': (0, 128, 0), 'gold': (255, 215, 0), 'gray': (128, 128, 128), 'green': (0, 128, 0), 'grey': (128, 128, 128), 'k': (0, 0, 0), 'lime': (0, 255, 0), 'm': (255, 0, 255), 'magenta': (255, 0, 255), 'maroon': (128, 0, 0), 'navy': (0, 0, 128), 'olive': (128, 128, 0), 'orange': (255, 165, 0), 'pink': (255, 192, 203), 'purple': (128, 0, 128), 'r': (255, 0, 0), 'red': (255, 0, 0), 'salmon': (250, 128, 114), 'silver': (192, 192, 192), 'tan': (210, 180, 140), 'teal': (0, 128, 128), 'w': (255, 255, 255), 'white': (255, 255, 255), 'y': (255, 255, 0), 'yellow': (255, 255, 0)}
__init__(self, red, green, blue)

Initialize BranchColor for a tree.

classmethod from_hex(hexstr)

Construct a BranchColor object from a hexadecimal string.

The string format is the same style used in HTML and CSS, such as ‘#FF8000’ for an RGB value of (255, 128, 0).

classmethod from_name(colorname)

Construct a BranchColor object by the color’s name.

to_hex(self)

Return a 24-bit hexadecimal RGB representation of this color.

The returned string is suitable for use in HTML/CSS, as a color parameter in matplotlib, and perhaps other situations.

Examples

>>> bc = BranchColor(12, 200, 100)
>>> bc.to_hex()
'#0cc864'
to_rgb(self)

Return a tuple of RGB values (0 to 255) representing this color.

Examples

>>> bc = BranchColor(255, 165, 0)
>>> bc.to_rgb()
(255, 165, 0)
__repr__(self)

Preserve the standard RGB order when representing this object.

__str__(self)

Show the color’s RGB values.