Bio.Phylo.PhyloXML module

Classes corresponding to phyloXML elements.

See Also

Official specification:

http://phyloxml.org/

Journal article:

Han and Zmasek (2009), https://doi.org/10.1186/1471-2105-10-356

exception Bio.Phylo.PhyloXML.PhyloXMLWarning

Bases: Bio.BiopythonWarning

Warning for non-compliance with the phyloXML specification.

class Bio.Phylo.PhyloXML.PhyloElement

Bases: Bio.Phylo.BaseTree.TreeElement

Base class for all PhyloXML objects.

class Bio.Phylo.PhyloXML.Phyloxml(attributes, phylogenies=None, other=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Root node of the PhyloXML document.

Contains an arbitrary number of Phylogeny elements, possibly followed by elements from other namespaces.

Parameters
attributes : dict

XML namespace definitions

phylogenies : list

The phylogenetic trees

other : list

Arbitrary non-phyloXML elements, if any

__init__(attributes, phylogenies=None, other=None)

Initialize parameters for PhyloXML object.

__getitem__(index)

Get a phylogeny by index or name.

__iter__()

Iterate through the phylogenetic trees in this object.

__len__()

Return the number of phylogenetic trees in this object.

__str__()

Return name of phylogenies in the object.
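
As a rough illustration of the "index or name" lookup described for __getitem__, the sketch below builds a stand-in container from the standard library alone. It is not Biopython's implementation; the class names TreeBox and NamedTree are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NamedTree:
    """Hypothetical stand-in for a phylogeny that only carries a name."""
    name: str


@dataclass
class TreeBox:
    """Hypothetical container that mimics lookup by position or by name."""
    phylogenies: list = field(default_factory=list)

    def __getitem__(self, index):
        # Integers and slices are delegated to the underlying list.
        if isinstance(index, (int, slice)):
            return self.phylogenies[index]
        # Any other key is treated as a tree name.
        for tree in self.phylogenies:
            if tree.name == index:
                return tree
        raise KeyError(f"no phylogeny found with name {index!r}")

    def __iter__(self):
        return iter(self.phylogenies)

    def __len__(self):
        return len(self.phylogenies)


box = TreeBox([NamedTree("example A"), NamedTree("example B")])
first = box[0]              # lookup by position
by_name = box["example B"]  # lookup by name
```

Because names are not required to be unique, a name lookup in this sketch returns the first tree that matches.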

class Bio.Phylo.PhyloXML.Other(tag, namespace=None, attributes=None, value=None, children=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Container for non-phyloXML elements in the tree.

Usually, an Other object will have either a ‘value’ or a non-empty list of ‘children’, but not both. This is not enforced here, though.

Parameters
tag : string

local tag for the XML node

namespace : string

XML namespace for the node; this should not be the default phyloXML namespace.

attributes : dict of strings

attributes on the XML node

value : string

text contained directly within this XML node

children : list

child nodes, if any (also Other instances)

__init__(tag, namespace=None, attributes=None, value=None, children=None)

Initialize values for non-phyloXML elements.

__iter__()

Iterate through the children of this object (if any).

class Bio.Phylo.PhyloXML.Phylogeny(root=None, rooted=True, rerootable=None, branch_length_unit=None, type=None, name=None, id=None, description=None, date=None, confidences=None, clade_relations=None, sequence_relations=None, properties=None, other=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement, Bio.Phylo.BaseTree.Tree

A phylogenetic tree.

Parameters
root : Clade

the root node/clade of this tree

rooted : bool

True if this tree is rooted

rerootable : bool

True if this tree is rerootable

branch_length_unit : string

unit for branch_length values on clades

name : string

identifier for this tree, not required to be unique

id : Id

unique identifier for this tree

description : string

plain-text description

date : Date

date for the root node of this tree

confidences : list

Confidence objects for this tree

clade_relations : list

CladeRelation objects

sequence_relations : list

SequenceRelation objects

properties : list

Property objects

other : list

non-phyloXML elements (type Other)

__init__(root=None, rooted=True, rerootable=None, branch_length_unit=None, type=None, name=None, id=None, description=None, date=None, confidences=None, clade_relations=None, sequence_relations=None, properties=None, other=None)

Initialize values for phylogenetic tree object.

classmethod from_tree(tree, **kwargs)

Create a new Phylogeny given a Tree (from Newick/Nexus or BaseTree).

Keyword arguments are the usual Phylogeny constructor parameters.

classmethod from_clade(clade, **kwargs)

Create a new Phylogeny given a Newick or BaseTree Clade object.

Keyword arguments are the usual PhyloXML.Clade constructor parameters.

as_phyloxml()

Return this tree, a PhyloXML-compatible Phylogeny object.

Overrides the BaseTree method.

to_phyloxml_container(**kwargs)

Create a new Phyloxml object containing just this phylogeny.

to_alignment()

Construct a MultipleSeqAlignment from the aligned sequences in this tree.

property alignment

Construct an Alignment object from the aligned sequences in this tree.

property confidence

Equivalent to self.confidences[0] if there is only 1 value (PRIVATE).

See Also: Clade.confidence, Clade.taxonomy

class Bio.Phylo.PhyloXML.Clade(branch_length=None, id_source=None, name=None, width=None, color=None, node_id=None, events=None, binary_characters=None, date=None, confidences=None, taxonomies=None, sequences=None, distributions=None, references=None, properties=None, clades=None, other=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement, Bio.Phylo.BaseTree.Clade

Describes a branch of the current phylogenetic tree.

Used recursively, describes the topology of a phylogenetic tree.

Both color and width elements should be interpreted by client code as applying to the whole clade, including all descendants, unless overridden in sub-clades. This module doesn’t automatically assign these attributes to sub-clades to achieve this cascade, and neither should you.

Parameters
branch_length

parent branch length of this clade

id_source

link other elements to a clade (on the XML level)

name : string

short label for this clade

confidences : list of Confidence objects

used to indicate the support for a clade/parent branch

width : float

branch width for this clade (including branch from parent)

color : BranchColor

color used for graphical display of this clade

node_id

unique identifier for the root node of this clade

taxonomies : list

Taxonomy objects

sequences : list

Sequence objects

events : Events

describes events, such as gene duplications, at the root node/parent branch of this clade

binary_characters : BinaryCharacters

binary characters

distributions : list of Distribution objects

distribution(s) of this clade

date : Date

a date for the root node of this clade

references : list

Reference objects

properties : list

Property objects

clades : list of Clade objects

Sub-clades

other : list of Other objects

non-phyloXML objects

__init__(branch_length=None, id_source=None, name=None, width=None, color=None, node_id=None, events=None, binary_characters=None, date=None, confidences=None, taxonomies=None, sequences=None, distributions=None, references=None, properties=None, clades=None, other=None)

Initialize value for the Clade object.

classmethod from_clade(clade, **kwargs)

Create a new PhyloXML Clade from a Newick or BaseTree Clade object.

Keyword arguments are the usual PhyloXML Clade constructor parameters.

to_phylogeny(**kwargs)

Create a new phylogeny containing just this clade.

property confidence

Return confidence values (PRIVATE).

property taxonomy

Get taxonomy list for the clade (PRIVATE).
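
The Clade description asks client code to treat color and width as applying to all descendants without writing those values onto the sub-clades. One way a renderer might do that is sketched below with a hypothetical Node class and a hypothetical helper, effective_colors; neither is part of Biopython, and the color names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal clade: a name, an optional color, and children."""
    name: str
    color: Optional[str] = None
    clades: List["Node"] = field(default_factory=list)


def effective_colors(node, inherited=None, out=None):
    """Map each node name to the color a renderer would use for it.

    The tree itself is never modified; the inherited color is only
    passed down the recursion.
    """
    if out is None:
        out = {}
    # A node's own color wins; otherwise it uses the nearest ancestor's.
    current = node.color if node.color is not None else inherited
    out[node.name] = current
    for child in node.clades:
        effective_colors(child, current, out)
    return out


tree = Node("root", color="blue", clades=[
    Node("A"),
    Node("B", color="red", clades=[Node("B1")]),
])
colors = effective_colors(tree)
```

Here node A resolves to the root's color and B1 to the color set on B, while the color attribute of A and B1 stays None.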

class Bio.Phylo.PhyloXML.BranchColor(*args, **kwargs)

Bases: Bio.Phylo.PhyloXML.PhyloElement, Bio.Phylo.BaseTree.BranchColor

Manage the color of a tree branch.

__init__(*args, **kwargs)

Initialize parameters for the BranchColor object.

class Bio.Phylo.PhyloXML.Accession(value, source)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Captures the local part in a sequence identifier.

Example: In UniProtKB:P17304, the Accession instance attribute value is ‘P17304’ and the source attribute is ‘UniProtKB’.

__init__(value, source)

Initialize value for Accession object.

__str__()

Show the class name and an identifying attribute.
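
The UniProtKB:P17304 example above can be reproduced with plain string handling. The snippet below only shows how the two parts of such an identifier relate to the value and source attributes; it does not call into Biopython.

```python
identifier = "UniProtKB:P17304"

# str.partition splits on the first colon only, so a value that itself
# contains a colon would stay intact.
source, _, value = identifier.partition(":")
```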

class Bio.Phylo.PhyloXML.Annotation(ref=None, source=None, evidence=None, type=None, desc=None, confidence=None, uri=None, properties=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

The annotation of a molecular sequence.

It is recommended to annotate by using the optional ‘ref’ attribute.

Parameters
ref : string

reference string, e.g. ‘GO:0008270’, ‘KEGG:Tetrachloroethene degradation’, ‘EC:1.1.1.1’

source : string

plain-text source for this annotation

evidence : str

describe evidence as free text (e.g. ‘experimental’)

desc : string

free text description

confidence : Confidence

state the type and value of support (type Confidence)

properties : list

typed and referenced annotations from external resources

uri : Uri

link

re_ref = re.compile('[a-zA-Z0-9_]+:[a-zA-Z0-9_\\.\\-\\s]+')

__init__(ref=None, source=None, evidence=None, type=None, desc=None, confidence=None, uri=None, properties=None)

Initialize value for the Annotation object.
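
To see which reference strings fit the re_ref pattern listed for Annotation, the pattern can be tried directly with Python's re module. This reference does not say whether the library anchors the match at both ends; using fullmatch here is a choice made for this sketch.

```python
import re

# Pattern copied from the re_ref attribute listed for Annotation.
re_ref = re.compile(r"[a-zA-Z0-9_]+:[a-zA-Z0-9_\.\-\s]+")

candidates = [
    "GO:0008270",
    "KEGG:Tetrachloroethene degradation",
    "EC:1.1.1.1",
    "no-colon-here",
]

# fullmatch requires the whole string to fit the pattern (sketch assumption).
results = {ref: re_ref.fullmatch(ref) is not None for ref in candidates}
```

The three documented examples fit the pattern; the last string does not, because the part before the colon is missing.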

class Bio.Phylo.PhyloXML.BinaryCharacters(type=None, gained_count=None, lost_count=None, present_count=None, absent_count=None, gained=None, lost=None, present=None, absent=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Binary characters at the root of a clade.

The names and/or counts of binary characters present, gained, and lost at the root of a clade.

__init__(type=None, gained_count=None, lost_count=None, present_count=None, absent_count=None, gained=None, lost=None, present=None, absent=None)

Initialize values for the BinaryCharacters object.

class Bio.Phylo.PhyloXML.CladeRelation(type, id_ref_0, id_ref_1, distance=None, confidence=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Expresses a typed relationship between two clades.

For example, this could be used to describe multiple parents of a clade.

__init__(type, id_ref_0, id_ref_1, distance=None, confidence=None)

Initialize values for the CladeRelation object.

class Bio.Phylo.PhyloXML.Confidence(value, type='unknown')

Bases: float, Bio.Phylo.PhyloXML.PhyloElement

A general purpose confidence element.

For example, this can be used to express the bootstrap support value of a clade (in which case the type attribute is ‘bootstrap’).

Parameters
value : float

confidence value

type : string

label for the type of confidence, e.g. ‘bootstrap’

static __new__(cls, value, type='unknown')

Create and return a Confidence object with the specified value and type.

property value

Return the float value of the Confidence object.
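
Confidence is listed with float as a base class and is created through __new__. The general Python pattern for a float subclass that carries an extra label looks roughly like the sketch below. TypedFloat is a hypothetical stand-in written for illustration, not the library's code.

```python
class TypedFloat(float):
    """Hypothetical float subclass that also carries a type label."""

    def __new__(cls, value, type="unknown"):
        # float is immutable, so the numeric value must be set in __new__.
        obj = super().__new__(cls, value)
        obj.type = type
        return obj

    @property
    def value(self):
        # Plain float view of the number.
        return float(self)


support = TypedFloat(89.0, type="bootstrap")
total = support + 1  # arithmetic works because the object is a float
```

In this sketch the result of arithmetic is an ordinary float, so the type label is not carried over to total.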

class Bio.Phylo.PhyloXML.Date(value=None, unit=None, desc=None, minimum=None, maximum=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

A date associated with a clade/node.

Its value can be numerical, using the ‘value’ element, and/or free text, using the ‘desc’ element (e.g. ‘Silurian’). If a numerical value is used, it is recommended to employ the ‘unit’ attribute.

Parameters
unit : string

type of numerical value (e.g. ‘mya’ for ‘million years ago’)

value : float

the date value

desc : string

plain-text description of the date

minimum : float

lower bound on the date value

maximum : float

upper bound on the date value

__init__(value=None, unit=None, desc=None, minimum=None, maximum=None)

Initialize values of the Date object.

__str__()

Show the class name and the human-readable date.

class Bio.Phylo.PhyloXML.Distribution(desc=None, points=None, polygons=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Geographic distribution of the items of a clade (species, sequences).

Intended for phylogeographic applications.

Parameters
desc : string

free-text description of the location

points : list of Point objects

coordinates (similar to the ‘Point’ element in Google’s KML format)

polygons : list of Polygon objects

coordinate sets defining geographic regions

__init__(desc=None, points=None, polygons=None)

Initialize values of Distribution object.

class Bio.Phylo.PhyloXML.DomainArchitecture(length=None, domains=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Domain architecture of a protein.

Parameters
length : int

total length of the protein sequence

domains : list of ProteinDomain objects

the domains within this protein

__init__(length=None, domains=None)

Initialize values of the DomainArchitecture object.

class Bio.Phylo.PhyloXML.Events(type=None, duplications=None, speciations=None, losses=None, confidence=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Events at the root node of a clade (e.g. one gene duplication).

All attributes are set to None by default, but this object can also be treated as a dictionary, in which case None values are treated as missing keys and deleting a key resets that attribute’s value back to None.

ok_type = {'fusion', 'mixed', 'other', 'speciation_or_duplication', 'transfer', 'unassigned'}

__init__(type=None, duplications=None, speciations=None, losses=None, confidence=None)

Initialize values of the Events object.

items()

Return Event’s items.

keys()

Return Event’s keys.

values()

Return values from a key-value pair in an Events dict.

__len__()

Return number of Events.

__getitem__(key)

Get value of Event with the given key.

__setitem__(key, val)

Add item to Event dict.

__delitem__(key)

Delete Event with given key.

__iter__()

Iterate over the keys present in an Events dict.

__contains__(key)

Return True if Event dict contains key.
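
The dictionary-style behavior described for Events (None values count as missing keys, and deleting a key resets the attribute to None) can be pictured with the standalone sketch below. AttrMapping is a hypothetical class written for this illustration; the field names are taken from the Events constructor signature, but the code is not Biopython's.

```python
class AttrMapping:
    """Hypothetical object whose attributes double as dictionary keys."""

    fields = ("type", "duplications", "speciations", "losses", "confidence")

    def __init__(self, **kwargs):
        # Every field exists as an attribute and defaults to None.
        for name in self.fields:
            setattr(self, name, kwargs.get(name))

    def keys(self):
        # Attributes that are None count as missing keys.
        return [name for name in self.fields if getattr(self, name) is not None]

    def __getitem__(self, key):
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, val):
        if key not in self.fields:
            raise KeyError(key)
        setattr(self, key, val)

    def __delitem__(self, key):
        # Deleting a key resets the attribute to None.
        if key not in self.fields:
            raise KeyError(key)
        setattr(self, key, None)

    def __contains__(self, key):
        return key in self.keys()

    def __iter__(self):
        return iter(self.keys())

    def __len__(self):
        return len(self.keys())


events = AttrMapping(duplications=1)
events["speciations"] = 2
before = list(events)
del events["duplications"]
```

After the deletion, the duplications attribute still exists on the object but is None, so it no longer shows up as a key.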

class Bio.Phylo.PhyloXML.Id(value, provider=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

A general-purpose identifier element.

Allows the provider (or authority) of an identifier, e.g. NCBI, to be indicated along with the value itself.

__init__(value, provider=None)

Initialize values for the identifier object.

__str__()

Return identifier as a string.

class Bio.Phylo.PhyloXML.MolSeq(value, is_aligned=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Store a molecular sequence.

Parameters
value : string

the sequence itself

is_aligned : bool

True if this sequence is aligned with the others (usually meaning all aligned seqs are the same length and gaps may be present)

re_value = re.compile('[a-zA-Z\\.\\-\\?\\*_]+')

__init__(value, is_aligned=None)

Initialize parameters for the MolSeq object.

__str__()

Return the value of the Molecular Sequence object.
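
The is_aligned description says that aligned sequences usually share one length and may contain gaps. A quick check along those lines, using the re_value pattern listed for MolSeq, might look like the sketch below. The helper looks_aligned and the example sequences are hypothetical, and the use of fullmatch is an assumption of this sketch.

```python
import re

# Pattern copied from the re_value attribute listed for MolSeq.
re_value = re.compile(r"[a-zA-Z\.\-\?\*_]+")


def looks_aligned(seqs):
    """Return True if every string fits re_value and all share one length."""
    valid = all(re_value.fullmatch(s) is not None for s in seqs)
    same_length = len({len(s) for s in seqs}) == 1
    return valid and same_length


aligned = ["ACG-TTA", "AC--TTA", "ACGGTTA"]  # made-up sequences with gaps
ragged = ["ACGTTA", "ACTTA"]                 # lengths differ
```

This is only a plausibility check; equal lengths alone do not prove that the sequences were produced by an alignment.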

class Bio.Phylo.PhyloXML.Point(geodetic_datum, lat, long, alt=None, alt_unit=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Geographic coordinates of a point, with an optional altitude.

Used by element ‘Distribution’.

Parameters
geodetic_datum : string, required

the geodetic datum (also called ‘map datum’). For example, Google’s KML uses ‘WGS84’.

lat : numeric

latitude

long : numeric

longitude

alt : numeric

altitude

alt_unit : string

unit for the altitude (e.g. ‘meter’)

__init__(geodetic_datum, lat, long, alt=None, alt_unit=None)

Initialize value for the Point object.

class Bio.Phylo.PhyloXML.Polygon(points=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

A polygon defined by a list of ‘Points’ (used by element ‘Distribution’).

Parameters

points – list of 3 or more points representing vertices.

__init__(points=None)

Initialize value for the Polygon object.

__str__()

Return list of points as a string.

class Bio.Phylo.PhyloXML.Property(value, ref, applies_to, datatype, unit=None, id_ref=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

A typed and referenced property from an external resource.

Can be attached to Phylogeny, Clade, and Annotation objects.

Parameters
value : string

the value of the property

ref : string

reference to an external resource, e.g. “NOAA:depth”

applies_to : string

indicates the item to which a property applies (e.g. ‘node’ for the parent node of a clade, ‘parent_branch’ for the parent branch of a clade, or just ‘clade’).

datatype : string

the type of a property; limited to xsd-datatypes (e.g. ‘xsd:string’, ‘xsd:boolean’, ‘xsd:integer’, ‘xsd:decimal’, ‘xsd:float’, ‘xsd:double’, ‘xsd:date’, ‘xsd:anyURI’).

unit : string (optional)

the unit of the property, e.g. “METRIC:m”

id_ref : Id (optional)

allows attaching a property specifically to one element (on the XML level)

re_ref = re.compile('[a-zA-Z0-9_]+:[a-zA-Z0-9_\\.\\-\\s]+')

ok_applies_to = {'annotation', 'clade', 'node', 'other', 'parent_branch', 'phylogeny'}

ok_datatype = {'xsd:anyURI', 'xsd:base64Binary', 'xsd:boolean', 'xsd:byte', 'xsd:date', 'xsd:dateTime', 'xsd:decimal', 'xsd:double', 'xsd:duration', 'xsd:float', 'xsd:gDay', 'xsd:gMonth', 'xsd:gMonthDay', 'xsd:gYear', 'xsd:gYearMonth', 'xsd:hexBinary', 'xsd:int', 'xsd:integer', 'xsd:long', 'xsd:negativeInteger', 'xsd:nonNegativeInteger', 'xsd:nonPositiveInteger', 'xsd:normalizedString', 'xsd:positiveInteger', 'xsd:short', 'xsd:string', 'xsd:time', 'xsd:token', 'xsd:unsignedByte', 'xsd:unsignedInt', 'xsd:unsignedLong', 'xsd:unsignedShort'}

__init__(value, ref, applies_to, datatype, unit=None, id_ref=None)

Initialize value for the Property object.
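
The three class attributes listed for Property suggest which field values are considered acceptable. The sketch below pre-checks a candidate ref, applies_to and datatype against them before any object is built. The helper property_problems is hypothetical, only an excerpt of ok_datatype is copied, and how the library itself reacts to a non-conforming value is not covered here.

```python
import re

# Values copied from the Property class attributes listed above.
re_ref = re.compile(r"[a-zA-Z0-9_]+:[a-zA-Z0-9_\.\-\s]+")
ok_applies_to = {"annotation", "clade", "node", "other",
                 "parent_branch", "phylogeny"}
# Excerpt only; the full ok_datatype set is longer.
ok_datatype_excerpt = {"xsd:string", "xsd:boolean", "xsd:integer",
                       "xsd:decimal", "xsd:float", "xsd:double",
                       "xsd:date", "xsd:anyURI"}


def property_problems(ref, applies_to, datatype):
    """Return the names of the fields that fail the checks (hypothetical)."""
    problems = []
    if re_ref.fullmatch(ref) is None:
        problems.append("ref")
    if applies_to not in ok_applies_to:
        problems.append("applies_to")
    if datatype not in ok_datatype_excerpt:
        problems.append("datatype")
    return problems


good = property_problems("NOAA:depth", "node", "xsd:integer")
bad = property_problems("depth", "leaf", "integer")
```

Returning a list of field names, rather than raising on the first failure, lets a caller report every problem at once.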

class Bio.Phylo.PhyloXML.ProteinDomain(value, start, end, confidence=None, id=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Represents an individual domain in a domain architecture.

The locations use 0-based indexing, as most Python objects including SeqFeature do, rather than the usual biological convention starting at 1. This means the start and end attributes can be used directly as slice indexes on Seq objects.

Parameters
start : non-negative integer

start of the domain on the sequence, using 0-based indexing

end : non-negative integer

end of the domain on the sequence

confidence : float

can be used to store e.g. E-values

id : string

unique identifier/name

__init__(value, start, end, confidence=None, id=None)

Initialize value for a ProteinDomain object.

classmethod from_seqfeature(feat)

Create ProteinDomain object from SeqFeature.

to_seqfeature()

Create a SeqFeature from the ProteinDomain Object.
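
The note on 0-based indexing for ProteinDomain can be made concrete with a plain Python string standing in for a Seq object. The protein string and the coordinates below are made up, and the sketch assumes that end acts as the exclusive bound of the slice, as Python slicing implies.

```python
protein = "MKTAYIAKQR"  # made-up sequence, 10 residues

# A domain reported as residues 3 to 6 in 1-based, inclusive coordinates ...
one_based_start, one_based_end = 3, 6

# ... becomes a 0-based, half-open interval: shift the start, keep the end.
start, end = one_based_start - 1, one_based_end

# start and end can now be used directly as slice indexes.
domain_residues = protein[start:end]
```

The length of the domain is simply end - start in this convention, with no off-by-one correction.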

class Bio.Phylo.PhyloXML.Reference(doi=None, desc=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Literature reference for a clade.

NB: Whenever possible, use the doi attribute instead of the free-text desc element.

re_doi = re.compile('[a-zA-Z0-9_\\.]+/[a-zA-Z0-9_\\.]+')

__init__(doi=None, desc=None)

Initialize elements of the Reference class object.
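
The character classes in re_doi do not include a hyphen, which matters for DOIs such as the one of the phyloXML article cited at the top of this page. The sketch below compares re.match with re.fullmatch on that DOI. Which of the two the library applies is not stated in this reference, so this only shows how the pattern itself behaves.

```python
import re

# Pattern copied from the re_doi attribute listed for Reference.
re_doi = re.compile(r"[a-zA-Z0-9_\.]+/[a-zA-Z0-9_\.]+")

doi = "10.1186/1471-2105-10-356"

prefix = re_doi.match(doi)     # anchored at the start only
whole = re_doi.fullmatch(doi)  # must cover the entire string

# match stops at the first hyphen after the slash.
matched_text = prefix.group(0)
```

A prefix match therefore accepts this DOI while a full match rejects it, which is worth keeping in mind if the pattern is reused for validation elsewhere.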

class Bio.Phylo.PhyloXML.Sequence(type=None, id_ref=None, id_source=None, symbol=None, accession=None, name=None, location=None, mol_seq=None, uri=None, domain_architecture=None, annotations=None, other=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

A molecular sequence (Protein, DNA, RNA) associated with a node.

One intended use for id_ref is to link a sequence to a taxonomy (via the taxonomy’s id_source) in case of multiple sequences and taxonomies per node.

Parameters
type : {‘dna’, ‘rna’, ‘protein’}

type of molecule this sequence represents

id_ref : string

reference to another resource

id_source : string

source for the reference

symbol : string

short symbol of the sequence, e.g. ‘ACTM’ (max. 10 chars)

accession : Accession

accession code for this sequence.

name : string

full name of the sequence, e.g. ‘muscle Actin’

location

location of a sequence on a genome/chromosome.

mol_seq : MolSeq

the molecular sequence itself

uri : Uri

link

annotations : list of Annotation objects

annotations on this sequence

domain_architecture : DomainArchitecture

protein domains on this sequence

other : list of Other objects

non-phyloXML elements

types = {'dna', 'protein', 'rna'}

re_symbol = re.compile('\\S{1,10}')

__init__(type=None, id_ref=None, id_source=None, symbol=None, accession=None, name=None, location=None, mol_seq=None, uri=None, domain_architecture=None, annotations=None, other=None)

Initialize value for a Sequence object.

classmethod from_seqrecord(record, is_aligned=None)

Create a new PhyloXML Sequence from a SeqRecord object.

to_seqrecord()

Create a SeqRecord object from this Sequence instance.

The seqrecord.annotations dictionary is packed like so:

{ # Sequence attributes with no SeqRecord equivalent:
  'id_ref': self.id_ref,
  'id_source': self.id_source,
  'location': self.location,
  'uri': { 'value': self.uri.value,
           'desc': self.uri.desc,
           'type': self.uri.type },
  # Sequence.annotations attribute (list of Annotations)
  'annotations': [{'ref': ann.ref,
                   'source': ann.source,
                   'evidence': ann.evidence,
                   'type': ann.type,
                   'confidence': [ann.confidence.value,
                                  ann.confidence.type],
                   'properties': [{'value': prop.value,
                                   'ref': prop.ref,
                                   'applies_to': prop.applies_to,
                                   'datatype': prop.datatype,
                                   'unit': prop.unit,
                                   'id_ref': prop.id_ref}
                                  for prop in ann.properties],
                   } for ann in self.annotations],
}

class Bio.Phylo.PhyloXML.SequenceRelation(type, id_ref_0, id_ref_1, distance=None, confidence=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Express a typed relationship between two sequences.

For example, this could be used to describe an orthology (in which case attribute ‘type’ is ‘orthology’).

Parameters
id_ref_0 : Id

first sequence reference identifier

id_ref_1 : Id

second sequence reference identifier

distance : float

distance between the two sequences

type : restricted string

describe the type of relationship

confidence : Confidence

confidence value for this relation

ok_type = {'one_to_one_orthology', 'orthology', 'other', 'paralogy', 'super_orthology', 'ultra_paralogy', 'unknown', 'xenology'}

__init__(type, id_ref_0, id_ref_1, distance=None, confidence=None)

Initialize the class.

class Bio.Phylo.PhyloXML.Taxonomy(id_source=None, id=None, code=None, scientific_name=None, authority=None, rank=None, uri=None, common_names=None, synonyms=None, other=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

Describe taxonomic information for a clade.

Parameters
id_source : Id

link other elements to a taxonomy (on the XML level)

id : Id

unique identifier of a taxon, e.g. Id('6500', provider='ncbi_taxonomy') for the California sea hare

code : restricted string

store UniProt/Swiss-Prot style organism codes, e.g. ‘APLCA’ for the California sea hare ‘Aplysia californica’

scientific_name : string

the standard scientific name for this organism, e.g. ‘Aplysia californica’ for the California sea hare

authority : string

keep the authority, such as ‘J. G. Cooper, 1863’, associated with the ‘scientific_name’

common_names : list of strings

common names for this organism

synonyms : list of strings

synonyms for this taxon

rank : restricted string

taxonomic rank

uri : Uri

link

other : list of Other objects

non-phyloXML elements

re_code = re.compile('[a-zA-Z0-9_]{2,10}')

ok_rank = {'branch', 'class', 'cohort', 'cultivar', 'division', 'domain', 'family', 'form', 'genus', 'infraclass', 'infracohort', 'infradivision', 'infrakingdom', 'infralegion', 'infraphylum', 'infratribe', 'kingdom', 'legion', 'microphylum', 'order', 'other', 'phylum', 'species', 'subclass', 'subcohort', 'subdivision', 'subfamily', 'subform', 'subgenus', 'subkingdom', 'sublegion', 'suborder', 'subphylum', 'subspecies', 'subtribe', 'subvariety', 'superclass', 'supercohort', 'superdivision', 'superfamily', 'superlegion', 'superorder', 'superphylum', 'superspecies', 'supertribe', 'tribe', 'unknown', 'variety'}

__init__(id_source=None, id=None, code=None, scientific_name=None, authority=None, rank=None, uri=None, common_names=None, synonyms=None, other=None)

Initialize the class.

__str__()

Show the class name and an identifying attribute.
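
The "restricted string" parameters of Taxonomy, code and rank, correspond to the re_code pattern and the ok_rank set listed above. The sketch below tries a few values against them. Only an excerpt of ok_rank is copied, fullmatch is an assumption of this sketch, and the lower-case comparison reflects nothing more than the fact that every listed rank is lower-case.

```python
import re

# Pattern copied from the re_code attribute listed for Taxonomy.
re_code = re.compile(r"[a-zA-Z0-9_]{2,10}")
# Excerpt only; the full ok_rank set is longer.
ok_rank_excerpt = {"domain", "kingdom", "phylum", "class", "order",
                   "family", "genus", "species", "other", "unknown"}

# 'APLCA' is the organism code given above for the California sea hare.
code_ok = re_code.fullmatch("APLCA") is not None
# A single character is below the 2-character minimum of the pattern.
too_short = re_code.fullmatch("A") is not None
# Eleven characters exceed the 10-character maximum under fullmatch.
too_long = re_code.fullmatch("ABCDEFGHIJK") is not None

rank_ok = "species" in ok_rank_excerpt
rank_capitalized = "Species" in ok_rank_excerpt  # membership is case-sensitive
```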

class Bio.Phylo.PhyloXML.Uri(value, desc=None, type=None)

Bases: Bio.Phylo.PhyloXML.PhyloElement

A uniform resource identifier.

In general, this is expected to be a URL (for example, to link to an image on a website, in which case the type attribute might be ‘image’ and desc might be ‘image of a California sea hare’).

__init__(value, desc=None, type=None)

Initialize the class.

__str__()

Return string representation of Uri.