Bio.Phylo.PhyloXML module
Classes corresponding to phyloXML elements.
See Also
- Official specification:
- Journal article:
Han and Zmasek (2009), https://doi.org/10.1186/1471-2105-10-356
- exception Bio.Phylo.PhyloXML.PhyloXMLWarning
Bases:
Bio.BiopythonWarning
Warning for non-compliance with the phyloXML specification.
- class Bio.Phylo.PhyloXML.PhyloElement
Bases:
Bio.Phylo.BaseTree.TreeElement
Base class for all PhyloXML objects.
- class Bio.Phylo.PhyloXML.Phyloxml(attributes, phylogenies=None, other=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Root node of the PhyloXML document.
Contains an arbitrary number of Phylogeny elements, possibly followed by elements from other namespaces.
- Parameters
- attributesdict
(XML namespace definitions)
- phylogenieslist
The phylogenetic trees
- otherlist
Arbitrary non-phyloXML elements, if any
- __init__(attributes, phylogenies=None, other=None)
Initialize parameters for PhyloXML object.
- __getitem__(index)
Get a phylogeny by index or name.
- __iter__()
Iterate through the phylogenetic trees in this object.
- __len__()
Return the number of phylogenetic trees in this object.
- __str__()
Return name of phylogenies in the object.
- class Bio.Phylo.PhyloXML.Other(tag, namespace=None, attributes=None, value=None, children=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Container for non-phyloXML elements in the tree.
Usually, an Other object will have either a ‘value’ or a non-empty list of ‘children’, but not both. This is not enforced here, though.
- Parameters
- tagstring
local tag for the XML node
- namespacestring
XML namespace for the node – should not be the default phyloXML namespace.
- attributesdict of strings
attributes on the XML node
- valuestring
text contained directly within this XML node
- childrenlist
child nodes, if any (also
Other
instances)
- __init__(tag, namespace=None, attributes=None, value=None, children=None)
Initialize values for non-phyloXML elements.
- __iter__()
Iterate through the children of this object (if any).
- class Bio.Phylo.PhyloXML.Phylogeny(root=None, rooted=True, rerootable=None, branch_length_unit=None, type=None, name=None, id=None, description=None, date=None, confidences=None, clade_relations=None, sequence_relations=None, properties=None, other=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
,Bio.Phylo.BaseTree.Tree
A phylogenetic tree.
- Parameters
- rootClade
the root node/clade of this tree
- rootedbool
True if this tree is rooted
- rerootablebool
True if this tree is rerootable
- branch_length_unitstring
unit for branch_length values on clades
- namestring
identifier for this tree, not required to be unique
- idId
unique identifier for this tree
- descriptionstring
plain-text description
- dateDate
date for the root node of this tree
- confidenceslist
Confidence objects for this tree
- clade_relationslist
CladeRelation objects
- sequence_relationslist
SequenceRelation objects
- propertieslist
Property objects
- otherlist
non-phyloXML elements (type
Other
)
- __init__(root=None, rooted=True, rerootable=None, branch_length_unit=None, type=None, name=None, id=None, description=None, date=None, confidences=None, clade_relations=None, sequence_relations=None, properties=None, other=None)
Initialize values for phylogenetic tree object.
- classmethod from_tree(tree, **kwargs)
Create a new Phylogeny given a Tree (from Newick/Nexus or BaseTree).
Keyword arguments are the usual
Phylogeny
constructor parameters.
- classmethod from_clade(clade, **kwargs)
Create a new Phylogeny given a Newick or BaseTree Clade object.
Keyword arguments are the usual
PhyloXML.Clade
constructor parameters.
- as_phyloxml()
Return this tree, a PhyloXML-compatible Phylogeny object.
Overrides the
BaseTree
method.
- to_phyloxml_container(**kwargs)
Create a new Phyloxml object containing just this phylogeny.
- to_alignment()
Construct a MultipleSeqAlignment from the aligned sequences in this tree.
- property alignment
Construct an Alignment object from the aligned sequences in this tree.
- property confidence
Equivalent to self.confidences[0] if there is only 1 value (PRIVATE).
See Also:
Clade.confidence
,Clade.taxonomy
- class Bio.Phylo.PhyloXML.Clade(branch_length=None, id_source=None, name=None, width=None, color=None, node_id=None, events=None, binary_characters=None, date=None, confidences=None, taxonomies=None, sequences=None, distributions=None, references=None, properties=None, clades=None, other=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
,Bio.Phylo.BaseTree.Clade
Describes a branch of the current phylogenetic tree.
Used recursively, describes the topology of a phylogenetic tree.
Both
color
andwidth
elements should be interpreted by client code as applying to the whole clade, including all descendents, unless overwritten in-sub clades. This module doesn’t automatically assign these attributes to sub-clades to achieve this cascade – and neither should you.- Parameters
- branch_length
parent branch length of this clade
- id_source
link other elements to a clade (on the xml-level)
- namestring
short label for this clade
- confidenceslist of Confidence objects
used to indicate the support for a clade/parent branch.
- widthfloat
branch width for this clade (including branch from parent)
- colorBranchColor
color used for graphical display of this clade
- node_id
unique identifier for the root node of this clade
- taxonomieslist
Taxonomy objects
- sequenceslist
Sequence objects
- eventsEvents
describe such events as gene-duplications at the root node/parent branch of this clade
- binary_charactersBinaryCharacters
binary characters
- distributionslist of Distribution objects
distribution(s) of this clade
- dateDate
a date for the root node of this clade
- referenceslist
Reference objects
- propertieslist
Property objects
- cladeslist Clade objects
Sub-clades
- otherlist of Other objects
non-phyloXML objects
- __init__(branch_length=None, id_source=None, name=None, width=None, color=None, node_id=None, events=None, binary_characters=None, date=None, confidences=None, taxonomies=None, sequences=None, distributions=None, references=None, properties=None, clades=None, other=None)
Initialize value for the Clade object.
- classmethod from_clade(clade, **kwargs)
Create a new PhyloXML Clade from a Newick or BaseTree Clade object.
Keyword arguments are the usual PhyloXML Clade constructor parameters.
- to_phylogeny(**kwargs)
Create a new phylogeny containing just this clade.
- property confidence
Return confidence values (PRIVATE).
- property taxonomy
Get taxonomy list for the clade (PRIVATE).
- class Bio.Phylo.PhyloXML.BranchColor(*args, **kwargs)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
,Bio.Phylo.BaseTree.BranchColor
Manage Tree branch’s color.
- __init__(*args, **kwargs)
Initialize parameters for the BranchColor object.
- class Bio.Phylo.PhyloXML.Accession(value, source)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Captures the local part in a sequence identifier.
Example: In
UniProtKB:P17304
, the Accession instance attributevalue
is ‘P17304’ and thesource
attribute is ‘UniProtKB’.- __init__(value, source)
Initialize value for Accession object.
- __str__()
Show the class name and an identifying attribute.
- class Bio.Phylo.PhyloXML.Annotation(ref=None, source=None, evidence=None, type=None, desc=None, confidence=None, uri=None, properties=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
The annotation of a molecular sequence.
It is recommended to annotate by using the optional ‘ref’ attribute.
- Parameters
- refstring
reference string, e.g. ‘GO:0008270’, ‘KEGG:Tetrachloroethene degradation’, ‘EC:1.1.1.1’
- sourcestring
plain-text source for this annotation
- evidencestr
describe evidence as free text (e.g. ‘experimental’)
- descstring
free text description
- confidenceConfidence
state the type and value of support (type Confidence)
- propertieslist
typed and referenced annotations from external resources
- uriUri
link
- re_ref = re.compile('[a-zA-Z0-9_]+:[a-zA-Z0-9_\\.\\-\\s]+')
- __init__(ref=None, source=None, evidence=None, type=None, desc=None, confidence=None, uri=None, properties=None)
Initialize value for the Annotation object.
- class Bio.Phylo.PhyloXML.BinaryCharacters(type=None, gained_count=None, lost_count=None, present_count=None, absent_count=None, gained=None, lost=None, present=None, absent=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Binary characters at the root of a clade.
The names and/or counts of binary characters present, gained, and lost at the root of a clade.
- __init__(type=None, gained_count=None, lost_count=None, present_count=None, absent_count=None, gained=None, lost=None, present=None, absent=None)
Initialize values for the BinaryCharacters object.
- class Bio.Phylo.PhyloXML.CladeRelation(type, id_ref_0, id_ref_1, distance=None, confidence=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Expresses a typed relationship between two clades.
For example, this could be used to describe multiple parents of a clade.
- __init__(type, id_ref_0, id_ref_1, distance=None, confidence=None)
Initialize values for the CladeRelation object.
- class Bio.Phylo.PhyloXML.Confidence(value, type='unknown')
Bases:
float
,Bio.Phylo.PhyloXML.PhyloElement
A general purpose confidence element.
For example, this can be used to express the bootstrap support value of a clade (in which case the
type
attribute is ‘bootstrap’).- Parameters
- valuefloat
confidence value
- typestring
label for the type of confidence, e.g. ‘bootstrap’
- static __new__(cls, value, type='unknown')
Create and return a Confidence object with the specified value and type.
- property value
Return the float value of the Confidence object.
- class Bio.Phylo.PhyloXML.Date(value=None, unit=None, desc=None, minimum=None, maximum=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
A date associated with a clade/node.
Its value can be numerical by using the ‘value’ element and/or free text with the ‘desc’ element’ (e.g. ‘Silurian’). If a numerical value is used, it is recommended to employ the ‘unit’ attribute.
- Parameters
- unitstring
type of numerical value (e.g. ‘mya’ for ‘million years ago’)
- valuefloat
the date value
- descstring
plain-text description of the date
- minimumfloat
lower bound on the date value
- maximumfloat
upper bound on the date value
- __init__(value=None, unit=None, desc=None, minimum=None, maximum=None)
Initialize values of the Date object.
- __str__()
Show the class name and the human-readable date.
- class Bio.Phylo.PhyloXML.Distribution(desc=None, points=None, polygons=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Geographic distribution of the items of a clade (species, sequences).
Intended for phylogeographic applications.
- Parameters
- descstring
free-text description of the location
- pointslist of
Point
objects coordinates (similar to the ‘Point’ element in Google’s KML format)
- polygonslist of
Polygon
objects coordinate sets defining geographic regions
- __init__(desc=None, points=None, polygons=None)
Initialize values of Distribution object.
- class Bio.Phylo.PhyloXML.DomainArchitecture(length=None, domains=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Domain architecture of a protein.
- Parameters
- lengthint
total length of the protein sequence
- domainslist ProteinDomain objects
the domains within this protein
- __init__(length=None, domains=None)
Initialize values of the DomainArchitecture object.
- class Bio.Phylo.PhyloXML.Events(type=None, duplications=None, speciations=None, losses=None, confidence=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Events at the root node of a clade (e.g. one gene duplication).
All attributes are set to None by default, but this object can also be treated as a dictionary, in which case None values are treated as missing keys and deleting a key resets that attribute’s value back to None.
- ok_type = {'fusion', 'mixed', 'other', 'speciation_or_duplication', 'transfer', 'unassigned'}
- __init__(type=None, duplications=None, speciations=None, losses=None, confidence=None)
Initialize values of the Events object.
- items()
Return Event’s items.
- keys()
Return Event’s keys.
- values()
Return values from a key-value pair in an Events dict.
- __len__()
Return number of Events.
- __getitem__(key)
Get value of Event with the given key.
- __setitem__(key, val)
Add item to Event dict.
- __delitem__(key)
Delete Event with given key.
- __iter__()
Iterate over the keys present in a Events dict.
- __contains__(key)
Return True if Event dict contains key.
- class Bio.Phylo.PhyloXML.Id(value, provider=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
A general-purpose identifier element.
Allows to indicate the provider (or authority) of an identifier, e.g. NCBI, along with the value itself.
- __init__(value, provider=None)
Initialize values for the identifier object.
- __str__()
Return identifier as a string.
- class Bio.Phylo.PhyloXML.MolSeq(value, is_aligned=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Store a molecular sequence.
- Parameters
- valuestring
the sequence itself
- is_alignedbool
True if this sequence is aligned with the others (usually meaning all aligned seqs are the same length and gaps may be present)
- re_value = re.compile('[a-zA-Z\\.\\-\\?\\*_]+')
- __init__(value, is_aligned=None)
Initialize parameters for the MolSeq object.
- __str__()
Return the value of the Molecular Sequence object.
- class Bio.Phylo.PhyloXML.Point(geodetic_datum, lat, long, alt=None, alt_unit=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Geographic coordinates of a point, with an optional altitude.
Used by element ‘Distribution’.
- Parameters
- geodetic_datumstring, required
the geodetic datum (also called ‘map datum’). For example, Google’s KML uses ‘WGS84’.
- latnumeric
latitude
- longnumeric
longitude
- altnumeric
altitude
- alt_unitstring
unit for the altitude (e.g. ‘meter’)
- __init__(geodetic_datum, lat, long, alt=None, alt_unit=None)
Initialize value for the Point object.
- class Bio.Phylo.PhyloXML.Polygon(points=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
A polygon defined by a list of ‘Points’ (used by element ‘Distribution’).
- Parameters
points – list of 3 or more points representing vertices.
- __init__(points=None)
Initialize value for the Polygon object.
- __str__()
Return list of points as a string.
- class Bio.Phylo.PhyloXML.Property(value, ref, applies_to, datatype, unit=None, id_ref=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
A typed and referenced property from an external resources.
Can be attached to
Phylogeny
,Clade
, andAnnotation
objects.- Parameters
- valuestring
the value of the property
- refstring
reference to an external resource, e.g. “NOAA:depth”
- applies_tostring
indicates the item to which a property applies to (e.g. ‘node’ for the parent node of a clade, ‘parent_branch’ for the parent branch of a clade, or just ‘clade’).
- datatypestring
the type of a property; limited to xsd-datatypes (e.g. ‘xsd:string’, ‘xsd:boolean’, ‘xsd:integer’, ‘xsd:decimal’, ‘xsd:float’, ‘xsd:double’, ‘xsd:date’, ‘xsd:anyURI’).
- unitstring (optional)
the unit of the property, e.g. “METRIC:m”
- id_refId (optional)
allows to attached a property specifically to one element (on the xml-level)
- re_ref = re.compile('[a-zA-Z0-9_]+:[a-zA-Z0-9_\\.\\-\\s]+')
- ok_applies_to = {'annotation', 'clade', 'node', 'other', 'parent_branch', 'phylogeny'}
- ok_datatype = {'xsd:anyURI', 'xsd:base64Binary', 'xsd:boolean', 'xsd:byte', 'xsd:date', 'xsd:dateTime', 'xsd:decimal', 'xsd:double', 'xsd:duration', 'xsd:float', 'xsd:gDay', 'xsd:gMonth', 'xsd:gMonthDay', 'xsd:gYear', 'xsd:gYearMonth', 'xsd:hexBinary', 'xsd:int', 'xsd:integer', 'xsd:long', 'xsd:negativeInteger', 'xsd:nonNegativeInteger', 'xsd:nonPositiveInteger', 'xsd:normalizedString', 'xsd:positiveInteger', 'xsd:short', 'xsd:string', 'xsd:time', 'xsd:token', 'xsd:unsignedByte', 'xsd:unsignedInt', 'xsd:unsignedLong', 'xsd:unsignedShort'}
- __init__(value, ref, applies_to, datatype, unit=None, id_ref=None)
Initialize value for the Property object.
- class Bio.Phylo.PhyloXML.ProteinDomain(value, start, end, confidence=None, id=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Represents an individual domain in a domain architecture.
The locations use 0-based indexing, as most Python objects including SeqFeature do, rather than the usual biological convention starting at 1. This means the start and end attributes can be used directly as slice indexes on Seq objects.
- Parameters
- startnon-negative integer
start of the domain on the sequence, using 0-based indexing
- endnon-negative integer
end of the domain on the sequence
- confidencefloat
can be used to store e.g. E-values
- idstring
unique identifier/name
- __init__(value, start, end, confidence=None, id=None)
Initialize value for a ProteinDomain object.
- classmethod from_seqfeature(feat)
Create ProteinDomain object from SeqFeature.
- to_seqfeature()
Create a SeqFeature from the ProteinDomain Object.
- class Bio.Phylo.PhyloXML.Reference(doi=None, desc=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Literature reference for a clade.
NB: Whenever possible, use the
doi
attribute instead of the free-textdesc
element.- re_doi = re.compile('[a-zA-Z0-9_\\.]+/[a-zA-Z0-9_\\.]+')
- __init__(doi=None, desc=None)
Initialize elements of the Reference class object.
- class Bio.Phylo.PhyloXML.Sequence(type=None, id_ref=None, id_source=None, symbol=None, accession=None, name=None, location=None, mol_seq=None, uri=None, domain_architecture=None, annotations=None, other=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
A molecular sequence (Protein, DNA, RNA) associated with a node.
One intended use for
id_ref
is to link a sequence to a taxonomy (via the taxonomy’sid_source
) in case of multiple sequences and taxonomies per node.- Parameters
- type{‘dna’, ‘rna’, ‘protein’}
type of molecule this sequence represents
- id_refstring
reference to another resource
- id_sourcestring
source for the reference
- symbolstring
short symbol of the sequence, e.g. ‘ACTM’ (max. 10 chars)
- accessionAccession
accession code for this sequence.
- namestring
full name of the sequence, e.g. ‘muscle Actin’
- location
location of a sequence on a genome/chromosome.
- mol_seqMolSeq
the molecular sequence itself
- uriUri
link
- annotationslist of Annotation objects
annotations on this sequence
- domain_architectureDomainArchitecture
protein domains on this sequence
- otherlist of Other objects
non-phyloXML elements
- types = {'dna', 'protein', 'rna'}
- re_symbol = re.compile('\\S{1,10}')
- __init__(type=None, id_ref=None, id_source=None, symbol=None, accession=None, name=None, location=None, mol_seq=None, uri=None, domain_architecture=None, annotations=None, other=None)
Initialize value for a Sequence object.
- classmethod from_seqrecord(record, is_aligned=None)
Create a new PhyloXML Sequence from a SeqRecord object.
- to_seqrecord()
Create a SeqRecord object from this Sequence instance.
The seqrecord.annotations dictionary is packed like so:
{ # Sequence attributes with no SeqRecord equivalent: 'id_ref': self.id_ref, 'id_source': self.id_source, 'location': self.location, 'uri': { 'value': self.uri.value, 'desc': self.uri.desc, 'type': self.uri.type }, # Sequence.annotations attribute (list of Annotations) 'annotations': [{'ref': ann.ref, 'source': ann.source, 'evidence': ann.evidence, 'type': ann.type, 'confidence': [ann.confidence.value, ann.confidence.type], 'properties': [{'value': prop.value, 'ref': prop.ref, 'applies_to': prop.applies_to, 'datatype': prop.datatype, 'unit': prop.unit, 'id_ref': prop.id_ref} for prop in ann.properties], } for ann in self.annotations], }
- class Bio.Phylo.PhyloXML.SequenceRelation(type, id_ref_0, id_ref_1, distance=None, confidence=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Express a typed relationship between two sequences.
For example, this could be used to describe an orthology (in which case attribute ‘type’ is ‘orthology’).
- Parameters
- id_ref_0Id
first sequence reference identifier
- id_ref_1Id
second sequence reference identifier
- distancefloat
distance between the two sequences
- typerestricted string
describe the type of relationship
- confidenceConfidence
confidence value for this relation
- ok_type = {'one_to_one_orthology', 'orthology', 'other', 'paralogy', 'super_orthology', 'ultra_paralogy', 'unknown', 'xenology'}
- __init__(type, id_ref_0, id_ref_1, distance=None, confidence=None)
Initialize the class.
- class Bio.Phylo.PhyloXML.Taxonomy(id_source=None, id=None, code=None, scientific_name=None, authority=None, rank=None, uri=None, common_names=None, synonyms=None, other=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
Describe taxonomic information for a clade.
- Parameters
- id_sourceId
link other elements to a taxonomy (on the XML level)
- idId
unique identifier of a taxon, e.g. Id(‘6500’, provider=’ncbi_taxonomy’) for the California sea hare
- coderestricted string
store UniProt/Swiss-Prot style organism codes, e.g. ‘APLCA’ for the California sea hare ‘Aplysia californica’
- scientific_namestring
the standard scientific name for this organism, e.g. ‘Aplysia californica’ for the California sea hare
- authoritystring
keep the authority, such as ‘J. G. Cooper, 1863’, associated with the ‘scientific_name’
- common_nameslist of strings
common names for this organism
- synonymslist of strings
synonyms for this taxon?
- rankrestricted string
taxonomic rank
- uriUri
link
- otherlist of Other objects
non-phyloXML elements
- re_code = re.compile('[a-zA-Z0-9_]{2,10}')
- ok_rank = {'branch', 'class', 'cohort', 'cultivar', 'division', 'domain', 'family', 'form', 'genus', 'infraclass', 'infracohort', 'infradivision', 'infrakingdom', 'infralegion', 'infraphylum', 'infratribe', 'kingdom', 'legion', 'microphylum', 'order', 'other', 'phylum', 'species', 'subclass', 'subcohort', 'subdivision', 'subfamily', 'subform', 'subgenus', 'subkingdom', 'sublegion', 'suborder', 'subphylum', 'subspecies', 'subtribe', 'subvariety', 'superclass', 'supercohort', 'superdivision', 'superfamily', 'superlegion', 'superorder', 'superphylum', 'superspecies', 'supertribe', 'tribe', 'unknown', 'variety'}
- __init__(id_source=None, id=None, code=None, scientific_name=None, authority=None, rank=None, uri=None, common_names=None, synonyms=None, other=None)
Initialize the class.
- __str__()
Show the class name and an identifying attribute.
- class Bio.Phylo.PhyloXML.Uri(value, desc=None, type=None)
Bases:
Bio.Phylo.PhyloXML.PhyloElement
A uniform resource identifier.
In general, this is expected to be an URL (for example, to link to an image on a website, in which case the
type
attribute might be ‘image’ anddesc
might be ‘image of a California sea hare’).- __init__(value, desc=None, type=None)
Initialize the class.
- __str__()
Return string representation of Uri.