[Biopython-dev] Phylogenetic rats nests (oops, I meant trees)
thomas at cbs.dtu.dk
Fri Oct 26 03:10:35 EDT 2001
"Cayte" <katel at worldpath.net> writes:
> Many of the bioformats represent phylogenetic data as well as sequence or path
> data. I can think of a number of problems with
> phylogenetic data.
I do not really understand your questions. Are you concerned about how to
store and convert sequence formats containing alignments, are you planning a
huge phylogeny database project, or are you trying to answer philosophical
questions about molecular evolution :-) ?
The sequence is the base object. An alignment represents one solution,
among many, for linearizing a set of sequences. A phylogenetic tree is a
way of clustering sequences that takes evolutionary changes into account. The
reconstruction of a phylogenetic tree is usually based on a
sequence alignment and depends on how you interpret the data and which
method you use (distance methods [UPGMA, Neighbor Joining], Maximum Parsimony,
or Maximum Likelihood).
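To make the distance approach concrete: these methods start from a matrix of pairwise distances (usually computed from the alignment) and repeatedly merge the closest clusters. The sketch below is a minimal plain-Python UPGMA, not Biopython's implementation; the function name and the toy distance matrix are invented for illustration, and only the tree topology is emitted as Newick (branch lengths are omitted). Neighbor Joining follows the same merge loop but uses a corrected criterion to pick the pair, so it does not assume a constant rate of evolution.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA (average linkage); return a Newick topology.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    Branch lengths are left out to keep the sketch short.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa in it
    d = dict(dist)                       # working copy, extended as clusters merge
    while len(sizes) > 1:
        # 1. Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"  # the cluster label doubles as its Newick subtree
        # 2. Distance from the merged cluster to every remaining cluster is the
        #    size-weighted average of the two old distances (the UPGMA rule).
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root + ";"


# Hypothetical toy distances between four sequences.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 8}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # (((A,B),C),D);
```

A different distance matrix, or a different method on the same alignment, can give a different tree, which is exactly why the reconstruction depends on the method chosen.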
> 1. A number of types of relationships are possible. A sequence may be a
> descendant or an ancestor of another sequence.
> They both may have a common ancestor. They may have converged to the same
> pattern. They may have hopped across species. Whatever the arguments against
> transgenic species, the assertion that it never happens in nature ain't so! The
> latest issue of Natural History describes how the vertebrate immune system may have
> once been a parasite.
Lateral (horizontal) gene transfer [LGT] is very common; the largest known events
are the origin of eukaryotic mitochondria from alpha-proteobacteria and
the origin of plant chloroplasts from cyanobacteria. There is a still
ongoing flow of genetic material between Thermotoga (eubacteria) and
Pyrococcus (archaea), which makes it hard to tell the original "owner" of a
gene. But LGT mainly affects our _interpretation_ of sequence data.
> 2. Researchers often don't agree among themselves what these relationships are.
> Should the links contain epistemological links that describe the level of confidence
> and the methodology plus journal references?
Who is going to decide the level of confidence?
A referee?
The bootstrap values?
There is no way to prove a phylogenetic relationship from sequences alone.
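For reference, bootstrap values in phylogenetics usually come from Felsenstein's nonparametric bootstrap: resample the alignment columns with replacement, rebuild a tree from each replicate, and report how often a given clade reappears. The sketch below only assumes an alignment stored as a dict of equal-length strings; `build_clades` is a hypothetical caller-supplied function standing in for whatever reconstruction method (UPGMA, NJ, parsimony, ...) is used. Note that the result measures how stable a clade is under resampling of the data, not whether the relationship is true.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Yield nonparametric bootstrap replicates of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (all equally long).
    Each replicate has the original length, built from columns drawn
    with replacement.
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][i] for i in columns) for name in names}


def clade_support(alignment, build_clades, clade, n_replicates=100, seed=None):
    """Fraction of replicate trees containing `clade` (a set of sequence names).

    build_clades (hypothetical, supplied by the caller) reconstructs a tree
    from one replicate alignment and returns its clades as a set of frozensets.
    """
    target = frozenset(clade)
    hits = sum(target in build_clades(rep)
               for rep in bootstrap_replicates(alignment, n_replicates, seed))
    return hits / n_replicates
```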
> 3. Links will change as new research unfolds. This will be a maintenance issue
> (luckily not ours). But we need the ability to easily change links and remove dead
> links. Should a mechanism for storing historical information be provided?
New phylogenetic trees with new information and biological interpretations
will emerge and be published, which will result in new sequence entries.
> 4. What if an intermediate is found between an ancestor-descendant pair?
> Should we delete the old link? Then the annotation will be lost. Should
> the old link contain pointers to the new links?
What exactly are "links"? Is this a synonym for nodes in the tree, or for
hyperlinks (XRefs) in, e.g., EMBL annotations?
> 5. Should we limit our scope to just sequences?
What is the original problem description? If you are planning normal
sequence/alignment/tree format storage, then you should not include
additional interpretations and views that are not found in the original
experiment (experiment = alignment, tree reconstruction, evolutionary interpretation).
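One way to picture "plain storage" is a record that holds only what the experiment produced: the alignment, the tree as reconstructed, and the method used, with no extra annotation. The class and field names below are hypothetical, invented for illustration (they are not a Biopython API), and the Newick leaf extraction handles only simple trees with unquoted labels. The record merely checks that the tree and the alignment refer to the same sequences.

```python
import re
from dataclasses import dataclass


def leaf_names(newick):
    """Leaf labels of a simple Newick string (unquoted labels only).

    A leaf label is whatever follows '(' or ',' up to ':', ',', ')' or ';'.
    Internal-node labels follow ')' and are therefore not matched.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


@dataclass
class PhylogenyRecord:
    """Hypothetical storage record holding only what the experiment produced."""
    alignment: dict  # sequence name -> aligned sequence
    newick: str      # reconstructed tree, stored as published
    method: str      # e.g. "UPGMA", "Neighbor Joining", "Maximum Likelihood"

    def __post_init__(self):
        # An alignment must contain sequences, all of the same length.
        if len({len(seq) for seq in self.alignment.values()}) != 1:
            raise ValueError("alignment must contain sequences of equal length")
        # The tree must describe exactly the aligned sequences, nothing more.
        if set(leaf_names(self.newick)) != set(self.alignment):
            raise ValueError("tree leaves do not match the alignment's sequence names")
```

Anything beyond these three fields (a curator's reinterpretation, a confidence judgment) would, under this reading, belong in a separate layer rather than in the stored record.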
On the other hand, if you are thinking about an internal phylogeny database
that gets dynamically updated during, e.g., the coordination of sequencing
projects, then trees should be reconstructed after each change, and
contradictory node annotations should be logged inside the database.
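A rough sketch of that dynamic setup, under stated assumptions: trees are nested tuples of sequence names, `build_tree` is a hypothetical caller-supplied reconstruction function, and sequences are only added or updated, never removed. After each change the tree is rebuilt, and every clade of the previous tree that the new tree contradicts is logged. The new tree's clades are first restricted to the old set of sequences, so that simply adding a sequence is not reported as a contradiction.

```python
import logging


def clades(tree):
    """All clades (frozensets of leaf names) of a nested-tuple tree, leaves included."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is just its name
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


class PhylogenyStore:
    """Hypothetical store: rebuild the tree after every change and log clades
    of the previous tree that the rebuilt tree contradicts."""

    def __init__(self, build_tree):
        self.build_tree = build_tree  # callable: {name: sequence} -> nested-tuple tree
        self.sequences = {}
        self.tree = None
        self.conflict_log = []        # (contradicted clade, old tree, new tree)
        self.logger = logging.getLogger("phylogeny_store")

    def update(self, name, sequence):
        self.sequences[name] = sequence
        new_tree = self.build_tree(dict(self.sequences))
        if self.tree is not None:
            old_clades = clades(self.tree)
            old_leaves = max(old_clades, key=len)  # the root clade
            # Restrict new clades to the old leaves: merely adding a sequence
            # must not count as contradicting an existing clade.
            restricted = {c & old_leaves for c in clades(new_tree)}
            for clade in old_clades - restricted:
                self.conflict_log.append((clade, self.tree, new_tree))
                self.logger.warning("clade %s contradicted by new tree", sorted(clade))
        self.tree = new_tree
        return new_tree
```

For example, if the stored tree is ((A,B),C) and adding D yields ((A,C),(B,D)), the clade {A,B} is logged, while {A,B,C} is not, because restricted to A, B and C the new tree still groups them together.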
Could you please mail me your intended scope?
Sicheritz-Ponten Thomas, Ph.D CBS, Department of Biotechnology
thomas at biopython.org The Technical University of Denmark
CBS: +45 45 252489 Building 208, DK-2800 Lyngby
Fax +45 45 931585 http://www.cbs.dtu.dk/thomas
De Chelonian Mobile ... The Turtle Moves ...