<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://biopython.org/w/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://biopython.org/w/api.php?action=feedcontributions&amp;user=Matzke&amp;feedformat=atom</id>
		<title>Biopython - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://biopython.org/w/api.php?action=feedcontributions&amp;user=Matzke&amp;feedformat=atom"/>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/Special:Contributions/Matzke"/>
		<updated>2013-05-23T10:46:14Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.18.1</generator>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-19T21:58:19Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Background: organization of Bio.Geography */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
All classes and functions have been documented with standard docstrings.  Code is available at the most recent github commit here: http://github.com/nmatzke/biopython/commits/Geography&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Background: organization of Bio.Geography===&lt;br /&gt;
&lt;br /&gt;
It is useful to understand the overall organization of classes in Bio.Geography.  The five main classes are:&lt;br /&gt;
&lt;br /&gt;
*'''GbifSearchResults''' -- Contains the methods for conducting a GBIF search, as well as attributes storing the results in different objects, depending on their stage of processing.  &lt;br /&gt;
**Also contains summary statistics on the search (e.g., number of records found).&lt;br /&gt;
**The three objects which store results in different forms are '''GbifDarwincoreXmlString''', '''GbifXmlTree''', and a list of individual '''GbifObservationRecord''' objects.&lt;br /&gt;
**Provides the print_records method for printing all contained records to screen.&lt;br /&gt;
* '''GbifDarwincoreXmlString''' -- Contains the raw text returned by GBIF.  If output to a file, this would be a standard XML file adhering to the DarwinCore standard.  Inherits from Python's built-in string class.&lt;br /&gt;
* '''GbifXmlTree''' -- Contains the ElementTree object which results from parsing GBIF's XML results.  Also a number of methods for searching the ElementTree and finding matching elements, finding a certain element when it is contained within a certain larger element, etc.&lt;br /&gt;
* '''GbifObservationRecord''' -- Contains the attributes which may be found within a certain record, e.g. taxon, genus, species, latitude, longitude, etc., as well as functions for classifying a record into a certain geographical area, printing a record to screen, etc.&lt;br /&gt;
* '''TreeSum''' -- Contains functions and attributes for summarizing a phylogenetic tree, subsetting it, writing to screen, and calculating summary statistics.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module.  First, import the necessary classes and functions, and specify the filename of the input file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GeneralUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Second, in order to display results to screen in python, we need to convert the file to plain ASCII (GBIF results contain all manner of unusual characters from different languages, and slanted quotes and the like are not standardized; this can cause crashes when attempting to print to screen in python or ipython).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
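&lt;br /&gt;
As a rough illustration of what such a conversion involves (this is not the implementation of fix_ASCII_file, and the helper name to_plain_ascii is hypothetical), the sketch below uses only the Python standard library to decompose accented characters and drop whatever cannot be represented in ASCII:&lt;br /&gt;
&lt;br /&gt;
```python
import unicodedata


def to_plain_ascii(text):
    """Return a plain-ASCII approximation of a Unicode string (hypothetical helper)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops every character outside ASCII,
    # including the combining marks split off above.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_plain_ascii("S\u00e3o Paulo, C\u00f4te d\u2019Ivoire"))
```
&lt;br /&gt;
Characters with no ASCII decomposition, such as slanted quotes, are removed rather than replaced, so the converted text can differ slightly from the original.&lt;br /&gt;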
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
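&lt;br /&gt;
For readers unfamiliar with ElementTree, here is a small self-contained sketch of how a parsed tree can be searched.  It builds a toy tree in memory instead of reading a GBIF file, and the tag names used (results, record, name, latitude) are invented for the example; they are not the DarwinCore element names.&lt;br /&gt;
&lt;br /&gt;
```python
from xml.etree import ElementTree as ET

# Build a tiny stand-in for a search result in memory.  The tag names
# (results, record, name, latitude) are invented for this sketch and are
# not the DarwinCore element names.
root = ET.Element("results")
for name, lat in [("Utricularia gibba", "35.1"), ("Utricularia minor", "60.2")]:
    record = ET.SubElement(root, "record")
    ET.SubElement(record, "name").text = name
    ET.SubElement(record, "latitude").text = lat

xmltree = ET.ElementTree(root)

# iter() walks the whole tree and yields every element with a given tag;
# findtext() returns the text of the first matching child element.
names = [rec.findtext("name") for rec in xmltree.iter("record")]
latitudes = [float(rec.findtext("latitude")) for rec in xmltree.iter("record")]
print(names)
print(latitudes)
```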
&lt;br /&gt;
We can then store the element tree as an object of Class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.extract_occurrences_from_gbif_xmltree(recs.gbif_recs_xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list.  This will display the references to the first five records:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs.obs_recs_list[0:5]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before we go through the trouble of downloading thousands of records, we may wish to know how many there are in GBIF first.  The user must set up a dictionary containing the fields and search terms as keys and values, respectively.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format.&lt;br /&gt;
&lt;br /&gt;
'scientificname' specifies the genus name to search on.  Adding an '*' after the name will return anything that begins with &amp;quot;Genlisea&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service], which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, initiate a new GbifSearchResults object and run get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 169 records matching &amp;quot;Genlisea*&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For contrast, run the same search ''without'' the asterisk ('*'):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea'}&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We only get ~10 results -- presumably records of specimens only identified down to genus and no further.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key.  To download an individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
key = rec.gbifkey&lt;br /&gt;
# (or manually)&lt;br /&gt;
# key = 175067484&lt;br /&gt;
xmlrec = recs.get_record(key)&lt;br /&gt;
print(xmlrec)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to print the xmlrec ElementTree object, store xmlrec in a GbifXmlTree object and run print_xmltree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;GbifXmlTree(xmlrec).print_xmltree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented in a thorough way in the [http://www.phylodiversity.net/phylocom/ phylocom package] by Webb et al., but for some purposes it is useful to calculate the statistics directly in python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The function test_Tree will run the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic distance) and output to screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
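&lt;br /&gt;
To make two of these metrics concrete, here is a minimal sketch that computes the mean pairwise distance and the mean nearest-neighbour distance from a table of tip-to-tip distances.  It is not the TreeSum implementation; the function names and the three-taxon distance table are invented for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations

# Hypothetical tip-to-tip (patristic) distances for three taxa.
dist = {
    frozenset(["t0", "t1"]): 0.5,
    frozenset(["t0", "t2"]): 0.9,
    frozenset(["t1", "t2"]): 1.0,
}


def mean_pairwise_distance(taxa, dist):
    """Average distance over every unordered pair of taxa."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def mean_nearest_neighbour_distance(taxa, dist):
    """Average, over taxa, of the distance to each taxon's closest relative."""
    nearest = [
        min(dist[frozenset([a, b])] for b in taxa if b != a)
        for a in taxa
    ]
    return sum(nearest) / len(nearest)


taxa = ["t0", "t1", "t2"]
mpd = mean_pairwise_distance(taxa, dist)
mnnd = mean_nearest_neighbour_distance(taxa, dist)
print(mpd, mnnd)
```
&lt;br /&gt;
Passing only the taxa found in one region to these functions gives the corresponding per-region value.&lt;br /&gt;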
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to taxa only existing within a region, statistics can be calculated by region.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading and processing large numbers of records===&lt;br /&gt;
&lt;br /&gt;
GBIF only allows a maximum of 1000 observation records to be downloaded at a time (10,000 for KML records).  To get more, we need to download and process them in stages.&lt;br /&gt;
&lt;br /&gt;
Again we will set up our parameters dictionary, and also an &amp;quot;inc&amp;quot; variable to specify the number of records to download per server request.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
inc = 100&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
gbif_xmltree_list = recs3.get_all_records_by_increment(params, inc)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with biopython's interactions with NCBI servers, the GbifSearchResults class keeps track of when the last GBIF request was made, and requires a 3-second wait before a new request.&lt;br /&gt;
&lt;br /&gt;
Each server request returns an XML string; these are parsed into GbifXmlTree objects, and the resulting list of GbifXmlTree objects is assigned to gbif_xmltree_list.  The individual records have also been parsed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
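&lt;br /&gt;
The staged download can be pictured as follows.  This sketch only plans the requests and contacts no server; the names plan_requests, startindex and maxresults are assumptions made for illustration rather than the module's actual interface.&lt;br /&gt;
&lt;br /&gt;
```python
def plan_requests(numhits, inc):
    """Split numhits records into (startindex, maxresults) chunks of size inc."""
    if inc > 1000 or 1 > inc:
        raise ValueError("inc must be between 1 and 1000 records per request")
    requests = []
    for start in range(0, numhits, inc):
        # The final chunk may be shorter than inc.
        requests.append((start, min(inc, numhits - start)))
    return requests


# 169 hits fetched 100 at a time need two requests.
for startindex, maxresults in plan_requests(169, 100):
    print(startindex, maxresults)
    # A real downloader would send the request here and then wait
    # (for example with time.sleep(3)) before sending the next one.
```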
&lt;br /&gt;
&lt;br /&gt;
===Classifying records into geographical regions===&lt;br /&gt;
&lt;br /&gt;
Biogeographical analyses will often require that you determine what area(s) a taxon lives in.  Areas are not always obviously delineated, and analysts may wish to try several different possible sets of areas and see how this influences their analysis.&lt;br /&gt;
&lt;br /&gt;
Below, we set up a polygon containing the latitude/longitude coordinates for the Northern Hemisphere, and then set the &amp;quot;area&amp;quot; attribute for each matching record to &amp;quot;NorthernHemisphere&amp;quot;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ul = (-180, 90)&lt;br /&gt;
ur = (180, 90)&lt;br /&gt;
ll = (-180, 0)&lt;br /&gt;
lr = (180, 0)&lt;br /&gt;
# list the corners in order around the boundary&lt;br /&gt;
poly = [ul, ur, lr, ll]&lt;br /&gt;
polyname = &amp;quot;NorthernHemisphere&amp;quot;&lt;br /&gt;
&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This process can be repeated for all polygons of interest until all GBIF records have been classified (except for GBIF records which lacked lat/long data in the first place, which sometimes happens).&lt;br /&gt;
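&lt;br /&gt;
Classification of this kind rests on a point-in-polygon test.  The following is a minimal sketch of the standard ray-casting (even-odd) test, not the routine used by Bio.Geography; the function name point_in_polygon is hypothetical.  Vertices are (longitude, latitude) pairs listed in order around the boundary.&lt;br /&gt;
&lt;br /&gt;
```python
def point_in_polygon(point, poly):
    """Ray-casting test: True if point lies inside the polygon poly."""
    x, y = point
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            # Each crossing to the right of the point flips the state.
            if x_cross > x:
                inside = not inside
    return inside


northern = [(-180, 90), (180, 90), (180, 0), (-180, 0)]
print(point_in_polygon((10.0, 45.0), northern))
print(point_in_polygon((10.0, -45.0), northern))
```
&lt;br /&gt;
The vertices must trace the boundary in order; corners listed in a crossing order describe a bow-tie rather than a rectangle, and points will be misclassified.&lt;br /&gt;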
&lt;br /&gt;
GeogUtils also contains open access libraries for processing shapefile/dbf files -- these are standard GIS file formats, and various publicly-accessible shapefiles might serve as sources for polygons.&lt;br /&gt;
&lt;br /&gt;
Warning: the point-in-polygon operation will fail dramatically if your polygon crosses the International Date Line.  The best solution in this case is to split any polygon crossing the date line into two polygons, one on each side of the line.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===General notes===&lt;br /&gt;
&lt;br /&gt;
GBIF search results often contain non-ASCII characters (e.g. international placenames) and other confusing items, e.g., web links in angle brackets, which can be misinterpreted as unmatched XML tags if a GBIF search result is read to ASCII and then an attempt is made to parse it.&lt;br /&gt;
&lt;br /&gt;
In general, the Geography module will handle things fine if the results are being processed in the background; but to print results to screen, a series of functions from GeneralUtils are used to convert a string to plain ASCII.  This avoids crashes when printing data to screen, but it means that the printed results may differ slightly from the content of the original search results.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-19T21:50:06Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Background: organization of Bio.Geography */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
All classes and functions have been documented with standard docstrings.  Code is available at the most recent github commit here: http://github.com/nmatzke/biopython/commits/Geography&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Background: organization of Bio.Geography===&lt;br /&gt;
&lt;br /&gt;
It is useful to understand the overall organization of classes in Bio.Geography.  There are four classes within the GbifXml module:&lt;br /&gt;
&lt;br /&gt;
*'''GbifSearchResults''' -- Contains the methods for conducting a GBIF search, as well as attributes storing the results in different objects, depending on their stage of processing.  &lt;br /&gt;
**Also contains summary statistics on the search (e.g., number of records found).&lt;br /&gt;
**The three objects which store results in different forms are '''GbifDarwincoreXmlString''', '''GbifXmlTree''', and a list of individual '''GbifObservationRecord''' objects.&lt;br /&gt;
**Method print_records for printing all contained records to screen.&lt;br /&gt;
* '''GbifDarwincoreXmlString''' -- Contains the raw text returned by GBIF.  If output to a file, this would be a standard XML file adhering to the DarwinCore standard.  Inherits from the standard python String class.&lt;br /&gt;
* '''GbifXmlTree''' -- Contains the ElementTree object which results from parsing GBIF's XML results.  Also a number of methods for searching the ElementTree and finding matching elements, finding a certain element when it is contained within a certain larger element, etc.&lt;br /&gt;
* '''GbifObservationRecord''' -- Contains the attributes which may be found within a certain record, e.g. taxon, genus, species, latitude, longitude, etc., as well as functions for classifying a record into a certain geographical area, printing a record to screen, etc.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module.  First, import the necessary classes and functions, and specify the filename of the input file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GeneralUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Second, in order to display results to screen in python, we need to convert the file to plain ASCII (GBIF results contain all manner of unusual characters from different languages, and slanted quotes and the like are not standardized; this can cause crashes when attempting to print to screen in python or ipython).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of Class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.extract_occurrences_from_gbif_xmltree(recs.gbif_recs_xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list.  This will display the references to the first five records:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs.obs_recs_list[0:5]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before we go through the trouble of downloading thousands of records, we may wish to know how many there are in GBIF first.  The user must set up a dictionary containing the fields and search terms as keys and values, respectively.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format.&lt;br /&gt;
&lt;br /&gt;
'scientificname' specifies the genus name to search on.  Adding an '*' after the name will return anything that begins with &amp;quot;Genlisea&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service] page, which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal] page.&lt;br /&gt;
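&lt;br /&gt;
A parameter dictionary like this is typically turned into a URL query string before a request is sent.  The sketch below shows that step with the Python 3 standard library.  The base URL is a placeholder and not the real GBIF endpoint, the function name build_query_url is hypothetical, and this is not necessarily how GbifSearchResults builds its requests.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode

# Placeholder address for illustration only; not the real GBIF service URL.
BASE_URL = 'http://example.org/occurrence/count'


def build_query_url(params, base_url=BASE_URL):
    """Append the URL-encoded search parameters to the base URL."""
    # Sort the items so the resulting URL is reproducible.
    return base_url + '?' + urlencode(sorted(params.items()))


params = {'format': 'darwin', 'scientificname': 'Genlisea*'}
print(build_query_url(params))
```
&lt;br /&gt;
Note that urlencode percent-encodes the asterisk, so the wildcard travels as %2A in the query string.&lt;br /&gt;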
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and run get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 169 records matching &amp;quot;Genlisea*&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For contrast, run the same search ''without'' the asterisk ('*'):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea'}&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We only get ~10 results -- presumably records of specimens only identified down to genus and no further.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by their GBIF key:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
key = rec.gbifkey&lt;br /&gt;
# (or manually)&lt;br /&gt;
# key = 175067484&lt;br /&gt;
xmlrec = recs.get_record(key)&lt;br /&gt;
print(xmlrec)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to print the xmlrec ElementTree object, store xmlrec in a GbifXmlTree object and run print_xmltree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;GbifXmlTree(xmlrec).print_xmltree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha- and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented thoroughly in the [http://www.phylodiversity.net/phylocom/ phylocom package] by Webb et al., but for some purposes it is useful to calculate the statistics directly in Python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test_Tree method computes the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and prints them to screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to taxa only existing within a region, statistics can be calculated by region.&lt;br /&gt;
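&lt;br /&gt;
As an illustration of a per-region calculation, the sketch below computes two of the raw metrics for the three taxa t1, t3 and t5 of the example tree.  It is not the TreeSum implementation: the function names are hypothetical, the tip-to-tip distances were summed by hand from the branch lengths in the Newick string above, and the usual definitions are assumed (MPD as the mean over all pairs of taxa, MNPD as the mean distance from each taxon to its nearest relative).  NRI and NTI are standardized against a null model and are not computed here.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations

# Tip-to-tip distances among t1, t3 and t5, summed by hand from the
# branch lengths of the example Newick string.
dist = {
    frozenset(['t3', 't5']): 0.5869947,
    frozenset(['t1', 't5']): 0.2635909,
    frozenset(['t1', 't3']): 0.6537522,
}
taxa = ['t1', 't3', 't5']


def mean_pairwise_distance(taxa, dist):
    """MPD: average distance over all unordered pairs of taxa."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def mean_nearest_taxon_distance(taxa, dist):
    """Average, over taxa, of the distance to each taxon's closest relative."""
    nearest = [min(dist[frozenset([a, b])] for b in taxa if b != a)
               for a in taxa]
    return sum(nearest) / len(nearest)


mpd = mean_pairwise_distance(taxa, dist)
mnnd = mean_nearest_taxon_distance(taxa, dist)
print(round(mpd, 6), round(mnnd, 6))
```
&lt;br /&gt;
Keying the table by frozenset makes the lookup independent of the order in which a pair of taxa is named.&lt;br /&gt;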
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading and processing large numbers of records===&lt;br /&gt;
&lt;br /&gt;
GBIF only allows a maximum of 1000 observation records to be downloaded at a time (10,000 for KML records).  To get more, we need to download and process them in stages.&lt;br /&gt;
&lt;br /&gt;
Again we will set up our parameters dictionary, and also an &amp;quot;inc&amp;quot; variable to specify the number of records to download per server request.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
inc = 100&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
gbif_xmltree_list = recs3.get_all_records_by_increment(params, inc)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with Biopython's interactions with NCBI servers, the GbifSearchResults class keeps track of when the last GBIF request was made, and enforces a 3-second wait before sending a new request.&lt;br /&gt;
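&lt;br /&gt;
The two ideas involved here, splitting a download into increments and spacing out requests, can be sketched as follows for Python 3.  This is an illustration only and not the code inside get_all_records_by_increment: the names batch_starts and Throttle are hypothetical, and the zero-based start index is an assumption about how batches would be numbered.&lt;br /&gt;
&lt;br /&gt;
```python
import time


def batch_starts(numhits, inc):
    """Start index of each request needed to cover numhits records."""
    return list(range(0, numhits, inc))


class Throttle(object):
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_wait=3.0):
        self.min_wait = min_wait
        self.last = None

    def wait(self):
        if self.last is not None:
            elapsed = time.monotonic() - self.last
            # Sleep only for whatever part of the delay has not yet passed.
            time.sleep(max(0.0, self.min_wait - elapsed))
        self.last = time.monotonic()


print(batch_starts(169, 100))
```
&lt;br /&gt;
A caller would invoke wait() immediately before each server request, so the first request goes out at once and later ones are delayed only as much as needed.&lt;br /&gt;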
&lt;br /&gt;
Each server request returns an XML string, which is parsed into a GbifXmlTree object; the list of these objects is returned as gbif_xmltree_list.  The individual records have also been parsed and can be printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Classifying records into geographical regions===&lt;br /&gt;
&lt;br /&gt;
Biogeographical analyses will often require that you determine what area(s) a taxon lives in.  Areas are not always obviously delineated, and analysts may wish to try several different possible sets of areas and see how this influences their analysis.&lt;br /&gt;
&lt;br /&gt;
Below, we set up a polygon containing the latitude/longitude coordinates for the Northern Hemisphere, and then set the &amp;quot;area&amp;quot; attribute for each matching record to &amp;quot;NorthernHemisphere&amp;quot;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
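# Corners as (longitude, latitude)&lt;br /&gt;
ul = (-180, 90)&lt;br /&gt;
ur = (180, 90)&lt;br /&gt;
ll = (-180, 0)&lt;br /&gt;
lr = (180, 0)&lt;br /&gt;
# List the corners in order around the boundary so that the edges do not cross&lt;br /&gt;
poly = [ul, ur, lr, ll]&lt;br /&gt;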
polyname = &amp;quot;NorthernHemisphere&amp;quot;&lt;br /&gt;
&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This process can be repeated for all polygons of interest until all GBIF records have been classified (except for GBIF records which lacked lat/long data in the first place, which sometimes happens).&lt;br /&gt;
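&lt;br /&gt;
The point-in-polygon test underlying this kind of classification can be sketched with the standard ray-casting algorithm.  This is a generic illustration and not necessarily the routine used in GeogUtils; the function name point_in_polygon and the (longitude, latitude) vertex layout are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in ring order."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# (longitude, latitude) vertices listed in order around the boundary.
northern = [(-180, 90), (180, 90), (180, 0), (-180, 0)]
print(point_in_polygon(-84.0, 10.5, northern))
print(point_in_polygon(-60.0, -15.0, northern))
```
&lt;br /&gt;
The vertices must be listed in order around the boundary.  If two corners are swapped, the edges cross and the polygon becomes a bow-tie shape that excludes part of the intended area.&lt;br /&gt;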
&lt;br /&gt;
GeogUtils also contains open access libraries for processing shapefile/dbf files -- these are standard GIS file formats, and various publicly-accessible shapefiles might serve as sources for polygons.&lt;br /&gt;
&lt;br /&gt;
Warning: the point-in-polygon operation will fail dramatically if your polygon crosses the International Date Line.  The best solution in this case is to split any polygon crossing the date line into two polygons, one on each side of the line.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===General notes===&lt;br /&gt;
&lt;br /&gt;
GBIF search results often contain non-ASCII characters (e.g. international placenames) and other confusing items, e.g., web links in angle brackets, which can be misinterpreted as unmatched XML tags if a GBIF search result is read to ASCII and then an attempt is made to parse it.&lt;br /&gt;
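&lt;br /&gt;
One common way to reduce such text to plain ASCII is to decompose accented characters and drop whatever cannot be represented.  The sketch below uses only the standard library; it is not the GeneralUtils implementation, and the function name to_plain_ascii is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import unicodedata


def to_plain_ascii(text):
    """Decompose accented characters and drop anything that is not ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_plain_ascii(u'Bogot\u00e1, Colombia'))
```
&lt;br /&gt;
This conversion is lossy: accents are stripped and characters with no ASCII counterpart, such as slanted quotes, are removed entirely.&lt;br /&gt;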
&lt;br /&gt;
In general, the Geography module handles such results without trouble when they are processed in the background; but to print results to screen, a series of functions from GeneralUtils is used to convert strings to plain ASCII, which avoids crashes when printing.  As a consequence, results printed to screen may differ slightly from the content of the original search results.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-19T21:49:13Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Parsing a local (manually downloaded) GBIF DarwinCore XML file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
All classes and functions have been documented with standard docstrings.  Code is available at the most recent github commit here: http://github.com/nmatzke/biopython/commits/Geography&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Background: organization of Bio.Geography===&lt;br /&gt;
&lt;br /&gt;
It is useful to understand the overall organization of classes in Bio.Geography.  There are four classes within the GbifXml module:&lt;br /&gt;
&lt;br /&gt;
*GbifSearchResults -- Contains the methods for conducting a GBIF search, as well as attributes storing the results in different objects, depending on their stage of processing.  &lt;br /&gt;
**Also contains summary statistics on the search (e.g., number of records found).&lt;br /&gt;
**The three objects which store results in different forms are GbifDarwincoreXmlString, GbifXmlTree, and a list of individual GbifObservationRecord objects.&lt;br /&gt;
**Method print_records for printing all contained records to screen.&lt;br /&gt;
* GbifDarwincoreXmlString -- Contains the raw text returned by GBIF.  If output to a file, this would be a standard XML file adhering to the DarwinCore standard.  Inherits from the standard python String class.&lt;br /&gt;
* GbifXmlTree -- Contains the ElementTree object which results from parsing GBIF's XML results.  Also a number of methods for searching the ElementTree and finding matching elements, finding a certain element when it is contained within a certain larger element, etc.&lt;br /&gt;
* GbifObservationRecord -- Contains the attributes which may be found within a certain record, e.g. taxon, genus, species, latitude, longitude, etc., as well as functions for classifying a record into a certain geographical area, printing a record to screen, etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module.  First, import the necessary classes and functions, and specify the filename of the input file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GeneralUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Second, in order to display results to screen in python, we need to convert the file to plain ASCII (GBIF results contain all manner of unusual characters from different languages, and no standardization of slanted quotes and the like; this can cause crashes when attempting to print to screen in python or ipython).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of Class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.extract_occurrences_from_gbif_xmltree(recs.gbif_recs_xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list.  This will display the references to the first five records:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs.obs_recs_list[0:5]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before downloading thousands of records, we may wish to know how many matching records GBIF holds.  To ask, set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format.&lt;br /&gt;
&lt;br /&gt;
'scientificname' specifies the genus name to search on.  Adding an '*' after the name will return anything that begins with &amp;quot;Genlisea&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service] page, which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal] page.&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and run get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 169 records matching &amp;quot;Genlisea*&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For contrast, run the same search ''without'' the asterisk ('*'):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea'}&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We only get ~10 results -- presumably records of specimens only identified down to genus and no further.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by their GBIF key:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
key = rec.gbifkey&lt;br /&gt;
# (or manually)&lt;br /&gt;
# key = 175067484&lt;br /&gt;
xmlrec = recs.get_record(key)&lt;br /&gt;
print(xmlrec)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to print the xmlrec ElementTree object, store xmlrec in a GbifXmlTree object and run print_xmltree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;GbifXmlTree(xmlrec).print_xmltree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha- and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented thoroughly in the [http://www.phylodiversity.net/phylocom/ phylocom package] by Webb et al., but for some purposes it is useful to calculate the statistics directly in Python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The test_Tree method computes the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and prints them to screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to taxa only existing within a region, statistics can be calculated by region.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading and processing large numbers of records===&lt;br /&gt;
&lt;br /&gt;
GBIF only allows a maximum of 1000 observation records to be downloaded at a time (10,000 for KML records).  To get more, we need to download and process them in stages.&lt;br /&gt;
&lt;br /&gt;
Again we will set up our parameters dictionary, and also an &amp;quot;inc&amp;quot; variable to specify the number of records to download per server request.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
inc = 100&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
gbif_xmltree_list = recs3.get_all_records_by_increment(params, inc)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with Biopython's interactions with NCBI servers, the GbifSearchResults class keeps track of when the last GBIF request was made, and enforces a 3-second wait before sending a new request.&lt;br /&gt;
&lt;br /&gt;
Each server request returns an XML string, which is parsed into a GbifXmlTree object; the list of these objects is returned as gbif_xmltree_list.  The individual records have also been parsed and can be printed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Classifying records into geographical regions===&lt;br /&gt;
&lt;br /&gt;
Biogeographical analyses will often require that you determine what area(s) a taxon lives in.  Areas are not always obviously delineated, and analysts may wish to try several different possible sets of areas and see how this influences their analysis.&lt;br /&gt;
&lt;br /&gt;
Below, we set up a polygon containing the latitude/longitude coordinates for the Northern Hemisphere, and then set the &amp;quot;area&amp;quot; attribute for each matching record to &amp;quot;NorthernHemisphere&amp;quot;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
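# Corners as (longitude, latitude)&lt;br /&gt;
ul = (-180, 90)&lt;br /&gt;
ur = (180, 90)&lt;br /&gt;
ll = (-180, 0)&lt;br /&gt;
lr = (180, 0)&lt;br /&gt;
# List the corners in order around the boundary so that the edges do not cross&lt;br /&gt;
poly = [ul, ur, lr, ll]&lt;br /&gt;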
polyname = &amp;quot;NorthernHemisphere&amp;quot;&lt;br /&gt;
&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This process can be repeated for all polygons of interest until all GBIF records have been classified (except for GBIF records which lacked lat/long data in the first place, which sometimes happens).&lt;br /&gt;
&lt;br /&gt;
GeogUtils also contains open access libraries for processing shapefile/dbf files -- these are standard GIS file formats, and various publicly-accessible shapefiles might serve as sources for polygons.&lt;br /&gt;
&lt;br /&gt;
Warning: the point-in-polygon operation will fail dramatically if your polygon crosses the International Date Line.  The best solution in this case is to split any polygon crossing the date line into two polygons, one on each side of the line.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===General notes===&lt;br /&gt;
&lt;br /&gt;
GBIF search results often contain non-ASCII characters (e.g. international placenames) and other confusing items, e.g., web links in angle brackets, which can be misinterpreted as unmatched XML tags if a GBIF search result is read to ASCII and then an attempt is made to parse it.&lt;br /&gt;
&lt;br /&gt;
In general, the Geography module handles such results without trouble when they are processed in the background; but to print results to screen, a series of functions from GeneralUtils is used to convert strings to plain ASCII, which avoids crashes when printing.  As a consequence, results printed to screen may differ slightly from the content of the original search results.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-19T08:57:31Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Summary of functions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
All classes and functions have been documented with standard docstrings.  Code is available at the most recent github commit here: http://github.com/nmatzke/biopython/commits/Geography&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GeneralUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, in order to display results on screen in Python, we need to convert the file to plain ASCII.  GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and this can cause crashes when attempting to print to screen.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
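&lt;br /&gt;
As a rough illustration of what such a conversion involves, the sketch below reduces a Unicode string to plain ASCII using only the standard library.  The helper name to_plain_ascii is hypothetical and is not part of Bio.Geography; fix_ASCII_file may well work differently.&lt;br /&gt;
&lt;br /&gt;
```python
import unicodedata


def to_plain_ascii(text):
    # Hypothetical helper, not part of Bio.Geography.
    # Map curly quotes to straight ones, since NFKD does not decompose them.
    quotes = {'\u2018': "'", '\u2019': "'", '\u201c': '"', '\u201d': '"'}
    for fancy, plain in quotes.items():
        text = text.replace(fancy, plain)
    # Split accented letters into a base letter plus a combining mark,
    # then drop every character that is not ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_plain_ascii('Bogot\u00e1, \u201cCerro\u201d'))
```
&lt;br /&gt;
Characters with no ASCII equivalent are silently dropped in this sketch, which is one reason converted output can differ from the original data.&lt;br /&gt;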
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
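&lt;br /&gt;
To make the structure concrete: an ElementTree is a tree of Element objects, each with a tag, a dictionary of attributes, optional text, and child elements.  The sketch below builds a tiny tree in memory and walks it.  The tag names are made up for illustration and are not the actual DarwinCore tags returned by GBIF.&lt;br /&gt;
&lt;br /&gt;
```python
from xml.etree import ElementTree as ET

# Build a small tree by hand (tag names are illustrative only).
root = ET.Element('results')
occ = ET.SubElement(root, 'occurrence', {'key': '1'})
ET.SubElement(occ, 'latitude').text = '4.6'
ET.SubElement(occ, 'longitude').text = '-74.1'


def flatten(element, depth=0):
    # Return (depth, tag, text) for the element and all its descendants.
    rows = [(depth, element.tag, element.text)]
    for child in element:
        rows.extend(flatten(child, depth + 1))
    return rows


for depth, tag, text in flatten(root):
    print('  ' * depth + tag, text)
```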
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.extract_occurrences_from_gbif_xmltree(recs.gbif_recs_xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs.obs_recs_list[0:4]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many GBIF holds.  The user must set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format.&lt;br /&gt;
&lt;br /&gt;
'scientificname' specifies the genus name to search on.  Adding an '*' after the name will return anything that begins with &amp;quot;Genlisea&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service], which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, initiate a new GbifSearchResults object and run get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
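&lt;br /&gt;
A search dictionary like this maps naturally onto the query string of an HTTP request.  The sketch below shows that step with the standard library only (Python 3).  The base URL is a placeholder rather than the real GBIF endpoint, build_query_url is a hypothetical helper, and get_numhits may build its request differently.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode

# Placeholder address; substitute the real service URL.
BASE_URL = 'http://example.org/occurrence/count'


def build_query_url(base_url, params):
    # Sort the items so the same dictionary always gives the same URL.
    return base_url + '?' + urlencode(sorted(params.items()))


params = {'format': 'darwin', 'scientificname': 'Genlisea*'}
print(build_query_url(BASE_URL, params))
```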
&lt;br /&gt;
As of August 2009, GBIF held 169 records matching &amp;quot;Genlisea*&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For contrast, run the same search ''without'' the asterisk ('*'):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea'}&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We only get ~10 results -- presumably records of specimens only identified down to genus and no further.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key.  To download an individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
key = rec.gbifkey&lt;br /&gt;
# (or manually)&lt;br /&gt;
# key = 175067484&lt;br /&gt;
xmlrec = recs.get_record(key)&lt;br /&gt;
print(xmlrec)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to print the xmlrec ElementTree object, store xmlrec in a GbifXmlTree object and run print_xmltree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;GbifXmlTree(xmlrec).print_xmltree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented in a thorough way in the [http://www.phylodiversity.net/phylocom/ phylocom package] by Webb et al., but for some purposes it is useful to calculate the statistics directly in python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The function test_Tree will run the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic distance) and output to screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to only those taxa found within a region, statistics can be calculated by region.&lt;br /&gt;
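&lt;br /&gt;
For intuition, the mean pairwise distance and the mean nearest-neighbor distance can both be computed from a table of tip-to-tip distances.  The sketch below uses made-up distances for three tips and plain Python.  It does not use TreeSum, whose implementation may differ, and the function names are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations

# Made-up pairwise (patristic) distances between three tips.
dist = {
    frozenset(['t1', 't2']): 2.0,
    frozenset(['t1', 't3']): 4.0,
    frozenset(['t2', 't3']): 6.0,
}
taxa = ['t1', 't2', 't3']


def mean_pairwise_distance(taxa, dist):
    # Average distance over all unordered pairs of tips.
    pairs = list(combinations(taxa, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def mean_nearest_neighbor_distance(taxa, dist):
    # For each tip, take the distance to its closest relative, then average.
    nearest = [
        min(dist[frozenset([a, b])] for b in taxa if b != a)
        for a in taxa
    ]
    return sum(nearest) / len(nearest)


print(mean_pairwise_distance(taxa, dist))
print(mean_nearest_neighbor_distance(taxa, dist))
```
&lt;br /&gt;
Passing only the tips present in one region, e.g. ['t1', 't2'], gives the value for that region.&lt;br /&gt;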
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading and processing large numbers of records===&lt;br /&gt;
&lt;br /&gt;
GBIF only allows a maximum of 1000 observation records to be downloaded at a time (10,000 for KML records).  To get more, we need to download and process them in stages.&lt;br /&gt;
&lt;br /&gt;
Again we will set up our parameters dictionary, and also an &amp;quot;inc&amp;quot; variable to specify the number of records to download per server request.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
inc = 100&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
gbif_xmltree_list = recs3.get_all_records_by_increment(params, inc)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with biopython's interactions with NCBI servers, the GbifSearchResults class keeps track of when the last GBIF request was made, and requires a 3-second wait before a new request.&lt;br /&gt;
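&lt;br /&gt;
The staged download can be pictured as a loop over record offsets with a pause between requests.  The sketch below is generic Python, not the Bio.Geography implementation: fake_fetch is a stand-in for a real server request, and the function and parameter names are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
import time


def fetch_in_increments(numhits, inc, fetch_page, wait=3.0, sleep=time.sleep):
    # Request records [start, start + inc) until numhits are covered,
    # pausing `wait` seconds between consecutive requests.
    pages = []
    for i, start in enumerate(range(0, numhits, inc)):
        if i > 0:
            sleep(wait)
        pages.append(fetch_page(start, inc))
    return pages


def fake_fetch(start, count):
    # Stand-in for a server request: just report what was asked for.
    return (start, count)


# Pass a no-op sleep so the demonstration runs instantly.
pages = fetch_in_increments(169, 100, fake_fetch, sleep=lambda s: None)
print(pages)
```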
&lt;br /&gt;
Each server request returns an XML string, which is parsed into a GbifXmlTree object; the list of these objects is returned as gbif_xmltree_list.  The individual records have also been parsed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Classifying records into geographical regions===&lt;br /&gt;
&lt;br /&gt;
Biogeographical analyses will often require that you determine what area(s) a taxon lives in.  Areas are not always obviously delineated, and analysts may wish to try several different possible sets of areas and see how this influences their analysis.&lt;br /&gt;
&lt;br /&gt;
Below, we set up a polygon containing the latitude/longitude coordinates for the Northern Hemisphere, and then set the &amp;quot;area&amp;quot; attribute for each matching record to &amp;quot;NorthernHemisphere&amp;quot;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
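# Corners as (longitude, latitude) pairs&lt;br /&gt;
ul = (-180, 90)&lt;br /&gt;
ur = (180, 90)&lt;br /&gt;
ll = (-180, 0)&lt;br /&gt;
lr = (180, 0)&lt;br /&gt;
# List the corners in order around the boundary so that the edges do not cross&lt;br /&gt;
poly = [ul, ur, lr, ll]&lt;br /&gt;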
polyname = &amp;quot;NorthernHemisphere&amp;quot;&lt;br /&gt;
&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This process can be repeated for all polygons of interest until all GBIF records have been classified (except for GBIF records which lacked lat/long data in the first place, which sometimes happens).&lt;br /&gt;
&lt;br /&gt;
GeogUtils also contains open access libraries for processing shapefile/dbf files -- these are standard GIS file formats, and various publicly-accessible shapefiles might serve as sources for polygons.&lt;br /&gt;
&lt;br /&gt;
Warning: the point-in-polygon operation will fail dramatically if your polygon crosses the International Date Line.  The best solution in this case is to split any polygon crossing the date line into two polygons, one on each side of the line.&lt;br /&gt;
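&lt;br /&gt;
A common way to implement a point-in-polygon test is ray casting: count how many polygon edges a ray running east from the point crosses; an odd count means the point is inside.  The sketch below is a generic version in plain Python, not the routine in GeogUtils.  It assumes vertices are (longitude, latitude) pairs listed in order around the boundary, and it makes no attempt to handle polygons that cross the date line.&lt;br /&gt;
&lt;br /&gt;
```python
def point_in_polygon(lon, lat, poly):
    # Ray casting: toggle `inside` each time a ray running east from
    # the point crosses a polygon edge.
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > lon:
                inside = not inside
    return inside


# Northern Hemisphere, vertices in order around the boundary.
north = [(-180, 90), (180, 90), (180, 0), (-180, 0)]
print(point_in_polygon(-74.1, 4.6, north))
print(point_in_polygon(-47.9, -15.8, north))
```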
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===General notes===&lt;br /&gt;
&lt;br /&gt;
GBIF search results often contain non-ASCII characters (e.g. international placenames) and other confusing items, such as web links in angle brackets, which can be misinterpreted as unmatched XML tags if a search result is converted to ASCII and then parsed.&lt;br /&gt;
&lt;br /&gt;
In general, the Geography module handles these results without trouble when they are processed in the background.  To print results to screen, however, a series of functions from GeneralUtils is used to convert each string to plain ASCII, which avoids crashes.  As a consequence, the printed output may differ slightly from the content of the original search results.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-19T08:47:23Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: major update&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
Each class/function has been commented with docstrings.  For latest commit, see: [http://github.com/nmatzke/biopython/commits/Geography http://github.com/nmatzke/biopython/commits/Geography].&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GeneralUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, in order to display results on screen in Python, we need to convert the file to plain ASCII.  GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and this can cause crashes when attempting to print to screen.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.extract_occurrences_from_gbif_xmltree(recs.gbif_recs_xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs.obs_recs_list[0:4]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many GBIF holds.  The user must set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format.&lt;br /&gt;
&lt;br /&gt;
'scientificname' specifies the genus name to search on.  Adding an '*' after the name will return anything that begins with &amp;quot;Genlisea&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service], which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, initiate a new GbifSearchResults object and run get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 169 records matching &amp;quot;Genlisea*&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For contrast, run the same search ''without'' the asterisk ('*'):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea'}&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We only get ~10 results -- presumably records of specimens only identified down to genus and no further.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key.  To download an individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
key = rec.gbifkey&lt;br /&gt;
# (or manually)&lt;br /&gt;
# key = 175067484&lt;br /&gt;
xmlrec = recs.get_record(key)&lt;br /&gt;
print(xmlrec)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you want to print the xmlrec ElementTree object, store xmlrec in a GbifXmlTree object and run print_xmltree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;GbifXmlTree(xmlrec).print_xmltree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented in a thorough way in the [http://www.phylodiversity.net/phylocom/ phylocom package] by Webb et al., but for some purposes it is useful to calculate the statistics directly in python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The function test_Tree will run the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic distance) and output to screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to only those taxa found within a region, statistics can be calculated by region.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading and processing large numbers of records===&lt;br /&gt;
&lt;br /&gt;
GBIF only allows a maximum of 1000 observation records to be downloaded at a time (10,000 for KML records).  To get more, we need to download and process them in stages.&lt;br /&gt;
&lt;br /&gt;
Again we will set up our parameters dictionary, and also an &amp;quot;inc&amp;quot; variable to specify the number of records to download per server request.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;params = {'format': 'darwin', 'scientificname': 'Genlisea*'}&lt;br /&gt;
inc = 100&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
gbif_xmltree_list = recs3.get_all_records_by_increment(params, inc)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As with biopython's interactions with NCBI servers, the GbifSearchResults class keeps track of when the last GBIF request was made, and requires a 3-second wait before a new request.&lt;br /&gt;
&lt;br /&gt;
Each server request returns an XML string, which is parsed into a GbifXmlTree object; the list of these objects is returned as gbif_xmltree_list.  The individual records have also been parsed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Classifying records into geographical regions===&lt;br /&gt;
&lt;br /&gt;
Biogeographical analyses will often require that you determine what area(s) a taxon lives in.  Areas are not always obviously delineated, and analysts may wish to try several different possible sets of areas and see how this influences their analysis.&lt;br /&gt;
&lt;br /&gt;
Below, we set up a polygon containing the latitude/longitude coordinates for the Northern Hemisphere, and then set the &amp;quot;area&amp;quot; attribute for each matching record to &amp;quot;NorthernHemisphere&amp;quot;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
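# Corners as (longitude, latitude) pairs&lt;br /&gt;
ul = (-180, 90)&lt;br /&gt;
ur = (180, 90)&lt;br /&gt;
ll = (-180, 0)&lt;br /&gt;
lr = (180, 0)&lt;br /&gt;
# List the corners in order around the boundary so that the edges do not cross&lt;br /&gt;
poly = [ul, ur, lr, ll]&lt;br /&gt;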
polyname = &amp;quot;NorthernHemisphere&amp;quot;&lt;br /&gt;
&lt;br /&gt;
recs3.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This process can be repeated for all polygons of interest until all GBIF records have been classified (except for GBIF records which lacked lat/long data in the first place, which sometimes happens).&lt;br /&gt;
&lt;br /&gt;
GeogUtils also contains open access libraries for processing shapefile/dbf files -- these are standard GIS file formats, and various publicly-accessible shapefiles might serve as sources for polygons.&lt;br /&gt;
&lt;br /&gt;
Warning: the point-in-polygon operation will fail dramatically if your polygon crosses the International Date Line.  The best solution in this case is to split any polygon crossing the date line into two polygons, one on each side of the line.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===General notes===&lt;br /&gt;
&lt;br /&gt;
GBIF search results often contain non-ASCII characters (e.g. international placenames) and other confusing items, such as web links in angle brackets, which can be misinterpreted as unmatched XML tags if a search result is converted to ASCII and then parsed.&lt;br /&gt;
&lt;br /&gt;
In general, the Geography module handles these results without trouble when they are processed in the background.  To print results to screen, however, a series of functions from GeneralUtils is used to convert each string to plain ASCII, which avoids crashes.  As a consequence, the printed output may differ slightly from the content of the original search results.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-19T08:06:44Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: Added link to github commits.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
Each class/function has been documented with docstrings.  For the latest commits, see: [http://github.com/nmatzke/biopython/commits/Geography http://github.com/nmatzke/biopython/commits/Geography].&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GenUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, in order to display results on screen in Python, we need to convert the file to plain ASCII.  GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and this can cause crashes when attempting to print to screen in Python.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
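&lt;br /&gt;
The internals of fix_ASCII_file are not shown here, so the following is only a rough sketch of one way to force text to plain ASCII with the Python 3 standard library.  The helper name to_ascii and the punctuation table are assumptions made for illustration, not part of Bio.Geography.&lt;br /&gt;

```python
import unicodedata

# Map a few common "smart" punctuation marks to plain ASCII first,
# because they have no ASCII decomposition and would otherwise be dropped.
PUNCT = {0x2018: "'", 0x2019: "'", 0x201C: '"', 0x201D: '"', 0x2013: "-"}

def to_ascii(text):
    """Hypothetical helper: return an ASCII-only approximation of text."""
    text = text.translate(PUNCT)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything non-ASCII removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Malm\u00f6 \u201cUtricularia\u201d"))
```

Characters with no ASCII equivalent are silently discarded by this approach, so it suits screen display better than archival storage.&lt;br /&gt;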
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
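&lt;br /&gt;
To get a feel for what an ElementTree holds, here is a small self-contained sketch (Python 3) that builds a tiny tree in memory and walks it.  The tag names and coordinate values are made up for illustration; they are not the DarwinCore element names found in real GBIF output.&lt;br /&gt;

```python
from xml.etree import ElementTree as ET

# Build a tiny tree in memory instead of reading a file.
# Tag names and values below are hypothetical, not real GBIF data.
root = ET.Element("records")
for name, lat, lon in [("Utricularia vulgaris", "52.1", "5.3"),
                       ("Utricularia minor", "60.2", "24.9")]:
    rec = ET.SubElement(root, "record")
    ET.SubElement(rec, "name").text = name
    ET.SubElement(rec, "lat").text = lat
    ET.SubElement(rec, "long").text = lon

xmltree_demo = ET.ElementTree(root)

# Walk the tree: each element has a tag, optional text, and child elements.
rows = []
for rec in xmltree_demo.getroot().iter("record"):
    rows.append((rec.findtext("name"),
                 float(rec.findtext("lat")),
                 float(rec.findtext("long"))))
print(rows)
```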
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.latlongs_to_obj()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;print recs.obs_recs_list[0:4], '...'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many matching records GBIF holds.  To do this, set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format. 'scientificname' specifies the genus name to search on.  The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service], which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
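&lt;br /&gt;
As a rough illustration of how such a dictionary becomes a web request, the sketch below (Python 3) URL-encodes the parameters into a query string.  The base URL is a placeholder, not the real GBIF endpoint, and this is not necessarily how GbifSearchResults builds its requests.&lt;br /&gt;

```python
from urllib.parse import urlencode, parse_qs

params = {'format': 'darwin', 'scientificname': 'Utricularia'}

# Hypothetical placeholder; consult the GBIF documentation for the real service URL.
BASE_URL = "http://example.org/occurrence/count"

# urlencode joins the key=value pairs and escapes any unsafe characters.
query = urlencode(params)
url = BASE_URL + "?" + query
print(url)

# parse_qs reverses the encoding, which is handy for checking a URL.
print(parse_qs(query))
```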
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and call get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 1141 records matching &amp;quot;Utricularia&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key.  To download an individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
key = 175067484&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
xmlrec = recs3.get_record(key)&lt;br /&gt;
print xmlrec&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented in a thorough way in the phylocom package by Webb et al., but for some purposes it is useful to calculate the statistics directly in python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The method test_Tree will compute the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and print them to the screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
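&lt;br /&gt;
To make two of these metrics concrete, here is a minimal self-contained sketch (Python 3) that computes MPD and PD on a toy tree.  It does not use TreeSum and is not its implementation; the tree, the helper names, and the reading of PD as the sum of all branch lengths are assumptions made for illustration.&lt;br /&gt;

```python
from itertools import combinations

# Toy rooted tree, written as child: (parent, branch_length).
# Equivalent Newick: ((a:1.0,b:1.0)x:0.5,c:2.0)root; the numbers are made up.
TREE = {"a": ("x", 1.0), "b": ("x", 1.0), "x": ("root", 0.5), "c": ("root", 2.0)}
TIPS = ["a", "b", "c"]

def path_to_root(node):
    """Return {ancestor: distance from node}, including the node itself."""
    dists = {node: 0.0}
    total = 0.0
    while node in TREE:
        parent, length = TREE[node]
        total += length
        dists[parent] = total
        node = parent
    return dists

def tip_distance(t1, t2):
    """Path length between two tips via their most recent common ancestor."""
    d1, d2 = path_to_root(t1), path_to_root(t2)
    # The shared ancestor giving the shortest path is the most recent one.
    return min(d1[n] + d2[n] for n in d1 if n in d2)

pairs = list(combinations(TIPS, 2))
# MPD: mean of the distances over all pairs of tips.
mpd = sum(tip_distance(a, b) for a, b in pairs) / len(pairs)
# PD: taken here as the sum of all branch lengths in the tree.
pd_total = sum(length for _, length in TREE.values())
print(mpd, pd_total)
```

Restricting TIPS to the taxa found in one region, and summing only the branches that connect them, would give per-region values.&lt;br /&gt;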
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to only the taxa that occur within a region, these statistics can be calculated region by region.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-17T19:50:46Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Tutorial */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GenUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, in order to display results on screen in Python, we need to convert the file to plain ASCII.  GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and this can cause crashes when attempting to print to screen in Python.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.latlongs_to_obj()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;print recs.obs_recs_list[0:4], '...'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many matching records GBIF holds.  To do this, set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format. 'scientificname' specifies the genus name to search on.  The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service], which is linked from the [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and call get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 1141 records matching &amp;quot;Utricularia&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key.  To download an individual record:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
key = 175067484&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
xmlrec = recs3.get_record(key)&lt;br /&gt;
print xmlrec&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented in a thorough way in the phylocom package by Webb et al., but for some purposes it is useful to calculate the statistics directly in python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The method test_Tree will compute the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and print them to the screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to only the taxa that occur within a region, these statistics can be calculated region by region.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-17T19:49:36Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Parsing a local (manually downloaded) GBIF DarwinCore XML file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GenUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, in order to display results on screen in Python, we need to convert the file to plain ASCII.  GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and this can cause crashes when attempting to print to screen in Python.  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.latlongs_to_obj()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;print recs.obs_recs_list[0:4], '...'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many matching records GBIF holds.  The user must set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format. 'scientificname' specifies the genus name to search on.  The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service] page, which is linked from [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
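&lt;br /&gt;
Web services of this kind typically receive search terms as a URL query string.  As a rough sketch of that step, the snippet below encodes the dictionary with the Python standard library only.  How GbifSearchResults actually builds its request is not shown on this page, so this is an assumption for illustration, and no GBIF address is assumed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
try:&lt;br /&gt;
    from urllib.parse import urlencode  # Python 3&lt;br /&gt;
except ImportError:&lt;br /&gt;
    from urllib import urlencode  # Python 2&lt;br /&gt;
&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&lt;br /&gt;
&lt;br /&gt;
# Sort the items so the encoded string is the same on every run.&lt;br /&gt;
query = urlencode(sorted(params.items()))&lt;br /&gt;
print(query)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The resulting string would be appended to the service address after a question mark.&lt;br /&gt;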
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and call get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 1141 records matching &amp;quot;Utricularia&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
key = 175067484&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
xmlrec = recs3.get_record(key)&lt;br /&gt;
print xmlrec&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha- and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented thoroughly in the phylocom package by Webb et al., but for some purposes it is useful to calculate the statistics directly in Python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts = TreeSum(to2)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The method test_Tree computes the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and prints them to screen:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ts.test_Tree()&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
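To make two of these metrics concrete, here is a minimal sketch that computes MPD (the mean of all pairwise tip-to-tip distances) and PD (taken here as the sum of all branch lengths, which is an assumption about the definition used) on a small hand-built tree.  It does not use TreeSum, whose implementation may differ; the tree, the parent dictionary and the helper functions are hypothetical and exist only for this illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from itertools import combinations&lt;br /&gt;
&lt;br /&gt;
# Hypothetical three-tip tree ((A:1,B:2)AB:1,C:3)root,&lt;br /&gt;
# stored as child -&amp;gt; (parent, branch length).&lt;br /&gt;
parent = {'A': ('AB', 1.0), 'B': ('AB', 2.0), 'AB': ('root', 1.0), 'C': ('root', 3.0)}&lt;br /&gt;
&lt;br /&gt;
def path_to_root(tip):&lt;br /&gt;
    # Map the tip and each of its ancestors to their distance from the tip.&lt;br /&gt;
    dist, node, path = 0.0, tip, {tip: 0.0}&lt;br /&gt;
    while node in parent:&lt;br /&gt;
        node, length = parent[node]&lt;br /&gt;
        dist += length&lt;br /&gt;
        path[node] = dist&lt;br /&gt;
    return path&lt;br /&gt;
&lt;br /&gt;
def tip_distance(a, b):&lt;br /&gt;
    # Among shared ancestors, the most recent one gives the smallest summed distance.&lt;br /&gt;
    pa, pb = path_to_root(a), path_to_root(b)&lt;br /&gt;
    return min(pa[n] + pb[n] for n in pa if n in pb)&lt;br /&gt;
&lt;br /&gt;
tips = ['A', 'B', 'C']&lt;br /&gt;
dists = [tip_distance(a, b) for a, b in combinations(tips, 2)]&lt;br /&gt;
mpd = sum(dists) / len(dists)  # mean of the A-B, A-C and B-C distances&lt;br /&gt;
total_pd = sum(length for _, length in parent.values())  # sum of all branch lengths&lt;br /&gt;
print(mpd)&lt;br /&gt;
print(total_pd)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this sketch, restricting the tips list to the taxa found in one region gives the MPD for that region.&lt;br /&gt;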
&lt;br /&gt;
By subsetting a tree to only the taxa found within a region, statistics can be calculated by region.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-17T19:48:07Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Parsing a local (manually downloaded) GBIF DarwinCore XML file */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;nowiki&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GenUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, to display results on screen in Python, we need to convert the file to plain ASCII. GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and these can cause crashes when printing to screen in Python.  &lt;br /&gt;
&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a Python object that holds the data from the XML file as a tree of nested elements, each of which behaves like a list of its child elements and carries a dictionary of attributes).&lt;br /&gt;
&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.latlongs_to_obj()&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
print recs.obs_recs_list[0:4], '...'&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many matching records GBIF holds.  The user must set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format. 'scientificname' specifies the genus name to search on.  The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service] page, which is linked from [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and call get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 1141 records matching &amp;quot;Utricularia&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key:&lt;br /&gt;
&lt;br /&gt;
key = 175067484&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
xmlrec = recs3.get_record(key)&lt;br /&gt;
print xmlrec&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha- and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented thoroughly in the phylocom package by Webb et al., but for some purposes it is useful to calculate the statistics directly in Python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
ts = TreeSum(to2)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The method test_Tree computes the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and prints them to screen:&lt;br /&gt;
&lt;br /&gt;
ts.test_Tree()&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to only the taxa found within a region, statistics can be calculated by region.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-17T19:47:02Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Tutorial */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded (see examples on GBIF website) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=python&amp;gt;&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GenUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
First, to display results on screen in Python, we need to convert the file to plain ASCII. GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and these can cause crashes when printing to screen in Python.  &lt;br /&gt;
&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
Next, we will parse the XML file into an ElementTree (a Python object that holds the data from the XML file as a tree of nested elements, each of which behaves like a list of its child elements and carries a dictionary of attributes).&lt;br /&gt;
&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&lt;br /&gt;
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.latlongs_to_obj()&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
print recs.obs_recs_list[0:4], '...'&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before going to the trouble of downloading thousands of records, we may wish to know how many matching records GBIF holds.  The user must set up a dictionary with the search fields as keys and the search terms as values.  For example:&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format. 'scientificname' specifies the genus name to search on.  The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service] page, which is linked from [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
Once you have specified your search parameters, create a new GbifSearchResults object and call get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 1141 records matching &amp;quot;Utricularia&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key:&lt;br /&gt;
&lt;br /&gt;
key = 175067484&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
xmlrec = recs3.get_record(key)&lt;br /&gt;
print xmlrec&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha- and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented thoroughly in the phylocom package by Webb et al., but for some purposes it is useful to calculate the statistics directly in Python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
ts = TreeSum(to2)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The method test_Tree computes the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic Distance) and prints them to screen:&lt;br /&gt;
&lt;br /&gt;
ts.test_Tree()&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
By subsetting a tree to only the taxa found within a region, statistics can be calculated by region.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-08-17T19:38:46Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: added simple Tutorial&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Summary of functions==&lt;br /&gt;
&lt;br /&gt;
==Tutorial==&lt;br /&gt;
&lt;br /&gt;
Bio.Geography is a module for gathering and processing biogeographical data.  The major motivation for the module is to assist analyses of evolutionary biogeography.  A variety of inference algorithms are available for such analyses, such as [http://www.ebc.uu.se/systzoo/research/diva/manual/dmanual.html DIVA] and [http://code.google.com/p/lagrange/ lagrange].  The inputs to such programs are typically (a) a phylogeny and (b) the areas inhabited by the species at the tips of the phylogeny.  A researcher who has gathered data on a particular group will likely have direct access to species location data, but many large-scale analyses may require gathering large amounts of occurrence data.  Automated gathering/processing of occurrence data has a variety of other applications as well, including species mapping, niche modeling, error-checking of museum records, and monitoring range changes.&lt;br /&gt;
&lt;br /&gt;
Occurrence data is derived mainly from museum collections.  The major source of such data is the [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF).  GBIF serves occurrence data recorded by hundreds of museums worldwide.  GBIF occurrence data can be [http://data.gbif.org/occurrences/ searched manually], and results downloaded ([http://data.gbif.org/occurrences/search.htm?c[0].s=0&amp;amp;c[0].p=0&amp;amp;c[0].o=Strix+aluco&amp;amp;c[1].s=5&amp;amp;c[1].p=0&amp;amp;c[1].o=PL&amp;amp;c[2].s=17&amp;amp;c[2].p=0&amp;amp;c[2].o=1&amp;amp;c[3].s=17&amp;amp;c[3].p=0&amp;amp;c[3].o=1&amp;amp;c[4].s=29&amp;amp;c[4].p=0&amp;amp;c[4].o=0 example on GBIF website]) in various formats: spreadsheet, Google Earth KML, or the XML DarwinCore format.  &lt;br /&gt;
&lt;br /&gt;
GBIF can also be accessed via an API. Bio.Geography can process manually downloaded DarwinCore results, or access GBIF directly.&lt;br /&gt;
&lt;br /&gt;
===Parsing a local (manually downloaded) GBIF DarwinCore XML file===&lt;br /&gt;
&lt;br /&gt;
For one-off uses of GBIF, you may find it easiest to just download occurrence data in spreadsheet format (for analysis) or KML (for mapping).  But for analyses of many groups, or for repeatedly updating an analysis as new data is added to GBIF, automation is desirable.&lt;br /&gt;
&lt;br /&gt;
A manual search conducted on the GBIF website can return results in the form of an XML file adhering to the [http://en.wikipedia.org/wiki/Darwin_Core DarwinCore] data standard.  An example file can be found in biopython's Tests/Geography directory, with the name ''utric_search_v2.xml''.  This file contains over 1000 occurrence records for ''Utricularia'', a genus of carnivorous plant.&lt;br /&gt;
&lt;br /&gt;
Save the utric_search_v2.xml file in your working directory (or download a similar file from GBIF).  Here are suggested steps to parse the file with Bio.Geography's GbifXml module:&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GenUtils import fix_ASCII_file&lt;br /&gt;
&lt;br /&gt;
xml_fn = 'utric_search_v2.xml'&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
First, in order to display results to screen in python, we need to convert the file to plain ASCII.  GBIF results contain all manner of unusual characters from different languages, with no standardization of slanted quotes and the like, and this can cause crashes when attempting to print to screen in python.&lt;br /&gt;
&lt;br /&gt;
xml_fn_new = fix_ASCII_file(xml_fn)&lt;br /&gt;
&lt;br /&gt;
This creates a new file with the string &amp;quot;_fixed.xml&amp;quot; added to the filename.&lt;br /&gt;
&lt;br /&gt;
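The conversion itself is not shown on this page.  As a rough sketch of the general idea only (this is not the code of fix_ASCII_file, and the helper name to_ascii is hypothetical), accented letters can be reduced to their base letters and any remaining non-ASCII characters replaced with a question mark:&lt;br /&gt;
&lt;br /&gt;
```python
import unicodedata


def to_ascii(text):
    # Hypothetical helper for illustration; not part of Bio.Geography.
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks, keeping the base letters.
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. slanted quotes) becomes '?'.
    return stripped.encode('ascii', 'replace').decode('ascii')


# A collector name with an umlaut, and a genus name in slanted quotes.
sample = 'M\u00fcller \u201cUtricularia\u201d'
ascii_sample = to_ascii(sample)
```
&lt;br /&gt;
A lossy conversion like this is only meant to make printing safe; the original file should be kept if the exact characters matter.&lt;br /&gt;
&lt;br /&gt;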
Next, we will parse the XML file into an ElementTree (a python object which contains the data from the XML file as a nested series of lists and dictionaries).&lt;br /&gt;
&lt;br /&gt;
from xml.etree import ElementTree as ET&lt;br /&gt;
xmltree = ET.parse(xml_fn_new)&lt;br /&gt;
&lt;br /&gt;
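For readers new to ElementTree, the sketch below shows how elements expose a tag, a dictionary of attributes, optional text, and child elements.  The small tree is built by hand and its tag names are made up for illustration; real GBIF DarwinCore files use different tags.&lt;br /&gt;
&lt;br /&gt;
```python
from xml.etree import ElementTree as ET

# Build a tiny made-up tree by hand (hypothetical tag names, not the GBIF schema).
sample_root = ET.Element('results')
for key, name in [('1', 'Utricularia minor'), ('2', 'Utricularia vulgaris')]:
    record = ET.SubElement(sample_root, 'record', {'key': key})
    ET.SubElement(record, 'name').text = name
sample_tree = ET.ElementTree(sample_root)

# iter() walks the tree recursively and yields every element with the given tag.
records = list(sample_tree.getroot().iter('record'))

# Attributes are stored in a dictionary; text is stored on the element itself.
keys = [rec.attrib['key'] for rec in records]
names = [rec.find('name').text for rec in records]
```
&lt;br /&gt;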
We can then store the element tree as an object of class GbifXmlTree:&lt;br /&gt;
gbif_recs_xmltree = GbifXmlTree(xmltree)&lt;br /&gt;
&lt;br /&gt;
Then, with the xmltree stored, we parse it into individual records (stored in individual objects of class GbifObservationRecord), which are then stored as a group in an object of class GbifSearchResults.&lt;br /&gt;
&lt;br /&gt;
recs = GbifSearchResults(gbif_recs_xmltree)&lt;br /&gt;
recs.latlongs_to_obj()&lt;br /&gt;
&lt;br /&gt;
The list of individual observation records can be accessed at recs.obs_recs_list:&lt;br /&gt;
&lt;br /&gt;
print recs.obs_recs_list[0:4], '...'&lt;br /&gt;
&lt;br /&gt;
To get the data for the first individual record:&lt;br /&gt;
&lt;br /&gt;
rec = recs.obs_recs_list[0]&lt;br /&gt;
&lt;br /&gt;
dir(rec)&lt;br /&gt;
&lt;br /&gt;
rec.lat will return the latitude, rec.long the longitude, etc.  Certain data attributes are not found in all GBIF records; if they are missing, the field in question will contain &amp;quot;None&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
To print all of the records in a tab-delimited table format:&lt;br /&gt;
&lt;br /&gt;
recs.print_records()&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Checking how many matching records are hosted by GBIF===&lt;br /&gt;
&lt;br /&gt;
Before we go through the trouble of downloading thousands of records, we may wish to know how many there are in GBIF first.  The user must set up a dictionary containing the fields and search terms as keys and values, respectively.  For example:&lt;br /&gt;
&lt;br /&gt;
from Bio.Geography.GbifXml import GbifXmlTree, GbifSearchResults&lt;br /&gt;
params = {'format': 'darwin', 'scientificname': 'Utricularia'}&lt;br /&gt;
&lt;br /&gt;
&amp;quot;'format': 'darwin'&amp;quot; specifies that GBIF should return the results in DarwinCore format. 'scientificname' specifies the genus name to search on.  The full list of search terms can be found on GBIF's [http://data.gbif.org/tutorial/services Occurrence record data service] page, which is linked from [http://data.gbif.org/tutorial/services Using data from the GBIF portal].&lt;br /&gt;
&lt;br /&gt;
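Before a search can be submitted, a parameter dictionary like this has to be turned into a URL query string.  A minimal sketch using only the python standard library follows; it is not the Bio.Geography code, and the base URL is an assumption made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode

params = {'format': 'darwin', 'scientificname': 'Utricularia'}

# Sorting the items gives the same parameter order on every run.
query = urlencode(sorted(params.items()))

# Assumed base URL, for illustration only; check the GBIF documentation.
base_url = 'http://data.gbif.org/ws/rest/occurrence'
url = base_url + '?' + query
```
&lt;br /&gt;
urlencode also escapes spaces and other special characters, which matters for full species names such as 'Strix aluco'.&lt;br /&gt;
&lt;br /&gt;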
Once you have specified your search parameters, initiate a new GbifSearchResults object and run get_numhits to get the number of hits:&lt;br /&gt;
&lt;br /&gt;
recs = GbifSearchResults()&lt;br /&gt;
numhits = recs.get_numhits(params)&lt;br /&gt;
&lt;br /&gt;
As of August 2009, GBIF held 1141 records matching &amp;quot;Utricularia.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Downloading an individual record===&lt;br /&gt;
&lt;br /&gt;
Individual records can be downloaded by key.  To download an individual record:&lt;br /&gt;
&lt;br /&gt;
key = 175067484&lt;br /&gt;
recs3 = GbifSearchResults()&lt;br /&gt;
xmlrec = recs3.get_record(key)&lt;br /&gt;
print xmlrec&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Summary statistics for phylogenetic trees with TreeSum===&lt;br /&gt;
&lt;br /&gt;
Biogeographical regions are often characterized by alpha and beta-diversity statistics: basically, these are indices of the number of species found within or between regions.  Given a phylogeny for organisms in a region, phylogenetic alpha- and beta-diversity statistics can be calculated.  This has been implemented in a thorough way in the phylocom package by Webb et al., but for some purposes it is useful to calculate the statistics directly in python.&lt;br /&gt;
&lt;br /&gt;
Here, we need to start with a Newick tree string:&lt;br /&gt;
&lt;br /&gt;
trstr2 = &amp;quot;(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);&amp;quot;&lt;br /&gt;
&lt;br /&gt;
to2 = Tree(trstr2) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then, we create a tree summary object:&lt;br /&gt;
&lt;br /&gt;
ts = TreeSum(to2)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The function test_Tree will run the metrics (MPD = Mean Phylogenetic Distance, NRI = Net Relatedness Index, MNPD = Mean Nearest Neighbor Phylogenetic Distance, NTI = Nearest Taxon Index, PD = total Phylogenetic distance) and output to screen:&lt;br /&gt;
&lt;br /&gt;
ts.test_Tree()&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
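To make two of these metrics concrete, here is a rough sketch (not the TreeSum code) that computes MPD and MNPD from a table of pairwise distances between tips.  The distances and function names are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations

# Made-up pairwise distances between three tips (symmetric).
dist = {
    ('t0', 't1'): 0.4,
    ('t0', 't2'): 1.0,
    ('t1', 't2'): 1.2,
}


def pair_distance(a, b):
    # Look the pair up in either order.
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]


def mean_pairwise_distance(tips):
    # MPD: average distance over all pairs of tips.
    pairs = list(combinations(tips, 2))
    return sum(pair_distance(a, b) for a, b in pairs) / len(pairs)


def mean_nearest_neighbor_distance(tips):
    # MNPD: for each tip, the distance to its closest other tip, averaged.
    nearest = [min(pair_distance(a, b) for b in tips if b != a) for a in tips]
    return sum(nearest) / len(nearest)


tips = ['t0', 't1', 't2']
mpd = mean_pairwise_distance(tips)
mnpd = mean_nearest_neighbor_distance(tips)
```
&lt;br /&gt;
NRI and NTI standardize MPD and MNPD against the values expected under a null model, which this sketch does not attempt.&lt;br /&gt;
&lt;br /&gt;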
By subsetting a tree to taxa only existing within a region, statistics can be calculated by region.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-07-02T18:56:01Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: xmltree tools code&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
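The point-in-polygon test mentioned above is commonly done by ray casting.  The sketch below is one standard way to write it, not necessarily the approach used in the uploaded code; it treats coordinates as planar and the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def point_in_polygon(x, y, polygon):
    # polygon is a list of (x, y) vertices; the last vertex connects to the first.
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the height of a horizontal ray from (x, y)?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point.
            if x_cross > x:
                inside = not inside
    # An odd number of crossings means the point is inside.
    return inside


# A square region, with x as longitude and y as latitude.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
in_region = point_in_polygon(5.0, 5.0, square)
unclassified = not point_in_polygon(15.0, 5.0, square)
```
&lt;br /&gt;
A planar test like this does not handle polygons that cross the 180-degree meridian, so such regions would need to be split first.&lt;br /&gt;
&lt;br /&gt;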
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching for and downloading basic occurrence record data.&lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts and the like), which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
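Several of the functions above search a parsed XML tree recursively.  The sketch below shows the general pattern for something like extract_numhits; it is not the module's code, and the tag and attribute names in the sample tree are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from xml.etree import ElementTree as ET


def find_attribute(element, attr_name):
    # Return the first value of attr_name found in element or any descendant.
    if attr_name in element.attrib:
        return element.attrib[attr_name]
    for child in element:
        found = find_attribute(child, attr_name)
        if found is not None:
            return found
    # The attribute does not exist anywhere below this element.
    return None


# Small made-up response tree; the names are assumptions for illustration.
root = ET.Element('response')
header = ET.SubElement(root, 'header')
ET.SubElement(header, 'summary', {'totalMatched': '1141'})

numhits = find_attribute(root, 'totalMatched')
missing = find_attribute(root, 'noSuchAttribute')
```
&lt;br /&gt;
Attribute values come back as strings, so a count would still need converting with int() before use.&lt;br /&gt;
&lt;br /&gt;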
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, store in list of elements. Increments of e.g. 100 to not overload server.  Currently stores results in a list of tempfiles which is returned (could return a list of handles I guess).&lt;br /&gt;
&lt;br /&gt;
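Downloading in increments comes down to splitting the total number of hits into windows.  A minimal sketch of that bookkeeping is below; the function name is hypothetical and the actual request for each window is left out.&lt;br /&gt;
&lt;br /&gt;
```python
def paging_windows(numhits, increment=100):
    # Yield (start_index, count) pairs that together cover numhits records.
    for start in range(0, numhits, increment):
        # The last window may be shorter than the increment.
        yield start, min(increment, numhits - start)


# Windows needed to cover 1141 records, 500 at a time.
windows = list(paging_windows(1141, increment=500))
total = sum(count for start, count in windows)
```
&lt;br /&gt;
Each pair would then be passed to the search function as its start position and maximum number of results, ideally with a short pause between requests.&lt;br /&gt;
&lt;br /&gt;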
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick(newickstr)====&lt;br /&gt;
Read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any. &lt;br /&gt;
&lt;br /&gt;
====list_leaves(phylo_obj)====&lt;br /&gt;
Print out all of the leaves above a node object&lt;br /&gt;
&lt;br /&gt;
====treelength(node)====&lt;br /&gt;
Gets the total branchlength above a given node by recursively adding through tree.&lt;br /&gt;
&lt;br /&gt;
====phylodistance(node1, node2)====&lt;br /&gt;
Get the phylogenetic distance (branch length) between two nodes.&lt;br /&gt;
&lt;br /&gt;
====get_distance_matrix(phylo_obj)====&lt;br /&gt;
Get a matrix of all of the pairwise distances between the tips of a tree.&lt;br /&gt;
&lt;br /&gt;
====get_mrca_array(phylo_obj)====&lt;br /&gt;
Get a square list of lists (array) listing the mrca of each pair of leaves (half-diagonal matrix)&lt;br /&gt;
&lt;br /&gt;
====subset_tree(phylo_obj, list_to_keep)====&lt;br /&gt;
Given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree.&lt;br /&gt;
&lt;br /&gt;
====prune_single_desc_nodes(node)====&lt;br /&gt;
Follow a tree from the bottom up, pruning any nodes with only one descendant&lt;br /&gt;
====find_new_root(node)====&lt;br /&gt;
Search up tree from root and make new root at first divergence&lt;br /&gt;
&lt;br /&gt;
====make_None_list_array(xdim, ydim)====&lt;br /&gt;
Make a list of lists (&amp;quot;array&amp;quot;) with the specified dimensions	&lt;br /&gt;
&lt;br /&gt;
====get_PD_to_mrca(node, mrca, PD)====&lt;br /&gt;
Add up the phylogenetic distance from a node to the specified ancestor (mrca).  Find mrca with find_1st_match.&lt;br /&gt;
&lt;br /&gt;
====find_1st_match(list1, list2)====&lt;br /&gt;
Find the first match in two ordered lists.&lt;br /&gt;
&lt;br /&gt;
====get_ancestors_list(node, anc_list)====&lt;br /&gt;
Get the list of ancestors of a given node&lt;br /&gt;
&lt;br /&gt;
====addup_PD(node, PD)====&lt;br /&gt;
Adds the branchlength of the current node to the total PD measure.&lt;br /&gt;
&lt;br /&gt;
====print_tree_outline_format(phylo_obj)====&lt;br /&gt;
Prints the tree out in &amp;quot;outline&amp;quot; format (daughter clades are indented, etc.)&lt;br /&gt;
&lt;br /&gt;
====print_Node(node, rank)====&lt;br /&gt;
Prints the node in question, and recursively all daughter nodes, maintaining rank as it goes.&lt;br /&gt;
&lt;br /&gt;
====lagrange_disclaimer()====&lt;br /&gt;
Just prints lagrange citation etc. in code using lagrange libraries.&lt;br /&gt;
&lt;br /&gt;
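Several of these functions work together: the distance between two nodes is found by listing each node's ancestors, finding the first shared one (the mrca), and adding up the branch lengths on the way there.  The sketch below illustrates that idea with a minimal stand-in Node class; it is not the module's code, and the signatures differ from those listed above.&lt;br /&gt;
&lt;br /&gt;
```python
class Node:
    # Minimal stand-in for a tree node; not the class used by Bio.Geography.
    def __init__(self, label, length=0.0, parent=None):
        self.label = label
        self.length = length  # length of the branch leading to this node
        self.parent = parent


def ancestors(node):
    # The node itself, then its parent, and so on toward the root.
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def first_match(list1, list2):
    # First item of list1 that also occurs in list2.
    for item in list1:
        if item in list2:
            return item
    return None


def phylodistance(node1, node2):
    # Sum the branch lengths from each node to their most recent common ancestor.
    mrca = first_match(ancestors(node1), ancestors(node2))
    total = 0.0
    for start in (node1, node2):
        node = start
        while node is not mrca:
            total += node.length
            node = node.parent
    return total


root = Node('root')
a = Node('A', 1.0, root)
t1 = Node('t1', 2.0, a)
t2 = Node('t2', 3.0, a)
t3 = Node('t3', 4.0, root)
```
&lt;br /&gt;
With this tree, phylodistance(t1, t2) adds the two branches above node A, while phylodistance(t1, t3) also includes the branch leading to A.&lt;br /&gt;
&lt;br /&gt;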
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
Priority for this week:&lt;br /&gt;
&lt;br /&gt;
Following up on suggestions to make the code more standard, with the priority of figuring out how I can revise the current BioPython phylogeny class, to resemble the better version in lagrange, so that there is a generic flexible phylogeny/newick parser that can be used generally as well as by my BioGeography package specifically.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Added a bunch of tools for managing/parsing xmltree structures from ElementTree parsing of XML:&lt;br /&gt;
&lt;br /&gt;
====find_to_elements_w_ancs(xmltree, el_tag, anc_el_tag)====&lt;br /&gt;
Burrow into XML to get an element with tag el_tag, returning only those el_tags underneath a particular ancestor element anc_el_tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====create_sub_xmltree(element)====&lt;br /&gt;
Create a subset xmltree (to avoid going back to irrelevant parents)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====xml_recursive_search_w_anc(xmltree, element, el_tag, anc_el_tag, match_el_list)====&lt;br /&gt;
Recursively burrows down to find whatever elements with el_tag exist inside an anc_el_tag.&lt;br /&gt;
		&lt;br /&gt;
&lt;br /&gt;
====xml_burrow_up(xmltree, element, anc_el_tag, found_anc)====&lt;br /&gt;
Burrow up xml to find anc_el_tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====xml_burrow_up_cousin(xmltree, element, cousin_el_tag, found_cousin)====&lt;br /&gt;
Burrow up from element of interest, until a cousin is found with cousin_el_tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====return_parent_in_xmltree(xmltree, child_to_search_for)====&lt;br /&gt;
Search through an xmltree to get the parent of child_to_search_for&lt;br /&gt;
				&lt;br /&gt;
&lt;br /&gt;
====return_parent_in_element(potential_parent, child_to_search_for, returned_parent)====&lt;br /&gt;
Search through an XML element to return parent of child_to_search_for&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====find_1st_matching_element(element, el_tag, return_element)====&lt;br /&gt;
Burrow down into the XML tree, retrieve the first element with the matching tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====element_items_to_string(items)====&lt;br /&gt;
Input a list of items, get string back&lt;br /&gt;
&lt;br /&gt;
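Elements in the standard library's ElementTree do not keep a reference to their parent, which is why burrowing up needs the whole tree.  One common alternative, sketched below with hypothetical function and tag names, is to build a child-to-parent map once and then walk upward through it.&lt;br /&gt;
&lt;br /&gt;
```python
from xml.etree import ElementTree as ET


def build_parent_map(root):
    # Map every child element to its parent.
    return {child: parent for parent in root.iter() for child in parent}


def find_ancestor(element, anc_tag, parent_map):
    # Walk toward the root until an element with tag anc_tag is found.
    node = parent_map.get(element)
    while node is not None and node.tag != anc_tag:
        node = parent_map.get(node)
    return node


# Small made-up tree; the tag names are assumptions for illustration.
root = ET.Element('results')
record = ET.SubElement(root, 'record')
coords = ET.SubElement(record, 'coordinates')
lat = ET.SubElement(coords, 'latitude')

parents = build_parent_map(root)
owner = find_ancestor(lat, 'record', parents)
```
&lt;br /&gt;
The map has to be rebuilt if the tree is modified, but for read-only parsing of search results it avoids searching the whole tree for every lookup.&lt;br /&gt;
&lt;br /&gt;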
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here]...no example script yet.&lt;br /&gt;
&lt;br /&gt;
These still need to be integrated:&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
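The two non-phylogenetic measures are simple enough to sketch directly.  The code below is an illustration under assumed inputs (plain lists of taxon names per region), not the functions still to be integrated.&lt;br /&gt;
&lt;br /&gt;
```python
def alpha_diversity(taxa):
    # Number of distinct taxa recorded in a region.
    return len(set(taxa))


def sorensen_index(taxa1, taxa2):
    # Twice the number of shared taxa, divided by the summed taxon counts.
    set1, set2 = set(taxa1), set(taxa2)
    shared = len(set1.intersection(set2))
    return 2.0 * shared / (len(set1) + len(set2))


# Made-up taxon lists for two regions.
region_a = ['t0', 't1', 't2']
region_b = ['t2', 't3']

alpha_a = alpha_diversity(region_a)
similarity = sorensen_index(region_a, region_b)
```
&lt;br /&gt;
As written the index is a similarity (1 for identical regions, 0 for no shared taxa); one minus this value gives the corresponding dissimilarity.  The function assumes at least one region is non-empty.&lt;br /&gt;
&lt;br /&gt;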
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
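The ultrametric requirement can be checked by comparing root-to-tip distances.  The sketch below uses nested tuples as a stand-in tree structure (an assumption for illustration, not the module's tree class) and hypothetical function names.&lt;br /&gt;
&lt;br /&gt;
```python
def tip_depths(node, depth=0.0):
    # node is (name, branch_length, children); children is a list of nodes.
    name, length, children = node
    depth = depth + length
    if not children:
        # A tip: report its total distance from the root.
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(tip_depths(child, depth))
    return depths


def is_ultrametric(root, tol=1e-6):
    # All tips must be the same distance from the root, within a tolerance.
    depths = list(tip_depths(root).values())
    return tol >= max(depths) - min(depths)


good_tree = ('root', 0.0, [
    ('A', 1.0, [('t1', 2.0, []), ('t2', 2.0, [])]),
    ('t3', 3.0, []),
])
bad_tree = ('root', 0.0, [
    ('A', 1.0, [('t1', 2.0, []), ('t2', 2.5, [])]),
    ('t3', 3.0, []),
])
```
&lt;br /&gt;
A tolerance is needed because branch lengths read from a Newick file are rounded, so exact equality would reject trees that are ultrametric in practice.&lt;br /&gt;
&lt;br /&gt;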
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton would work even for weird concave polygons).&lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
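For the great circle lines between regions, a display file needs intermediate points along the arc rather than just the two endpoints.  The sketch below uses the standard spherical interpolation formula; the function name is hypothetical, and it assumes the two points are neither identical nor exactly opposite each other on the globe.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, n=10):
    # Return n + 1 (lat, lon) points in degrees along the great circle arc.
    p1, l1, p2, l2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(math.sqrt(h))
    if d == 0:
        # Identical endpoints: nothing to interpolate.
        return [(lat1, lon1)] * (n + 1)
    points = []
    for i in range(n + 1):
        f = i / n  # fraction of the way along the arc
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        # Weighted sum of the endpoints in 3D, then back to lat/long.
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        points.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                       math.degrees(math.atan2(y, x))))
    return points


# Three points along the equator from longitude 0 to longitude 90.
arc = great_circle_points(0.0, 0.0, 0.0, 90.0, n=2)
```
&lt;br /&gt;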
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
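As a rough idea of what write_history_to_KML could produce, the sketch below writes one KML placemark per node using the standard library's ElementTree.  The function name and the input format (a list of name, latitude, longitude tuples) are assumptions for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
from xml.etree import ElementTree as ET


def history_to_kml(nodes):
    # nodes is a list of (name, latitude, longitude) tuples.
    kml = ET.Element('kml', {'xmlns': 'http://www.opengis.net/kml/2.2'})
    doc = ET.SubElement(kml, 'Document')
    for name, lat, lon in nodes:
        placemark = ET.SubElement(doc, 'Placemark')
        ET.SubElement(placemark, 'name').text = name
        point = ET.SubElement(placemark, 'Point')
        # KML coordinates are written as longitude,latitude
        ET.SubElement(point, 'coordinates').text = '%s,%s' % (lon, lat)
    return ET.tostring(kml, encoding='unicode')


# Two made-up node locations.
kml_text = history_to_kml([('node A', 52.2, 21.0), ('node B', 37.9, -122.3)])
```
&lt;br /&gt;
Note the coordinate order: KML expects longitude first, which is the reverse of the lat/long order used elsewhere on this page.  Branches between nodes could be added in the same way as LineString elements.&lt;br /&gt;
&lt;br /&gt;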
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-07-02T18:54:14Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: june wk4 update&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching for and downloading basic occurrence record data.&lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts and the like), which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, store in list of elements. Increments of e.g. 100 to not overload server.  Currently stores results in a list of tempfiles which is returned (could return a list of handles I guess).&lt;br /&gt;
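The staged download amounts to splitting the total number of hits into windows. A minimal sketch of that bookkeeping only (no network access); the function name and the (start, count) convention are hypothetical.&lt;br /&gt;

```python
def record_windows(numhits, increment=100):
    """Return (startindex, maxresults) pairs that together cover numhits records."""
    windows = []
    for start in range(0, numhits, increment):
        # The last window may be shorter than the increment.
        windows.append((start, min(increment, numhits - start)))
    return windows


print(record_windows(250))
```

Each pair would then be passed to one search request, with the results appended to the list of tempfiles.&lt;br /&gt;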
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick(newickstr)====&lt;br /&gt;
Read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any. &lt;br /&gt;
&lt;br /&gt;
====list_leaves(phylo_obj)====&lt;br /&gt;
Print out all of the leaves above a node object&lt;br /&gt;
&lt;br /&gt;
====treelength(node)====&lt;br /&gt;
Gets the total branchlength above a given node by recursively adding through tree.&lt;br /&gt;
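A minimal sketch of the recursive sum, using a stand-in Node class. The class and its attribute names are hypothetical and are not the node objects used by the module or by lagrange.&lt;br /&gt;

```python
class Node:
    """Stand-in tree node: a name, a branch length, and daughter nodes."""

    def __init__(self, name, branchlength=0.0, daughters=None):
        self.name = name
        self.branchlength = branchlength
        self.daughters = daughters or []


def treelength_sketch(node):
    """Sum the branch lengths of every node descended from node."""
    total = 0.0
    for daughter in node.daughters:
        total += daughter.branchlength + treelength_sketch(daughter)
    return total


# ((B:1,C:1)BC:1,A:2)root
clade_bc = Node("BC", 1.0, [Node("B", 1.0), Node("C", 1.0)])
example_tree = Node("root", 0.0, [Node("A", 2.0), clade_bc])
print(treelength_sketch(example_tree))
```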
&lt;br /&gt;
====phylodistance(node1, node2)====&lt;br /&gt;
Get the phylogenetic distance (branch length) between two nodes.&lt;br /&gt;
&lt;br /&gt;
====get_distance_matrix(phylo_obj)====&lt;br /&gt;
Get a matrix of all of the pairwise distances between the tips of a tree.&lt;br /&gt;
&lt;br /&gt;
====get_mrca_array(phylo_obj)====&lt;br /&gt;
Get a square list of lists (array) listing the mrca of each pair of leaves (half-diagonal matrix)&lt;br /&gt;
&lt;br /&gt;
====subset_tree(phylo_obj, list_to_keep)====&lt;br /&gt;
Given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree.&lt;br /&gt;
&lt;br /&gt;
====prune_single_desc_nodes(node)====&lt;br /&gt;
Follow a tree from the bottom up, pruning any nodes with only one descendant&lt;br /&gt;
====find_new_root(node)====&lt;br /&gt;
Search up tree from root and make new root at first divergence&lt;br /&gt;
&lt;br /&gt;
====make_None_list_array(xdim, ydim)====&lt;br /&gt;
Make a list of lists (&amp;quot;array&amp;quot;) with the specified dimensions	&lt;br /&gt;
&lt;br /&gt;
====get_PD_to_mrca(node, mrca, PD)====&lt;br /&gt;
Add up the phylogenetic distance from a node to the specified ancestor (mrca).  Find mrca with find_1st_match.&lt;br /&gt;
&lt;br /&gt;
====find_1st_match(list1, list2)====&lt;br /&gt;
Find the first match in two ordered lists.&lt;br /&gt;
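When the two lists are ancestor lists ordered from a node toward the root, the first shared entry is the most recent common ancestor. A minimal sketch under that assumption; the node labels are made up.&lt;br /&gt;

```python
def find_1st_match_sketch(list1, list2):
    """Return the first item of list1 that also occurs in list2, else None."""
    members = set(list2)
    for item in list1:
        if item in members:
            return item
    return None


# Hypothetical ancestor lists, ordered from each tip toward the root.
ancestors_of_a = ["n3", "n2", "root"]
ancestors_of_b = ["n4", "n2", "root"]
print(find_1st_match_sketch(ancestors_of_a, ancestors_of_b))
```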
&lt;br /&gt;
====get_ancestors_list(node, anc_list)====&lt;br /&gt;
Get the list of ancestors of a given node&lt;br /&gt;
&lt;br /&gt;
====addup_PD(node, PD)====&lt;br /&gt;
Adds the branchlength of the current node to the total PD measure.&lt;br /&gt;
&lt;br /&gt;
====print_tree_outline_format(phylo_obj)====&lt;br /&gt;
Prints the tree out in &amp;quot;outline&amp;quot; format (daughter clades are indented, etc.)&lt;br /&gt;
&lt;br /&gt;
====print_Node(node, rank)====&lt;br /&gt;
Prints the node in question, and recursively all daughter nodes, maintaining rank as it goes.&lt;br /&gt;
&lt;br /&gt;
====lagrange_disclaimer()====&lt;br /&gt;
Just prints lagrange citation etc. in code using lagrange libraries.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
Priority for this week:&lt;br /&gt;
&lt;br /&gt;
Following up on suggestions to make the code more standard, with the priority of figuring out how I can revise the current BioPython phylogeny class, to resemble the better version in lagrange, so that there is a generic flexible phylogeny/newick parser that can be used generally as well as by my BioGeography package specifically.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Added a bunch of tools for managing/parsing xmltree structures from ElementTree parsing of XML:&lt;br /&gt;
&lt;br /&gt;
====find_to_elements_w_ancs(xmltree, el_tag, anc_el_tag)====&lt;br /&gt;
Burrow into XML to get elements with tag el_tag, returning only those underneath a particular ancestor element with tag anc_el_tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====create_sub_xmltree(element)====&lt;br /&gt;
Create a subset xmltree (to avoid going back to irrelevant parents)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====xml_recursive_search_w_anc(xmltree, element, el_tag, anc_el_tag, match_el_list)====&lt;br /&gt;
Recursively burrows down to find whatever elements with el_tag exist inside an ancestor element with tag anc_el_tag.&lt;br /&gt;
		&lt;br /&gt;
&lt;br /&gt;
====xml_burrow_up(xmltree, element, anc_el_tag, found_anc)====&lt;br /&gt;
Burrow up xml to find anc_el_tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====xml_burrow_up_cousin(xmltree, element, cousin_el_tag, found_cousin)====&lt;br /&gt;
Burrow up from element of interest, until a cousin is found with cousin_el_tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====return_parent_in_xmltree(xmltree, child_to_search_for)====&lt;br /&gt;
Search through an xmltree to get the parent of child_to_search_for&lt;br /&gt;
				&lt;br /&gt;
&lt;br /&gt;
====return_parent_in_element(potential_parent, child_to_search_for, returned_parent)====&lt;br /&gt;
Search through an XML element to return parent of child_to_search_for&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====find_1st_matching_element(element, el_tag, return_element)====&lt;br /&gt;
Burrow down into the XML tree, retrieve the first element with the matching tag&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====element_items_to_string(items)====&lt;br /&gt;
Input a list of items, get string back&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These still need to be integrated:&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
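A minimal sketch of these two statistics, assuming each region is given as a list of taxon names. It uses the similarity form of the index, 2 * shared / (n1 + n2); whether the module will report similarity or its complement is not stated here, and the function names are hypothetical.&lt;br /&gt;

```python
def alphadiversity_sketch(taxa):
    """Number of distinct taxa in a region."""
    return len(set(taxa))


def sorensen_similarity(taxa1, taxa2):
    """Similarity form of the index: 2 * shared / (n1 + n2)."""
    set1, set2 = set(taxa1), set(taxa2)
    total = len(set1) + len(set2)
    if total == 0:
        return None  # undefined when both regions are empty
    shared = len(set1.intersection(set2))
    return 2.0 * shared / total


region1 = ["A", "B", "C"]
region2 = ["B", "C", "D"]
print(alphadiversity_sketch(region1), sorensen_similarity(region1, region2))
```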
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would deal even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
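Drawing a great circle line needs intermediate points between the two node locations. A minimal sketch using the standard spherical interpolation formula; the function name is hypothetical, and it assumes a spherical Earth, at least two points, and end points that are not antipodal.&lt;br /&gt;

```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, npoints=5):
    """Return npoints (lat, lon) pairs in degrees along the great circle.

    Assumes npoints is at least 2 and the end points are not antipodal.
    """
    p1, l1, p2, l2 = [math.radians(v) for v in (lat1, lon1, lat2, lon2)]
    # Angular distance between the end points, from the dot product of
    # their unit vectors (clamped to guard against rounding error).
    dot = (math.sin(p1) * math.sin(p2)
           + math.cos(p1) * math.cos(p2) * math.cos(l2 - l1))
    d = math.acos(max(-1.0, min(1.0, dot)))
    points = []
    for i in range(npoints):
        f = i / (npoints - 1)
        if d == 0.0:
            a, b = 1.0 - f, f  # identical end points
        else:
            a = math.sin((1.0 - f) * d) / math.sin(d)
            b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        lon = math.degrees(math.atan2(y, x))
        points.append((lat, lon))
    return points


# Three points along the equator from longitude 0 to longitude 90.
print(great_circle_points(0.0, 0.0, 0.0, 90.0, npoints=3))
```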
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
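A minimal sketch of the KML output step, assuming each node has already been reduced to a (name, latitude, longitude) tuple. It uses Python 3's xml.etree.ElementTree; the function name and the example node are hypothetical, and only point placemarks are written (no branches or ages).&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def nodes_to_kml_sketch(nodes):
    """Build a KML string with one Placemark per (name, lat, lon) tuple."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for name, lat, lon in nodes:
        placemark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = name
        point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
        # KML lists longitude first, then latitude.
        coords = ET.SubElement(point, "{%s}coordinates" % KML_NS)
        coords.text = "%s,%s" % (lon, lat)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical reconstructed node, for illustration only.
kml_text = nodes_to_kml_sketch([("node1", 37.87, -122.27)])
print(kml_text)
```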
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run them on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-24T04:47:46Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information. */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine if the material is copyrighted, and if so email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points, determine which region (polygon) each point falls in (via point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
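The point-in-polygon step can be sketched with the common ray-casting test. This is a generic planar version, treating longitude as x and latitude as y, and is not necessarily the algorithm used in the uploaded code; it ignores polygons that cross the date line, and the function name is hypothetical.&lt;br /&gt;

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge from vertex j to vertex i straddle the
        # horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x_cross > x:
                inside = not inside
        j = i
    return inside


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square), point_in_polygon(15, 5, square))
```

Points that fall in none of the region polygons would be the unclassified records mentioned above.&lt;br /&gt;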
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters, and the GBIF response is returned in XML/DarwinCore format. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts) which break printing results to the Python prompt; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, store in list of elements. Increments of e.g. 100 to not overload server.  Currently stores results in a list of tempfiles which is returned (could return a list of handles I guess).&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick(newickstr)====&lt;br /&gt;
Read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any. &lt;br /&gt;
&lt;br /&gt;
====list_leaves(phylo_obj)====&lt;br /&gt;
Print out all of the leaves above a node object&lt;br /&gt;
&lt;br /&gt;
====treelength(node)====&lt;br /&gt;
Gets the total branchlength above a given node by recursively adding through tree.&lt;br /&gt;
&lt;br /&gt;
====phylodistance(node1, node2)====&lt;br /&gt;
Get the phylogenetic distance (branch length) between two nodes.&lt;br /&gt;
&lt;br /&gt;
====get_distance_matrix(phylo_obj)====&lt;br /&gt;
Get a matrix of all of the pairwise distances between the tips of a tree.&lt;br /&gt;
&lt;br /&gt;
====get_mrca_array(phylo_obj)====&lt;br /&gt;
Get a square list of lists (array) listing the mrca of each pair of leaves (half-diagonal matrix)&lt;br /&gt;
&lt;br /&gt;
====subset_tree(phylo_obj, list_to_keep)====&lt;br /&gt;
Given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree.&lt;br /&gt;
&lt;br /&gt;
====prune_single_desc_nodes(node)====&lt;br /&gt;
Follow a tree from the bottom up, pruning any nodes with only one descendant&lt;br /&gt;
====find_new_root(node)====&lt;br /&gt;
Search up tree from root and make new root at first divergence&lt;br /&gt;
&lt;br /&gt;
====make_None_list_array(xdim, ydim)====&lt;br /&gt;
Make a list of lists (&amp;quot;array&amp;quot;) with the specified dimensions	&lt;br /&gt;
&lt;br /&gt;
====get_PD_to_mrca(node, mrca, PD)====&lt;br /&gt;
Add up the phylogenetic distance from a node to the specified ancestor (mrca).  Find mrca with find_1st_match.&lt;br /&gt;
&lt;br /&gt;
====find_1st_match(list1, list2)====&lt;br /&gt;
Find the first match in two ordered lists.&lt;br /&gt;
&lt;br /&gt;
====get_ancestors_list(node, anc_list)====&lt;br /&gt;
Get the list of ancestors of a given node&lt;br /&gt;
&lt;br /&gt;
====addup_PD(node, PD)====&lt;br /&gt;
Adds the branchlength of the current node to the total PD measure.&lt;br /&gt;
&lt;br /&gt;
====print_tree_outline_format(phylo_obj)====&lt;br /&gt;
Prints the tree out in &amp;quot;outline&amp;quot; format (daughter clades are indented, etc.)&lt;br /&gt;
&lt;br /&gt;
====print_Node(node, rank)====&lt;br /&gt;
Prints the node in question, and recursively all daughter nodes, maintaining rank as it goes.&lt;br /&gt;
&lt;br /&gt;
====lagrange_disclaimer()====&lt;br /&gt;
Just prints lagrange citation etc. in code using lagrange libraries.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
Priority for this week:&lt;br /&gt;
&lt;br /&gt;
Following up on suggestions to make the code more standard, with the priority of figuring out how I can revise the current BioPython phylogeny class, to resemble the better version in lagrange, so that there is a generic flexible phylogeny/newick parser that can be used generally as well as by my BioGeography package specifically.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would deal even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run them on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-24T04:37:19Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: update week4 plan&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine if the material is copyrighted, and if so email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points, determine which region (polygon) each point falls in (via point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching and downloading basic occurrence record data.&lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters, and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts) which mess up printing results to the prompt in Python; this fixes that&lt;br /&gt;
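&lt;br /&gt;
One possible way to do this kind of cleanup with the Python 3 standard library, shown as a sketch only. It assumes that dropping accents is acceptable (so an umlauted u becomes a plain u), and it is not necessarily what the fix_ASCII function in geogUtils.py does.&lt;br /&gt;
&lt;br /&gt;
```python
import html
import unicodedata


def fix_ASCII(text):
    """Return an ASCII-only version of text (sketch, lossy by design)."""
    # Turn HTML character entities into the characters they stand for.
    unescaped = html.unescape(text)
    # Split accented characters into base letter plus combining mark.
    decomposed = unicodedata.normalize("NFKD", unescaped)
    # Drop everything that has no ASCII representation.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```
&lt;br /&gt;
Characters with no ASCII base letter are silently removed, so this is only suitable for display at the prompt, not for storing names.&lt;br /&gt;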
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
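&lt;br /&gt;
A sketch of the dictionary-to-string conversion using urlencode from the Python 3 standard library. The base URL is the one given above; the helper build_request_url and the parameter names in the example call are hypothetical and only illustrate the shape of the input.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode

# Base URL of the occurrence web service as quoted in the text above.
GBIF_OCCURRENCE_URL = "http://data.gbif.org/ws/rest/occurrence"


def paramsdict_to_string(params):
    """Encode a dict of search parameters as a URL query string.

    Keys are sorted so the same dict always gives the same string.
    """
    return urlencode(sorted(params.items()))


def build_request_url(params, base_url=GBIF_OCCURRENCE_URL):
    """Hypothetical helper: join the base URL and the encoded parameters."""
    return base_url + "?" + paramsdict_to_string(params)


# Parameter names below are illustrative only, not a documented list.
example_url = build_request_url({"scientificname": "Utricularia", "maxresults": 100})
```
&lt;br /&gt;
urlencode also takes care of escaping spaces and other reserved characters in the values.&lt;br /&gt;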
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
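&lt;br /&gt;
For comparison, ElementTree also provides fromstring, which parses a string directly in memory, so the temporary file may be avoidable. The sketch below assumes Python 3; the sample document is built with the ElementTree API itself, and its element and attribute names are made up for illustration rather than taken from a real GBIF response.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET


def xmlstring_to_xmltree(xmlstring):
    """Parse an XML string in memory and return an ElementTree."""
    root = ET.fromstring(xmlstring)
    return ET.ElementTree(root)


# Build a small sample document (hypothetical element names).
sample_root = ET.Element("response")
ET.SubElement(sample_root, "summary", {"totalMatched": "3"})
sample = ET.tostring(sample_root, encoding="unicode")

tree = xmlstring_to_xmltree(sample)
print(tree.getroot().tag)
```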
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
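&lt;br /&gt;
The recursive search can be sketched as follows. This assumes the hit count is stored as an attribute on some element of the tree; the attribute name totalMatched is an assumption made for the example and should be checked against an actual GBIF response.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET


def element_items_to_dictionary(element):
    """Return the attributes of an element as a plain dictionary."""
    return dict(element.items())


def extract_numhits(element, attr_name="totalMatched"):
    """Depth-first search for attr_name; return its value as int, or None."""
    attrs = element_items_to_dictionary(element)
    if attr_name in attrs:
        return int(attrs[attr_name])
    # Not on this element: recurse into the subelements, if any.
    for child in element:
        found = extract_numhits(child, attr_name)
        if found is not None:
            return found
    return None
```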
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, in increments of e.g. 100 so as not to overload the server.  Currently stores results in a list of tempfiles, which is returned (could return a list of handles instead).&lt;br /&gt;
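&lt;br /&gt;
The staged download can be sketched as a simple paging loop. To keep the example free of network access, the actual request is delegated to a caller-supplied function fetch_page(start, count), which is a hypothetical stand-in for whatever calls access_gbif; the signature shown is an assumption, not the one in geogUtils.py.&lt;br /&gt;
&lt;br /&gt;
```python
def get_all_records_by_increment(fetch_page, total_hits, increment=100):
    """Fetch total_hits records in pages of at most increment records.

    fetch_page(start, count) is supplied by the caller and returns
    whatever represents one page (a tempfile name, a handle, ...).
    """
    pages = []
    for start in range(0, total_hits, increment):
        # The last page may be shorter than the others.
        count = min(increment, total_hits - start)
        pages.append(fetch_page(start, count))
    return pages
```
&lt;br /&gt;
A pause between requests could be added inside the loop if the server needs to be treated more gently.&lt;br /&gt;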
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick(newickstr)====&lt;br /&gt;
Read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any. &lt;br /&gt;
&lt;br /&gt;
====list_leaves(phylo_obj)====&lt;br /&gt;
Print out all of the leaves above a node object&lt;br /&gt;
&lt;br /&gt;
====treelength(node)====&lt;br /&gt;
Gets the total branchlength above a given node by recursively adding through tree.&lt;br /&gt;
&lt;br /&gt;
====phylodistance(node1, node2)====&lt;br /&gt;
Get the phylogenetic distance (branch length) between two nodes.&lt;br /&gt;
&lt;br /&gt;
====get_distance_matrix(phylo_obj)====&lt;br /&gt;
Get a matrix of all of the pairwise distances between the tips of a tree.&lt;br /&gt;
&lt;br /&gt;
====get_mrca_array(phylo_obj)====&lt;br /&gt;
Get a square list of lists (array) listing the mrca of each pair of leaves (half-diagonal matrix)&lt;br /&gt;
&lt;br /&gt;
====subset_tree(phylo_obj, list_to_keep)====&lt;br /&gt;
Given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree.&lt;br /&gt;
&lt;br /&gt;
====prune_single_desc_nodes(node)====&lt;br /&gt;
Follow a tree from the bottom up, pruning any nodes with only one descendant&lt;br /&gt;
====find_new_root(node)====&lt;br /&gt;
Search up tree from root and make new root at first divergence&lt;br /&gt;
&lt;br /&gt;
====make_None_list_array(xdim, ydim)====&lt;br /&gt;
Make a list of lists (&amp;quot;array&amp;quot;) with the specified dimensions	&lt;br /&gt;
&lt;br /&gt;
====get_PD_to_mrca(node, mrca, PD)====&lt;br /&gt;
Add up the phylogenetic distance from a node to the specified ancestor (mrca).  Find mrca with find_1st_match.&lt;br /&gt;
&lt;br /&gt;
====find_1st_match(list1, list2)====&lt;br /&gt;
Find the first match in two ordered lists.&lt;br /&gt;
&lt;br /&gt;
====get_ancestors_list(node, anc_list)====&lt;br /&gt;
Get the list of ancestors of a given node&lt;br /&gt;
&lt;br /&gt;
====addup_PD(node, PD)====&lt;br /&gt;
Adds the branchlength of the current node to the total PD measure.&lt;br /&gt;
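&lt;br /&gt;
How these pieces fit together can be sketched with a very small node class. The Node class and the simplified signatures below are assumptions made for the example (the real functions pass accumulators such as PD and anc_list explicitly), so this shows the idea rather than the actual code.&lt;br /&gt;
&lt;br /&gt;
```python
class Node:
    """Minimal tree node: length is the branch leading to this node."""

    def __init__(self, label, length=0.0, parent=None):
        self.label = label
        self.length = length
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def treelength(node):
    """Total branch length above node, added up recursively."""
    return sum(child.length + treelength(child) for child in node.children)


def get_ancestors_list(node):
    """Ancestors of node, ordered from its parent down to the root."""
    ancestors = []
    while node.parent is not None:
        node = node.parent
        ancestors.append(node)
    return ancestors


def find_1st_match(list1, list2):
    """First item of list1 that also occurs in list2, or None."""
    for item in list1:
        if item in list2:
            return item
    return None


def get_PD_to_mrca(node, mrca):
    """Sum of branch lengths from node down to the ancestor mrca."""
    pd = 0.0
    while node is not mrca:
        pd += node.length
        node = node.parent
    return pd


def phylodistance(node1, node2):
    """Branch-length distance between two nodes via their mrca."""
    mrca = find_1st_match([node1] + get_ancestors_list(node1),
                          [node2] + get_ancestors_list(node2))
    return get_PD_to_mrca(node1, mrca) + get_PD_to_mrca(node2, mrca)
```
&lt;br /&gt;
Because each ancestor list runs from the node toward the root, the first match between the two lists is the most recent common ancestor.&lt;br /&gt;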
&lt;br /&gt;
====print_tree_outline_format(phylo_obj)====&lt;br /&gt;
Prints the tree out in &amp;quot;outline&amp;quot; format (daughter clades are indented, etc.)&lt;br /&gt;
&lt;br /&gt;
====print_Node(node, rank)====&lt;br /&gt;
Prints the node in question, and recursively all daughter nodes, maintaining rank as it goes.&lt;br /&gt;
&lt;br /&gt;
====lagrange_disclaimer()====&lt;br /&gt;
Just prints lagrange citation etc. in code using lagrange libraries.&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
Priority for this week:&lt;br /&gt;
&lt;br /&gt;
Following up on suggestions to make the code more standard, with the priority of figuring out how I can revise the current BioPython phylogeny class, to resemble the better version in lagrange, so that there is a generic flexible phylogeny/newick parser that can be used generally as well as by my BioGeography package specifically.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
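&lt;br /&gt;
The two simplest statistics in this list can be sketched directly from taxon lists. The Sørensen index is computed here in its similarity form, 2c / (a + b), where a and b are the numbers of taxa in each region and c is the number shared; the function name sorensen_index is illustrative, and whether the module reports similarity or dissimilarity is not specified above.&lt;br /&gt;
&lt;br /&gt;
```python
def alphadiversity(taxa):
    """Number of distinct taxa recorded in a region."""
    return len(set(taxa))


def sorensen_index(taxa1, taxa2):
    """Sorensen similarity between two regions: 2c / (a + b)."""
    set1, set2 = set(taxa1), set(taxa2)
    if not set1 and not set2:
        raise ValueError("both regions are empty")
    shared = len(set1.intersection(set2))
    return 2.0 * shared / (len(set1) + len(set2))
```
&lt;br /&gt;
One minus this value gives the corresponding dissimilarity. The phylogenetic version follows the same pattern with shared and total branch lengths in place of taxon counts.&lt;br /&gt;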
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
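&lt;br /&gt;
A sketch of two of these checks, assuming the tree is available as two dictionaries: parent (node name to parent name, with the root mapped to None) and length (node name to the length of the branch leading to it). This representation, the tolerance, and the return value are all assumptions made for the example.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def root_to_tip_distances(parent, length, tips):
    """Distance from the root to each tip, following parent links."""
    distances = {}
    for tip in tips:
        dist, node = 0.0, tip
        while parent[node] is not None:
            dist += length[node]
            node = parent[node]
        distances[tip] = dist
    return distances


def check_input_lagrange_tree(parent, length, tips, range_taxa, tol=1e-6):
    """Return (is_ultrametric, tips missing from the ranges input)."""
    values = list(root_to_tip_distances(parent, length, tips).values())
    # Ultrametric: every tip is the same distance from the root,
    # which also means all tips end at the same time.
    is_ultrametric = math.isclose(min(values), max(values), abs_tol=tol)
    missing = sorted(set(tips) - set(range_taxa))
    return is_ultrametric, missing
```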
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would work even with weird concave polygons).&lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
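&lt;br /&gt;
Points along a great circle line between two nodes can be generated by spherical interpolation, as in the sketch below. It assumes a spherical Earth and coordinates in decimal degrees; the function name and the n_segments parameter are illustrative, and nearly antipodal endpoints are not handled.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def _to_xyz(lon, lat):
    """Unit vector for a longitude/latitude given in degrees."""
    lam, phi = math.radians(lon), math.radians(lat)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def great_circle_points(lon1, lat1, lon2, lat2, n_segments=10):
    """Return n_segments + 1 (lon, lat) points along the great circle."""
    p1, p2 = _to_xyz(lon1, lat1), _to_xyz(lon2, lat2)
    dot = sum(u * v for u, v in zip(p1, p2))
    # Angular distance between the endpoints, clamped for safety.
    d = math.acos(max(-1.0, min(1.0, dot)))
    if d == 0.0:
        return [(lon1, lat1)] * (n_segments + 1)
    points = []
    for i in range(n_segments + 1):
        f = i / n_segments
        # Spherical linear interpolation weights.
        a = math.sin((1.0 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x, y, z = (a * u + b * v for u, v in zip(p1, p2))
        points.append((math.degrees(math.atan2(y, x)),
                       math.degrees(math.atan2(z, math.hypot(x, y)))))
    return points
```
&lt;br /&gt;
The resulting list of points can be used directly as the vertices of a line in a shapefile or KML file.&lt;br /&gt;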
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
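&lt;br /&gt;
A minimal sketch of the KML output, assuming the history has already been reduced to a list of named branches, each a list of (longitude, latitude) points. The function name history_to_kml and this input format are assumptions; the KML elements used (Document, Placemark, name, LineString, coordinates) are standard KML 2.2.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def history_to_kml(branches):
    """Return a KML string with one LineString placemark per branch.

    branches is a list of (name, [(lon, lat), ...]) tuples.
    """
    kml = ET.Element("kml", {"xmlns": KML_NS})
    doc = ET.SubElement(kml, "Document")
    for name, coords in branches:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = name
        line = ET.SubElement(placemark, "LineString")
        # KML coordinates are lon,lat,altitude triples separated by spaces.
        ET.SubElement(line, "coordinates").text = " ".join(
            "%s,%s,0" % (lon, lat) for lon, lat in coords)
    return ET.tostring(kml, encoding="unicode")
```
&lt;br /&gt;
The returned string can be written to a file with a .kml extension and opened in Google Earth.&lt;br /&gt;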
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-24T04:35:36Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information. */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine whether the material is copyrighted, and if so email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching and downloading basic occurrence record data.&lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters, and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts) which mess up printing results to the prompt in Python; this fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, in increments of e.g. 100 so as not to overload the server.  Currently stores results in a list of tempfiles, which is returned (could return a list of handles instead).&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick(newickstr)====&lt;br /&gt;
Read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any. &lt;br /&gt;
&lt;br /&gt;
====list_leaves(phylo_obj)====&lt;br /&gt;
Print out all of the leaves above a node object&lt;br /&gt;
&lt;br /&gt;
====treelength(node)====&lt;br /&gt;
Gets the total branchlength above a given node by recursively adding through tree.&lt;br /&gt;
&lt;br /&gt;
====phylodistance(node1, node2)====&lt;br /&gt;
Get the phylogenetic distance (branch length) between two nodes.&lt;br /&gt;
&lt;br /&gt;
====get_distance_matrix(phylo_obj)====&lt;br /&gt;
Get a matrix of all of the pairwise distances between the tips of a tree.&lt;br /&gt;
&lt;br /&gt;
====get_mrca_array(phylo_obj)====&lt;br /&gt;
Get a square list of lists (array) listing the mrca of each pair of leaves (half-diagonal matrix)&lt;br /&gt;
&lt;br /&gt;
====subset_tree(phylo_obj, list_to_keep)====&lt;br /&gt;
Given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree.&lt;br /&gt;
&lt;br /&gt;
====prune_single_desc_nodes(node)====&lt;br /&gt;
Follow a tree from the bottom up, pruning any nodes with only one descendant&lt;br /&gt;
====find_new_root(node)====&lt;br /&gt;
Search up tree from root and make new root at first divergence&lt;br /&gt;
&lt;br /&gt;
====make_None_list_array(xdim, ydim)====&lt;br /&gt;
Make a list of lists (&amp;quot;array&amp;quot;) with the specified dimensions	&lt;br /&gt;
&lt;br /&gt;
====get_PD_to_mrca(node, mrca, PD)====&lt;br /&gt;
Add up the phylogenetic distance from a node to the specified ancestor (mrca).  Find mrca with find_1st_match.&lt;br /&gt;
&lt;br /&gt;
====find_1st_match(list1, list2)====&lt;br /&gt;
Find the first match in two ordered lists.&lt;br /&gt;
&lt;br /&gt;
====get_ancestors_list(node, anc_list)====&lt;br /&gt;
Get the list of ancestors of a given node&lt;br /&gt;
&lt;br /&gt;
====addup_PD(node, PD)====&lt;br /&gt;
Adds the branchlength of the current node to the total PD measure.&lt;br /&gt;
&lt;br /&gt;
====print_tree_outline_format(phylo_obj)====&lt;br /&gt;
Prints the tree out in &amp;quot;outline&amp;quot; format (daughter clades are indented, etc.)&lt;br /&gt;
&lt;br /&gt;
====print_Node(node, rank)====&lt;br /&gt;
Prints the node in question, and recursively all daughter nodes, maintaining rank as it goes.&lt;br /&gt;
&lt;br /&gt;
====lagrange_disclaimer()====&lt;br /&gt;
Just prints lagrange citation etc. in code using lagrange libraries.&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would work even with weird concave polygons).&lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-24T04:35:06Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: updated week 3 results&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine whether the material is copyrighted, and if so email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters, and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts) which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
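&lt;br /&gt;
A minimal sketch of the dictionary-to-string conversion, assuming a modern Python 3 interpreter and the standard library's urlencode. The function names and the parameter names in the example dictionary are hypothetical and only illustrate the idea.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def paramsdict_to_query(params):
    """Encode a dictionary of search parameters as a URL query string."""
    # Sorting the items makes the output deterministic for a given dict.
    return urlencode(sorted(params.items()))


def build_request_url(base_url, params):
    """Join a service URL and the encoded parameters with a question mark."""
    return base_url + "?" + paramsdict_to_query(params)


# Hypothetical parameter names, used only for illustration.
example = {"scientificname": "Utricularia", "maxresults": 100}
print(build_request_url("http://data.gbif.org/ws/rest/occurrence", example))
```
&lt;br /&gt;
Letting urlencode handle the escaping avoids hand-rolled quoting of spaces and special characters in taxon names.&lt;br /&gt;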
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
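&lt;br /&gt;
The three XML helpers above can be sketched as follows. ElementTree also provides fromstring, which parses a text string directly, so the temporary file may not be needed. The attribute name totalMatched and the stand-in response built here are assumptions for illustration, not a confirmed description of the GBIF response.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET


def xmlstring_to_element(xmlstring):
    """Parse XML text directly into an Element (no temporary file)."""
    return ET.fromstring(xmlstring)


def attributes_to_dict(element):
    """Return the key/value attributes of an element as a dictionary."""
    return dict(element.items())


def find_attribute(element, attr_name):
    """Depth-first search for the first element carrying attr_name."""
    value = element.get(attr_name)
    if value is not None:
        return value
    for child in element:
        found = find_attribute(child, attr_name)
        if found is not None:
            return found
    return None


# Build a small stand-in response instead of contacting GBIF.
root = ET.Element("response")
ET.SubElement(root, "summary", {"totalMatched": "42", "start": "0"})
xmlstring = ET.tostring(root, encoding="unicode")

tree = xmlstring_to_element(xmlstring)
print(find_attribute(tree, "totalMatched"))
```
&lt;br /&gt;
The search returns None when no element carries the attribute, which matches the "if it exists" behaviour described for extract_numhits.&lt;br /&gt;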
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, store in list of elements. Increments of e.g. 100 to not overload server.  Currently stores results in a list of tempfiles which is returned (could return a list of handles I guess).&lt;br /&gt;
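&lt;br /&gt;
The staged download can be sketched as a list of request windows, one per increment. The function name record_windows is hypothetical, and how each window is turned into a GBIF request is left out because it depends on the web service parameters.&lt;br /&gt;
&lt;br /&gt;
```python
def record_windows(total_hits, increment=100):
    """Return (start_index, count) pairs that together cover total_hits."""
    windows = []
    for start in range(0, total_hits, increment):
        # The last window is shortened so it does not run past the end.
        windows.append((start, min(increment, total_hits - start)))
    return windows


print(record_windows(250, 100))
```
&lt;br /&gt;
Each pair would drive one request, so the server is never asked for more than one increment of records at a time.&lt;br /&gt;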
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick(newickstr)====&lt;br /&gt;
Read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any. &lt;br /&gt;
&lt;br /&gt;
====list_leaves(phylo_obj)====&lt;br /&gt;
Print out all of the leaves above a node object&lt;br /&gt;
&lt;br /&gt;
====treelength(node)====&lt;br /&gt;
Gets the total branchlength above a given node by recursively adding through tree.&lt;br /&gt;
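&lt;br /&gt;
A minimal sketch of the recursive traversals, assuming a simple node class with parent and daughter links. The Node class here is hypothetical and much simpler than the tree object used in the module; "above" is taken to mean toward the tips.&lt;br /&gt;
&lt;br /&gt;
```python
class Node:
    """Hypothetical tree node with links to its parent and daughters."""

    def __init__(self, label=None, branch_length=0.0):
        self.label = label
        self.branch_length = branch_length
        self.parent = None
        self.daughters = []

    def add_daughter(self, node):
        node.parent = self
        self.daughters.append(node)
        return node


def list_leaves(node):
    """Collect the labels of all tips descending from node."""
    if not node.daughters:
        return [node.label]
    leaves = []
    for daughter in node.daughters:
        leaves.extend(list_leaves(daughter))
    return leaves


def treelength(node):
    """Sum the lengths of all branches descending from node."""
    total = 0.0
    for daughter in node.daughters:
        total += daughter.branch_length + treelength(daughter)
    return total


# Build the tree ((A:1,B:1):2,C:3)
root = Node("root")
inner = root.add_daughter(Node("AB", 2.0))
inner.add_daughter(Node("A", 1.0))
inner.add_daughter(Node("B", 1.0))
root.add_daughter(Node("C", 3.0))
print(list_leaves(root), treelength(root))
```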
&lt;br /&gt;
====phylodistance(node1, node2)====&lt;br /&gt;
Get the phylogenetic distance (branch length) between two nodes.&lt;br /&gt;
&lt;br /&gt;
====get_distance_matrix(phylo_obj)====&lt;br /&gt;
Get a matrix of all of the pairwise distances between the tips of a tree.&lt;br /&gt;
&lt;br /&gt;
====get_mrca_array(phylo_obj)====&lt;br /&gt;
Get a square list of lists (array) listing the mrca of each pair of leaves (half-diagonal matrix)&lt;br /&gt;
&lt;br /&gt;
====subset_tree(phylo_obj, list_to_keep)====&lt;br /&gt;
Given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree.&lt;br /&gt;
&lt;br /&gt;
====prune_single_desc_nodes(node)====&lt;br /&gt;
Follow a tree from the bottom up, pruning any nodes with only one descendant&lt;br /&gt;
	&lt;br /&gt;
====find_new_root(node)====&lt;br /&gt;
Search up tree from root and make new root at first divergence&lt;br /&gt;
&lt;br /&gt;
====make_None_list_array(xdim, ydim)====&lt;br /&gt;
Make a list of lists (&amp;quot;array&amp;quot;) with the specified dimensions	&lt;br /&gt;
&lt;br /&gt;
====get_PD_to_mrca(node, mrca, PD)====&lt;br /&gt;
Add up the phylogenetic distance from a node to the specified ancestor (mrca).  Find mrca with find_1st_match.&lt;br /&gt;
&lt;br /&gt;
====find_1st_match(list1, list2)====&lt;br /&gt;
Find the first match in two ordered lists.&lt;br /&gt;
&lt;br /&gt;
====get_ancestors_list(node, anc_list)====&lt;br /&gt;
Get the list of ancestors of a given node&lt;br /&gt;
&lt;br /&gt;
====addup_PD(node, PD)====&lt;br /&gt;
Adds the branchlength of the current node to the total PD measure.&lt;br /&gt;
	&lt;br /&gt;
====print_tree_outline_format(phylo_obj)====&lt;br /&gt;
Prints the tree out in &amp;quot;outline&amp;quot; format (daughter clades are indented, etc.)&lt;br /&gt;
&lt;br /&gt;
====print_Node(node, rank)====&lt;br /&gt;
Prints the node in question, and recursively all daughter nodes, maintaining rank as it goes.&lt;br /&gt;
&lt;br /&gt;
====lagrange_disclaimer()====&lt;br /&gt;
Just prints lagrange citation etc. in code using lagrange libraries.&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sorensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sorensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
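&lt;br /&gt;
The two simplest measures in the list above can be sketched with Python sets. This assumes each region is represented by the set of taxon names recorded in it, and it computes the Sorensen similarity (twice the shared taxa divided by the summed richness); whether the module reports similarity or its complement is not stated here, so treat that choice as an assumption. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def alpha_diversity(taxa):
    """Number of distinct taxa recorded in a region."""
    return len(set(taxa))


def sorensen_similarity(taxa_a, taxa_b):
    """Sorensen index: 2 * shared / (richness of A + richness of B)."""
    set_a, set_b = set(taxa_a), set(taxa_b)
    total = len(set_a) + len(set_b)
    if not total:
        # Two empty regions: defined here as 0.0 for convenience.
        return 0.0
    shared = len(set_a.intersection(set_b))
    return 2.0 * shared / total


region_1 = ["sp_a", "sp_b", "sp_c"]
region_2 = ["sp_b", "sp_c", "sp_d"]
print(alpha_diversity(region_1), sorensen_similarity(region_1, region_2))
```
&lt;br /&gt;
The phylogenetic variants replace counts of taxa with summed branch lengths, using the same ratio.&lt;br /&gt;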
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
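&lt;br /&gt;
The checks described for check_input_lagrange_tree can be sketched as below. The nested-tuple tree representation, the function names, and the tolerance value are all assumptions made to keep the example self-contained; the module's own tree object would be used in practice.&lt;br /&gt;
&lt;br /&gt;
```python
def tip_depths(node, depth=0.0):
    """Return {tip name: root-to-tip distance}.

    A node is (branch_length, name) for a tip, or
    (branch_length, [child nodes]) for an internal node.
    """
    branch_length, payload = node
    depth += branch_length
    if isinstance(payload, str):
        return {payload: depth}
    depths = {}
    for child in payload:
        depths.update(tip_depths(child, depth))
    return depths


def check_tree_for_ranges(tree, range_taxa, tolerance=1e-6):
    """Report whether the tree is ultrametric and which tips lack ranges."""
    depths = tip_depths(tree)
    spread = max(depths.values()) - min(depths.values())
    return {
        # Ultrametric: every tip is the same distance from the root.
        "ultrametric": not spread > tolerance,
        "tips_missing_ranges": sorted(set(depths) - set(range_taxa)),
    }


# The tree ((A:1,B:1):2,C:3) in the nested-tuple form assumed above
tree = (0.0, [(2.0, [(1.0, "A"), (1.0, "B")]), (3.0, "C")])
print(check_tree_for_ranges(tree, ["A", "B"]))
```
&lt;br /&gt;
If all tips are equidistant from the root and the root age equals that distance, the tips end at time 0, which is the second requirement listed above.&lt;br /&gt;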
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton would work even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run them on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-15T00:48:02Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: week 3 update: Added functions download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine whether the material is under copyright, and if so email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record may fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters, and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts) which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
&lt;br /&gt;
Added functions to download &amp;amp; parse large numbers of records, get TaxonOccurrence gbifKeys, and search with those keys.&lt;br /&gt;
&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, store in list of elements. Increments of e.g. 100 to not overload server.  Currently stores results in a list of tempfiles which is returned (could return a list of handles I guess).&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
Updated functions have been pushed to Github [http://github.com/nmatzke/biopython/commit/5df9025ea5cd3458915db982c69422345e1da8d7 here]&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sorensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sorensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton would work even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run them on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T23:58:48Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: uploaded June week 2 functions&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine whether the material is under copyright, and if so email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record may fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will just focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
Utility function invoked by other functions: the user inputs parameters, and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (mostly umlauts) which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====extract_occurrence_elements====&lt;br /&gt;
Returns a list of the elements, picking elements by TaxonOccurrence; this should return a list of elements equal to the number of hits.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tolist====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns list.&lt;br /&gt;
&lt;br /&gt;
====extract_taxonconceptkeys_tofile====&lt;br /&gt;
Searches an element in an XML tree for TaxonOccurrence gbifKeys, and the complete name. Searches recursively, if there are subelements.  Returns file at outfh.&lt;br /&gt;
&lt;br /&gt;
====get_all_records_by_increment====&lt;br /&gt;
Download all of the records in stages, store in list of elements. Increments of e.g. 100 to not overload server.  Currently stores results in a list of tempfiles which is returned (could return a list of handles I guess).&lt;br /&gt;
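The staging logic can be sketched separately from the download itself. The helper below is hypothetical (it is not part of the module) and only works out which start index and record count to request in each stage, given the total number of hits.&lt;br /&gt;

```python
def increment_windows(numhits, increment=100):
    """Return (startindex, count) pairs that together cover numhits records."""
    windows = []
    for start in range(0, numhits, increment):
        # The last window may hold fewer records than the increment.
        count = min(increment, numhits - start)
        windows.append((start, count))
    return windows


windows = increment_windows(250, increment=100)
print(windows)  # [(0, 100), (100, 100), (200, 50)]
```

Each pair would then be passed to one search request, so that no single request asks the server for more than the chosen increment.&lt;br /&gt;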
&lt;br /&gt;
====Code====&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be slow for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
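As a rough illustration of these two measures, assuming each region is given simply as a list of taxon names: alpha diversity is the number of distinct taxa, and the Sørensen index is twice the number of shared taxa divided by the sum of the two regions' taxon counts. The function names and taxon lists below are placeholders, not the module's actual code.&lt;br /&gt;

```python
def alpha_diversity(taxa):
    """Number of distinct taxa recorded in a region."""
    return len(set(taxa))


def sorensen_index(taxa_a, taxa_b):
    """Sorensen similarity: 2 * shared taxa / (taxa in A + taxa in B)."""
    set_a, set_b = set(taxa_a), set(taxa_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        raise ValueError("both regions are empty")
    shared = len(set_a.intersection(set_b))
    return 2.0 * shared / total


# Made-up taxon lists for two regions.
region1 = ["A", "B", "C"]
region2 = ["B", "C", "D", "E"]
similarity = sorensen_index(region1, region2)
print(alpha_diversity(region1), alpha_diversity(region2), similarity)
```

The index is 1 when the two regions share all their taxa and 0 when they share none; one minus this value is often reported as the dissimilarity.&lt;br /&gt;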
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is a region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would deal even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T23:48:20Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: get_record info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics, produce input files for input into the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine whether the material is copyrighted and, if so, email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points, determine which region (polygon) each range falls in (via point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
utility function invoked by other functions, user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Gets the actual hits returned by a given search, and returns the name of the file where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (e.g. umlauts mostly) which mess up printing results to prompt in Python, this fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
====get_record====&lt;br /&gt;
Retrieves a single specified record in DarwinCore XML format, and returns an xmltree for it.&lt;br /&gt;
&lt;br /&gt;
====getGBIFrecords====&lt;br /&gt;
calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
====readGBIFrecords====&lt;br /&gt;
calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be slow for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is a region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would deal even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T21:07:55Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* print_xmltree */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics, produce input files for input into the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source, and when finalizing this project I will go back and determine whether the material is copyrighted and, if so, email the authors for permission to use it.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points, determine which region (polygon) each range falls in (via point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
utility function invoked by other functions, user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Gets the actual hits returned by a given search, and returns the name of the file where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
files downloaded from GBIF contain HTML character entities &amp;amp; unicode characters (e.g. umlauts mostly) which mess up printing results to prompt in Python, this fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====Deleted (turns out this was unnecessary): gettaxonconceptkey====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
====getGBIFrecord====&lt;br /&gt;
retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
====getGBIFrecords====&lt;br /&gt;
calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
====readGBIFrecords====&lt;br /&gt;
calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be slow for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is a region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would deal even with weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T19:18:12Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* June, week 1: Functions to search GBIF and download occurrence records */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
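&lt;br /&gt;
A minimal sketch of the point-in-polygon step, assuming the standard ray-casting test and simple polygons stored as lists of (x, y) vertices. The names point_in_polygon and classify_points are illustrative and are not taken from geogUtils.py.&lt;br /&gt;
&lt;br /&gt;
```python
def point_in_polygon(x, y, vertices):
    """Return True if (x, y) lies inside the polygon given as a list of (x, y) vertices."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if x_cross > x:
                inside = not inside
    return inside


def classify_points(points, regions):
    """Map each named point to the first region containing it, or None if unclassified."""
    result = {}
    for name, (x, y) in points.items():
        result[name] = None
        for region_name, vertices in regions.items():
            if point_in_polygon(x, y, vertices):
                result[name] = region_name
                break
    return result
```
&lt;br /&gt;
Points that map to None correspond to the unclassified records mentioned above, e.g. mis-typed coordinates that land in the ocean.&lt;br /&gt;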
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
utility function invoked by other functions, user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; Unicode characters (mostly umlauts), which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
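&lt;br /&gt;
A minimal sketch of the dictionary-to-string conversion, assuming the standard library's urlencode is acceptable for this job. The function name params_to_query and the parameter names in the example are illustrative assumptions, not a statement of what the GBIF service accepts.&lt;br /&gt;
&lt;br /&gt;
```python
from urllib.parse import urlencode


def params_to_query(params):
    """Turn a dictionary of search parameters into a URL query string."""
    # Sort the keys so the same dictionary always yields the same string
    return urlencode(sorted(params.items()))


# Hypothetical parameters, for illustration only
query = params_to_query({"scientificname": "Puma concolor", "format": "brief"})
print(query)
```
&lt;br /&gt;
urlencode also escapes spaces and other special characters, so taxon names with spaces can be passed as-is.&lt;br /&gt;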
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
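&lt;br /&gt;
A minimal sketch suggesting that the temporary file may not be needed: ElementTree.fromstring parses a text string directly. The element and attribute names below are stand-ins built for the example, not a real GBIF response.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET


def xmlstring_to_tree(xmlstring):
    """Parse an XML text string into an ElementTree without writing a file."""
    return ET.ElementTree(ET.fromstring(xmlstring))


# Build a small stand-in document (hypothetical tag and attribute names)
root = ET.Element("response")
ET.SubElement(root, "summary", totalMatched="3")
xmlstring = ET.tostring(root, encoding="unicode")

tree = xmlstring_to_tree(xmlstring)
print(tree.getroot().tag)
```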
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
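&lt;br /&gt;
A minimal sketch of the recursive search, combined with the attribute-to-dictionary step described above. It assumes the hit count is stored in an attribute called totalMatched, which is an assumption made for this example; the function names are illustrative.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET


def element_items_to_dict(element):
    """Return the attributes encoded in an element's tag as a dictionary."""
    return dict(element.items())


def find_numhits(element, key="totalMatched"):
    """Depth-first search for the first element carrying the hit-count attribute."""
    attrs = element_items_to_dict(element)
    if key in attrs:
        return int(attrs[key])
    for child in element:
        found = find_numhits(child, key)
        if found is not None:
            return found
    # No element in this subtree has the attribute
    return None


# Hypothetical response structure, for illustration only
root = ET.Element("response")
header = ET.SubElement(root, "header")
ET.SubElement(header, "summary", totalMatched="42")
print(find_numhits(root))
```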
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====gettaxonconceptkey (deleted; turns out this was unnecessary)====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commits/Geography Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
====getGBIFrecord====&lt;br /&gt;
retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
====getGBIFrecords====&lt;br /&gt;
calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
====readGBIFrecords====&lt;br /&gt;
calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
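&lt;br /&gt;
A minimal pure-Python sketch of phylodistance and the pairwise tip matrix, assuming the tree is stored as a dictionary mapping each node to a (parent, branch length) pair. This representation and the function names are assumptions for illustration; the planned tree object may differ.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations


def path_to_root(node, parents):
    """Return {ancestor: distance from node} for the node and all its ancestors."""
    dist = 0.0
    path = {node: 0.0}
    while parents[node][0] is not None:
        parent, length = parents[node]
        dist += length
        path[parent] = dist
        node = parent
    return path


def phylodistance(a, b, parents):
    """Branch-length distance between two nodes, measured through their common ancestor."""
    path_a = path_to_root(a, parents)
    path_b = path_to_root(b, parents)
    # The most recent common ancestor minimises the summed distance
    return min(path_a[n] + path_b[n] for n in path_a if n in path_b)


def distance_matrix(tips, parents):
    """Pairwise distances between tips, keyed by (tip, tip) pairs."""
    return {(a, b): phylodistance(a, b, parents) for a, b in combinations(tips, 2)}


# Example tree ((A:2,B:2)AB:1,C:3)root; the root has parent None
parents = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 2.0),
    "B": ("AB", 2.0),
    "C": ("root", 3.0),
}
matrix = distance_matrix(["A", "B", "C"], parents)
```
&lt;br /&gt;
The work grows with the square of the number of tips, which is consistent with the note above that this can be slow for large trees.&lt;br /&gt;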
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
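&lt;br /&gt;
A minimal sketch of the two taxon-count measures, assuming the input is a dictionary mapping region names to lists of taxa. The Sørensen index is computed here as a similarity, 2 * shared / (taxa in A + taxa in B); whether the final module reports the similarity or its complement is not specified above, so treat that choice as an assumption.&lt;br /&gt;
&lt;br /&gt;
```python
def alpha_diversity(taxa_by_region, region):
    """Number of distinct taxa recorded in one region."""
    return len(set(taxa_by_region[region]))


def sorensen_similarity(taxa_by_region, region_a, region_b):
    """Sorensen index: 2 * shared taxa / (taxa in A + taxa in B)."""
    a = set(taxa_by_region[region_a])
    b = set(taxa_by_region[region_b])
    if not a and not b:
        # Two empty regions share nothing; report 0 rather than divide by zero
        return 0.0
    shared = len(a.intersection(b))
    return 2.0 * shared / (len(a) + len(b))


# Hypothetical example data
taxa_by_region = {
    "north": ["sp1", "sp2"],
    "south": ["sp2", "sp3"],
}
similarity = sorensen_similarity(taxa_by_region, "north", "south")
```
&lt;br /&gt;
The phylogenetic versions listed above follow the same pattern, with shared and total branch length taking the place of shared and total taxon counts.&lt;br /&gt;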
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
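&lt;br /&gt;
A minimal sketch of these checks, assuming the tree is stored as a dictionary mapping each node to a (parent, branch length) pair and that the taxa in the ranges file are available as a list. The function name, the tolerance, and the data layout are assumptions for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def check_lagrange_tree(parents, range_taxa, tol=1e-6):
    """Return a list of problems found; an empty list means the tree passed."""
    internal = {parent for parent, _ in parents.values() if parent is not None}
    tips = [node for node in parents if node not in internal]

    # Root-to-tip distance for every tip
    depths = []
    for tip in tips:
        depth, node = 0.0, tip
        while parents[node][0] is not None:
            parent, length = parents[node]
            depth += length
            node = parent
        depths.append(depth)

    problems = []
    # Ultrametric: all tips are equally far from the root, so all end at time 0
    if depths and max(depths) - min(depths) > tol:
        problems.append("tree is not ultrametric")
    missing = sorted(set(tips) - set(range_taxa))
    if missing:
        problems.append("tips missing from range file: " + ", ".join(missing))
    return problems


good = {"root": (None, 0.0), "AB": ("root", 1.0),
        "A": ("AB", 2.0), "B": ("AB", 2.0), "C": ("root", 3.0)}
bad = dict(good, C=("root", 2.5))
```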
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is a region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton, which would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
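&lt;br /&gt;
A minimal sketch of drawing a great circle line between two nodes, assuming a spherical Earth and coordinates in decimal degrees. It returns evenly spaced intermediate points that a display file can join with straight segments; the function name is illustrative, and exactly antipodal endpoints are not handled.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, n_segments=10):
    """Return n_segments + 1 (lat, lon) points, in degrees, along the great circle."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(a)))
    if d == 0:
        # Identical endpoints: nothing to interpolate
        return [(lat1, lon1)] * (n_segments + 1)
    points = []
    for i in range(n_segments + 1):
        f = i / n_segments
        # Weights of the two endpoints at fraction f of the way along the arc
        wa = math.sin((1 - f) * d) / math.sin(d)
        wb = math.sin(f * d) / math.sin(d)
        x = wa * math.cos(p1) * math.cos(l1) + wb * math.cos(p2) * math.cos(l2)
        y = wa * math.cos(p1) * math.sin(l1) + wb * math.cos(p2) * math.sin(l2)
        z = wa * math.sin(p1) + wb * math.sin(p2)
        points.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                       math.degrees(math.atan2(y, x))))
    return points


# Two points on the equator, 90 degrees of longitude apart
pts = great_circle_points(0, 0, 0, 90, 2)
```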
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
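&lt;br /&gt;
A minimal sketch of writing a history to KML with the standard library's ElementTree, assuming node locations have already been assigned and are stored as (lat, lon) pairs. The function name and the input layout are assumptions for illustration; styling, ages, and time spans are left out.&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def history_to_kml(nodes, branches):
    """nodes: {name: (lat, lon)}; branches: list of (parent, child) name pairs."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    # One point placemark per node
    for name, (lat, lon) in nodes.items():
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = name
        point = ET.SubElement(pm, "Point")
        # KML orders coordinates as longitude,latitude
        ET.SubElement(point, "coordinates").text = "%f,%f" % (lon, lat)
    # One line placemark per branch
    for parent, child in branches:
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = "%s to %s" % (parent, child)
        line = ET.SubElement(pm, "LineString")
        ET.SubElement(line, "tessellate").text = "1"
        coords = ["%f,%f" % (nodes[n][1], nodes[n][0]) for n in (parent, child)]
        ET.SubElement(line, "coordinates").text = " ".join(coords)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical two-node history
kml_text = history_to_kml(
    {"ancestor": (10.0, 20.0), "tip": (15.0, 25.0)},
    [("ancestor", "tip")],
)
```
&lt;br /&gt;
The returned string can be saved with a .kml extension and opened in Google Earth.&lt;br /&gt;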
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T19:15:25Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
utility function invoked by other functions, user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; Unicode characters (mostly umlauts), which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====gettaxonconceptkey (deleted; turns out this was unnecessary)====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
====getGBIFrecord====&lt;br /&gt;
retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
====getGBIFrecords====&lt;br /&gt;
calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
====readGBIFrecords====&lt;br /&gt;
calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is a region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton, which would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T19:02:32Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: changed formatting to make functions into subheadings&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
====readshpfile====&lt;br /&gt;
Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
====extract_latlong====&lt;br /&gt;
Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
====shapefile_points_in_poly, tablefile_points_in_poly====&lt;br /&gt;
Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record will fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
====access_gbif====&lt;br /&gt;
utility function invoked by other functions, user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
====get_hits====&lt;br /&gt;
Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
====get_xml_hits====&lt;br /&gt;
Like get_hits, but returns a parsed XML tree&lt;br /&gt;
====fix_ASCII====&lt;br /&gt;
Files downloaded from GBIF contain HTML character entities &amp;amp; Unicode characters (mostly umlauts), which mess up printing results to the prompt in Python; this function fixes that&lt;br /&gt;
====paramsdict_to_string====&lt;br /&gt;
converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
====xmlstring_to_xmltree(xmlstring)====&lt;br /&gt;
Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
====element_items_to_dictionary====&lt;br /&gt;
If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
====extract_numhits====&lt;br /&gt;
Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
====print_xmltree====&lt;br /&gt;
Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
====gettaxonconceptkey (deleted; turned out to be unnecessary)====&lt;br /&gt;
user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
====getGBIFrecord====&lt;br /&gt;
retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
====getGBIFrecords====&lt;br /&gt;
calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
====readGBIFrecords====&lt;br /&gt;
calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====read_ultrametric_Newick====&lt;br /&gt;
read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels, if any.&lt;br /&gt;
====treelength====&lt;br /&gt;
get the total branchlength above a given node&lt;br /&gt;
====phylodistance====&lt;br /&gt;
get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
====get_distance_matrix====&lt;br /&gt;
get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
====subset_tree====&lt;br /&gt;
given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
====alphadiversity====&lt;br /&gt;
alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
====betadiversity====&lt;br /&gt;
beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
====alphaphylodistance====&lt;br /&gt;
total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
====phylosor====&lt;br /&gt;
phylogenetic Sørensen’s index between two regions&lt;br /&gt;
====meanphylodistance====&lt;br /&gt;
average distance between all tips on a region’s phylogeny&lt;br /&gt;
====meanminphylodistance====&lt;br /&gt;
average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
====netrelatednessindex====&lt;br /&gt;
standardized index of mean phylodistance&lt;br /&gt;
====nearesttaxonindex====&lt;br /&gt;
standardized index of mean minimum phylodistance&lt;br /&gt;
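&lt;br /&gt;
As a rough sketch of the two simplest statistics above, alpha diversity is a count of distinct taxa and the Sørensen index is twice the number of shared taxa divided by the summed taxon counts of the two regions. The function names and the list-of-names input are assumptions made for illustration; the similarity form is shown, and whether the module reports similarity or one minus it is not stated here.&lt;br /&gt;

```python
def alpha_diversity(taxa):
    """Number of distinct taxa recorded in one region."""
    return len(set(taxa))


def sorensen_index(taxa_a, taxa_b):
    """Sorensen similarity between two regions:
    2 * (shared taxa) / (taxa in A + taxa in B).
    Returns 0.0 when both regions are empty."""
    set_a, set_b = set(taxa_a), set(taxa_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        return 0.0
    shared = len(set_a.intersection(set_b))
    return 2.0 * shared / total
```

For two regions holding three taxa each with two in common, this gives 2 * 2 / 6, i.e. about 0.67.&lt;br /&gt;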
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
====make_lagrange_species_range_inputs====&lt;br /&gt;
convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
====check_input_lagrange_tree====&lt;br /&gt;
checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
====parse_lagrange_output====&lt;br /&gt;
take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
====get_polygon_skeleton====&lt;br /&gt;
this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
====assign_node_locations_in_region====&lt;br /&gt;
within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
====assign_node_locations_between_regions====&lt;br /&gt;
connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
====write_history_to_shapefile====&lt;br /&gt;
write the biogeographic history to a shapefile&lt;br /&gt;
====write_history_to_KML====&lt;br /&gt;
write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T19:00:35Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: updated function names to refer to functions in geogUtils.py&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics, produce input files for input into the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshpfile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: extract_latlong&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Functions: shapefile_points_in_poly, tablefile_points_in_poly&lt;br /&gt;
::Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record may fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  &lt;br /&gt;
&lt;br /&gt;
* Function: access_gbif – utility function invoked by other functions, user inputs parameters and the GBIF response in XML/DarwinCore format is returned. The relevant GBIF web service, and the search commands etc., are here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
* Function: get_hits -- Get the actual hits that are returned by a given search; returns the filename where they are saved&lt;br /&gt;
* Function: get_xml_hits -- Like get_hits, but returns a parsed XML tree&lt;br /&gt;
* Function: fix_ASCII -- files downloaded from GBIF contain HTML character entities and Unicode characters (mostly umlauts), which break printing of results at the Python prompt; this function fixes that&lt;br /&gt;
* Function: paramsdict_to_string -- converts user's search parameters (in python dictionary format; see here for params http://data.gbif.org/ws/rest/occurrence ) to a string for submission via access_gbif&lt;br /&gt;
* Function: xmlstring_to_xmltree(xmlstring) -- Take the text string returned by GBIF and parse to an XML tree using ElementTree.  Requires the intermediate step of saving to a temporary file (required to make ElementTree.parse work, apparently).&lt;br /&gt;
* Function: element_items_to_dictionary -- If the XML tree element has items encoded in the tag, e.g. key/value or whatever, this function puts them in a python dictionary and returns them.&lt;br /&gt;
* Function: extract_numhits -- Search an element of a parsed XML string and find the number of hits, if it exists.  Recursively searches, if there are subelements.&lt;br /&gt;
* Function: print_xmltree -- Prints all the elements &amp;amp; subelements of the xmltree to screen (may require fix_ASCII to input file to succeed)&lt;br /&gt;
* Deleted (turns out this was unnecessary): gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon&lt;br /&gt;
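&lt;br /&gt;
The dictionary-to-string step described for paramsdict_to_string can be sketched with the standard library's urlencode. This is a minimal illustration rather than the function in geogUtils.py: the name params_to_query is hypothetical, and the actual parameter names accepted by the GBIF occurrence service are not assumed here.&lt;br /&gt;

```python
from urllib.parse import urlencode


def params_to_query(params):
    """Convert a dictionary of search parameters into a URL query string.
    Keys are sorted so the same dictionary always yields the same string;
    urlencode takes care of escaping spaces and other special characters."""
    return urlencode(sorted(params.items()))
```

The resulting string would then be appended to the web service URL after a question mark before the request is sent.&lt;br /&gt;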
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels, if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
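&lt;br /&gt;
For the great circle lines mentioned above, one possible approach is to compute intermediate points along the great circle between two node locations and join them with short segments. The sketch below uses the standard spherical interpolation formula on a unit sphere; the function name great_circle_point and its argument order are assumptions for illustration, and exactly antipodal endpoints (where the path is not unique) are not handled.&lt;br /&gt;

```python
import math


def great_circle_point(lat1, lon1, lat2, lon2, fraction):
    """Return (lat, lon) in degrees of the point lying the given fraction
    (0.0 to 1.0) of the way along the great circle from point 1 to point 2.
    Assumes a spherical Earth and endpoints that are not antipodal."""
    p1, l1 = math.radians(lat1), math.radians(lon1)
    p2, l2 = math.radians(lat2), math.radians(lon2)
    # Angular distance between the endpoints (haversine formula).
    a = (math.sin((p2 - p1) / 2.0) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2.0) ** 2)
    d = 2.0 * math.atan2(math.sqrt(a), math.sqrt(1.0 - a))
    if d == 0.0:
        return lat1, lon1
    # Weights of the two endpoints in Cartesian coordinates.
    w1 = math.sin((1.0 - fraction) * d) / math.sin(d)
    w2 = math.sin(fraction * d) / math.sin(d)
    x = w1 * math.cos(p1) * math.cos(l1) + w2 * math.cos(p2) * math.cos(l2)
    y = w1 * math.cos(p1) * math.sin(l1) + w2 * math.cos(p2) * math.sin(l2)
    z = w1 * math.sin(p1) + w2 * math.sin(p2)
    lat = math.atan2(z, math.sqrt(x * x + y * y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)
```

Calling this for a series of fractions between 0 and 1 yields a list of points that could be written out as one line feature per branch.&lt;br /&gt;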
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T18:23:45Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: notes on geogUtils&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics, produce input files for input into the various inference algorithms available for inferring historical biogeography; convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
Note: all major functions are being placed in the file geogUtils.py for the moment. Also, the immediate goal is to just get everything basically working, so details of where to put various functions, what to call them, etc. are being left for later.&lt;br /&gt;
&lt;br /&gt;
Code usage: For a few things, an entire necessary function already exists (e.g. for reading a shapefile), and re-inventing the wheel seems pointless.  In most cases the material used appears to be open source (e.g. previous Google Summer of Code).  For a few short code snippets found online in various places I am less sure.  In all cases I am noting the source and when finalizing this project I will go back and determine if the stuff is considered copyright, and if so email the authors for permission to use.&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshpfile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: extract_latlong&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Functions: shapefile_points_in_poly, tablefile_points_in_poly&lt;br /&gt;
::Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that are unclassified, e.g. some GBIF locations were mis-typed in the source database, so a record may fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels, if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
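&lt;br /&gt;
Given a pairwise tip distance matrix such as get_distance_matrix would produce, the mean and mean-minimum phylodistance for a region reduce to simple averages. The sketch below assumes the matrix is a nested dictionary keyed by tip name with symmetric entries; that layout and the function names are assumptions for illustration only.&lt;br /&gt;

```python
from itertools import combinations


def mean_pairwise_distance(dist, tips):
    """Average distance over all unordered pairs of tips in a region.
    dist[a][b] is the phylogenetic distance between tips a and b."""
    pairs = list(combinations(tips, 2))
    if not pairs:
        return 0.0
    return sum(dist[a][b] for a, b in pairs) / float(len(pairs))


def mean_nearest_distance(dist, tips):
    """Average, over tips, of the distance to each tip's nearest neighbor."""
    if len(set(tips)) in (0, 1):
        return 0.0
    nearest = [min(dist[a][b] for b in tips if b != a) for a in tips]
    return sum(nearest) / float(len(nearest))
```

The standardized indices would additionally compare these values against a null distribution, which is not shown here.&lt;br /&gt;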
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips for which the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-14T18:17:16Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: updated function names to refer to functions in geogUtils.py&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshpfile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: extract_latlong&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Functions: shapefile_points_in_poly, tablefile_points_in_poly&lt;br /&gt;
::Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that remain unclassified, e.g. because some GBIF locations were mistyped in the source database, so that a record falls in the middle of the ocean.&lt;br /&gt;
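&lt;br /&gt;
The point-in-polygon step can be sketched with the standard ray-casting test. This is only an illustrative sketch, not the code in geogUtils.py: the names point_in_polygon and classify_points and the dictionary-based inputs are assumptions made for this example, and coordinates are treated as planar (no handling of the date line or the poles).&lt;br /&gt;

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) lies inside polygon, a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the east of the point; an odd count means "inside".
            if x_cross > lon:
                inside = not inside
    return inside


def classify_points(points, regions):
    """Map each point name to the first region containing it, or None if unclassified.

    points:  {name: (lon, lat)}            (hypothetical input layout)
    regions: {region_name: [(lon, lat)]}   (hypothetical input layout)
    """
    result = {}
    for name, (lon, lat) in points.items():
        result[name] = None
        for region_name, polygon in regions.items():
            if point_in_polygon(lon, lat, polygon):
                result[name] = region_name
                break
    return result
```

Points that map to None correspond to the unclassified records mentioned above, e.g. mistyped coordinates that land in the ocean.&lt;br /&gt;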
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to their parent and daughter nodes), also reading node ages and node labels, if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
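&lt;br /&gt;
As a rough sketch of what phylodistance and get_distance_matrix compute, the example below sums branch lengths from two nodes up to their most recent common ancestor. The tree representation (a dictionary mapping each node to its parent and branch length) is an assumption made for this example, not the module's actual tree object, and the all-pairs loop is the naive approach whose cost motivates the speed concern above.&lt;br /&gt;

```python
def path_to_root(node, parents):
    """Return {ancestor: distance from node} for node and all of its ancestors."""
    distances = {node: 0.0}
    total = 0.0
    while node in parents:
        parent, branch_length = parents[node]
        total += branch_length
        distances[parent] = total
        node = parent
    return distances


def phylodistance(a, b, parents):
    """Branch-length distance between a and b via their most recent common ancestor."""
    dist_a = path_to_root(a, parents)
    node, total = b, 0.0
    # Climb from b until we reach a node that is also an ancestor of a (or a itself).
    while node not in dist_a:
        parent, branch_length = parents[node]
        total += branch_length
        node = parent
    return dist_a[node] + total


def get_distance_matrix(tips, parents):
    """Naive all-pairs tip distances, returned as {(tip_a, tip_b): distance}."""
    return {(a, b): phylodistance(a, b, parents) for a in tips for b in tips}


# Hypothetical tree ((A:1,B:1)AB:2,C:3)root, stored as node -> (parent, branch length).
example_parents = {
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "AB": ("root", 2.0),
    "C": ("root", 3.0),
}
example_matrix = get_distance_matrix(["A", "B", "C"], example_parents)
```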
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
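&lt;br /&gt;
The two simplest statistics in this list can be sketched as follows. This is a minimal illustration with assumed function names and plain lists of taxon names as input; it uses the usual Sørensen similarity, 2 x (shared taxa) / (taxa in A + taxa in B), and the corresponding dissimilarity is 1 minus that value.&lt;br /&gt;

```python
def alphadiversity(taxa_in_region):
    """Alpha diversity: the number of distinct taxa recorded in one region."""
    return len(set(taxa_in_region))


def sorensen_similarity(taxa_a, taxa_b):
    """Sorensen similarity between two regions: 2 * shared / (n_a + n_b)."""
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        raise ValueError("both regions are empty, so the index is undefined")
    shared = len(a.intersection(b))
    return 2.0 * shared / (len(a) + len(b))


# Hypothetical taxon lists for two regions.
region_a = ["sp1", "sp2", "sp3"]
region_b = ["sp2", "sp3", "sp4"]
similarity = sorensen_similarity(region_a, region_b)
```

The phylogenetic version (phylosor) follows the same pattern, with shared and total branch length taking the place of shared and total taxon counts.&lt;br /&gt;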
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
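&lt;br /&gt;
The checks described for check_input_lagrange_tree could look roughly like the sketch below. It assumes a tree stored as a dictionary mapping each node to its parent and branch length, and a dictionary of ranges keyed by tip name; both layouts, the function names, and the tolerance value are assumptions for illustration only. A tree whose tips are all equally far from the root is treated as ultrametric, which also means all tips end at the same time.&lt;br /&gt;

```python
def root_to_tip_distance(tip, parents):
    """Sum the branch lengths from a tip up to the root."""
    total = 0.0
    node = tip
    while node in parents:
        node, branch_length = parents[node]
        total += branch_length
    return total


def check_tree(tips, parents, ranges, tolerance=1e-6):
    """Return a list of problems found; an empty list means the tree passed."""
    problems = []
    depths = [root_to_tip_distance(tip, parents) for tip in tips]
    # Ultrametric: every tip is the same distance from the root (within tolerance).
    if depths and max(depths) - min(depths) > tolerance:
        problems.append("tree is not ultrametric")
    # Every tip must have an entry in the species/ranges input.
    missing = sorted(tip for tip in tips if tip not in ranges)
    if missing:
        problems.append("tips without ranges: " + ", ".join(missing))
    return problems
```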
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* The first question is where to put reconstructed nodes, or tips for which the only location information is the region.  Within a region, when linking tips that are already geo-located, spatial averaging can be used, as currently happens in GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton, which would work even for irregular concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
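&lt;br /&gt;
For the great circle lines between regions, one possible sketch is to interpolate intermediate points along the great circle joining two node locations, so that the branch can be drawn as a series of short segments. The function name and parameters are hypothetical; the formula is the standard spherical interpolation on a unit sphere, and the path is not well defined for exactly antipodal points.&lt;br /&gt;

```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, n_segments=10):
    """Return n_segments + 1 (lat, lon) points, in degrees, along the great circle."""
    p1, l1, p2, l2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    # Angular distance between the two points (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(math.sqrt(min(1.0, h)))
    if d == 0:
        # Identical endpoints: nothing to interpolate.
        return [(lat1, lon1)] * (n_segments + 1)
    points = []
    for i in range(n_segments + 1):
        f = i / n_segments
        # Weights of the two endpoints at fraction f of the way along the arc.
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        lon = math.degrees(math.atan2(y, x))
        points.append((lat, lon))
    return points
```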
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
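&lt;br /&gt;
A minimal sketch of the KML side, using Python's standard xml.etree.ElementTree module, is shown below. The function name history_to_kml and the input layout (a list of branch names, each with a path of latitude/longitude points) are assumptions for illustration; a real write_history_to_KML would also need styling, node ages, and so on.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def history_to_kml(branches):
    """Build a KML string with one LineString placemark per branch.

    branches: list of (name, [(lat, lon), ...]) tuples (hypothetical layout).
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    document = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for name, path in branches:
        placemark = ET.SubElement(document, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = name
        line = ET.SubElement(placemark, "{%s}LineString" % KML_NS)
        coordinates = ET.SubElement(line, "{%s}coordinates" % KML_NS)
        # KML expects longitude,latitude,altitude triples separated by spaces.
        coordinates.text = " ".join("%f,%f,0" % (lon, lat) for lat, lon in path)
    return ET.tostring(kml, encoding="unicode")


example_kml = history_to_kml([("branch_1", [(5.0, 10.0), (6.0, 12.0)])])
```

Note that KML stores coordinates in longitude, latitude order, the reverse of the latitude/longitude order used elsewhere on this page.&lt;br /&gt;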
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-06-02T14:06:08Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: update on week1&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshapefile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: readGBIFrecord&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Function: points2ranges&lt;br /&gt;
::Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that remain unclassified, e.g. because some GBIF locations were mistyped in the source database, so that a record falls in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
====Code====&lt;br /&gt;
* [http://github.com/nmatzke/biopython/commit/4d963a65ce48b9d50327f191dedcc76abbb149be Code fulfilling these tasks is uploaded here], along with an example script and data files to run&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to their parent and daughter nodes), also reading node ages and node labels, if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* The first question is where to put reconstructed nodes, or tips for which the only location information is the region.  Within a region, when linking tips that are already geo-located, spatial averaging can be used, as currently happens in GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton, which would work even for irregular concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-05-22T21:57:03Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: fix github references etc&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The source code is in the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development of the module [http://biopython.org/wiki/BioGeography here]. The new module is being documented on [http://www.biopython.org/wiki/Main_Page the BioPython wiki] as [http://biopython.org/wiki/BioGeography BioGeography].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshapefile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: readGBIFrecord&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Function: points2ranges&lt;br /&gt;
::Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output points that remain unclassified, e.g. because some GBIF locations were mistyped in the source database, so that a record falls in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to their parent and daughter nodes), also reading node ages and node labels, if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* The first question is where to put reconstructed nodes, or tips for which the only location information is the region.  Within a region, when linking tips that are already geo-located, spatial averaging can be used, as currently happens in GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (e.g. the &amp;quot;root&amp;quot; of the polygon skeleton, which would work even for irregular concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/Active_projects</id>
		<title>Active projects</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/Active_projects"/>
				<updated>2009-05-22T21:51:04Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Biogeography (GSoC) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page provides a central location to collect references to active projects. This is a good place to start if you are interested in contributing to Biopython and want to find larger projects in progress. For developers, use this to reference git branches or other projects which you will be working on for an extended period of time. Please keep it up to date as projects are finished and integrated into Biopython.&lt;br /&gt;
&lt;br /&gt;
== Current projects ==&lt;br /&gt;
&lt;br /&gt;
=== Population Genetics development ===&lt;br /&gt;
&lt;br /&gt;
Giovanni and Tiago are working on expanding population genetics code in Biopython. See the [[PopGen_dev|PopGen development page]] for more details.&lt;br /&gt;
&lt;br /&gt;
=== GFF parser ===&lt;br /&gt;
&lt;br /&gt;
Brad is working on a Biopython GFF parser. Source code is available from [http://github.com/chapmanb/bcbb/tree/master/gff GitHub]. See blog posts on the [http://bcbio.wordpress.com/2009/03/08/initial-gff-parser-for-biopython/ initial implementation] and [http://bcbio.wordpress.com/2009/03/22/mapreduce-implementation-of-gff-parsing-for-biopython/ MapReduce parallel version].&lt;br /&gt;
&lt;br /&gt;
=== PhyloXML driver (GSoC) ===&lt;br /&gt;
&lt;br /&gt;
Eric is working on supporting the [http://www.phyloxml.org/ PhyloXML] format, as a [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798969 project] for Google Summer of Code 2009. Brad is mentoring this project. The code lives on a branch in [http://github.com/etal/biopython/tree/phyloxml GitHub], and you can see a timeline and other info about ongoing development [http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML/ here]. The new module is being documented on this wiki as [[PhyloXML]].&lt;br /&gt;
&lt;br /&gt;
=== Biogeography (GSoC) ===&lt;br /&gt;
&lt;br /&gt;
[[Matzke|Nick]] is working on developing a Biogeography module for BioPython.  This work is funded by [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the Bio/Geography directory of the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development [[BioGeography|here]]. The new module is being documented on this wiki as [[BioGeography]].&lt;br /&gt;
&lt;br /&gt;
=== Roche 454 SFF parsing in Bio.SeqIO ===&lt;br /&gt;
&lt;br /&gt;
See [http://bugzilla.open-bio.org/show_bug.cgi?id=2837 Bug 2837], based on code from Jose Blanca.&lt;br /&gt;
&lt;br /&gt;
=== Open Enhancement Bugs ===&lt;br /&gt;
&lt;br /&gt;
This [http://bugzilla.open-bio.org/buglist.cgi?product=Biopython&amp;amp;bug_status=NEW&amp;amp;bug_status=ASSIGNED&amp;amp;bug_status=REOPENED&amp;amp;bug_severity=enhancement Bugzilla Search] will list all open enhancement bugs (any filed by core developers are fairly likely to be integrated, some are just wish list entries).&lt;br /&gt;
&lt;br /&gt;
== Project ideas ==&lt;br /&gt;
&lt;br /&gt;
Please add any ideas or proposals for new additions to Biopython. Bugs and enhancements for current code should be discussed through our Bugzilla interface.&lt;br /&gt;
&lt;br /&gt;
* Use SQLAlchemy, an object relational mapper, for BioSQL internals. This would add an additional external dependency to Biopython, but provides ready support for additional databases like SQLite. It also would provide a raw object interface to BioSQL databases when the SeqRecord-like interface is not sufficient. Brad has some initial code for this.&lt;br /&gt;
&lt;br /&gt;
* Revamp the GEO SOFT parser, drawing on the ideas used in [http://www.bioconductor.org/packages/bioc/html/GEOquery.html Sean Davis' GEOquery parser in R/Bioconductor].  See also [http://www.warwick.ac.uk/go/peter_cock/r/geo/ this page].&lt;br /&gt;
&lt;br /&gt;
== Enhancement list ==&lt;br /&gt;
&lt;br /&gt;
Maintaining software involves incremental improvements for new format changes and removal of bugs. Please see our [http://bugzilla.open-bio.org/ bugzilla] page for a current list. Post to the developer mailing list if you are interested in tackling any open issues.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/Active_projects</id>
		<title>Active projects</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/Active_projects"/>
				<updated>2009-05-22T21:50:26Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Biogeography (GSoC) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page provides a central location to collect references to active projects. This is a good place to start if you are interested in contributing to Biopython and want to find larger projects in progress. For developers, use this to reference git branches or other projects which you will be working on for an extended period of time. Please keep it up to date as projects are finished and integrated into Biopython.&lt;br /&gt;
&lt;br /&gt;
== Current projects ==&lt;br /&gt;
&lt;br /&gt;
=== Population Genetics development ===&lt;br /&gt;
&lt;br /&gt;
Giovanni and Tiago are working on expanding population genetics code in Biopython. See the [[PopGen_dev|PopGen development page]] for more details.&lt;br /&gt;
&lt;br /&gt;
=== GFF parser ===&lt;br /&gt;
&lt;br /&gt;
Brad is working on a Biopython GFF parser. Source code is available from [http://github.com/chapmanb/bcbb/tree/master/gff GitHub]. See blog posts on the [http://bcbio.wordpress.com/2009/03/08/initial-gff-parser-for-biopython/ initial implementation] and [http://bcbio.wordpress.com/2009/03/22/mapreduce-implementation-of-gff-parsing-for-biopython/ MapReduce parallel version].&lt;br /&gt;
&lt;br /&gt;
=== PhyloXML driver (GSoC) ===&lt;br /&gt;
&lt;br /&gt;
Eric is working on supporting the [http://www.phyloxml.org/ PhyloXML] format, as a [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798969 project] for Google Summer of Code 2009. Brad is mentoring this project. The code lives on a branch in [http://github.com/etal/biopython/tree/phyloxml GitHub], and you can see a timeline and other info about ongoing development [http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML/ here]. The new module is being documented on this wiki as [[PhyloXML]].&lt;br /&gt;
&lt;br /&gt;
=== Biogeography (GSoC) ===&lt;br /&gt;
&lt;br /&gt;
[[Matzke|Nick]] is working on developing a Biogeography module for BioPython.  This work is funded by [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the [http://github.com/nmatzke/biopython/tree/Geography Geography fork of the nmatzke branch on  GitHub], and you can see a timeline and other info about ongoing development [[BioGeography|here]]. The new module is being documented on this wiki as [[BioGeography]].&lt;br /&gt;
&lt;br /&gt;
=== Roche 454 SFF parsing in Bio.SeqIO ===&lt;br /&gt;
&lt;br /&gt;
See [http://bugzilla.open-bio.org/show_bug.cgi?id=2837 Bug 2837], based on code from Jose Blanca.&lt;br /&gt;
&lt;br /&gt;
=== Open Enhancement Bugs ===&lt;br /&gt;
&lt;br /&gt;
This [http://bugzilla.open-bio.org/buglist.cgi?product=Biopython&amp;amp;bug_status=NEW&amp;amp;bug_status=ASSIGNED&amp;amp;bug_status=REOPENED&amp;amp;bug_severity=enhancement Bugzilla Search] will list all open enhancement bugs (any filed by core developers are fairly likely to be integrated, some are just wish list entries).&lt;br /&gt;
&lt;br /&gt;
== Project ideas ==&lt;br /&gt;
&lt;br /&gt;
Please add any ideas or proposals for new additions to Biopython. Bugs and enhancements for current code should be discussed through our Bugzilla interface.&lt;br /&gt;
&lt;br /&gt;
* Use SQLAlchemy, an object relational mapper, for BioSQL internals. This would add an additional external dependency to Biopython, but provides ready support for additional databases like SQLite. It also would provide a raw object interface to BioSQL databases when the SeqRecord-like interface is not sufficient. Brad has some initial code for this.&lt;br /&gt;
&lt;br /&gt;
* Revamp the GEO SOFT parser, drawing on the ideas used in [http://www.bioconductor.org/packages/bioc/html/GEOquery.html Sean Davis' GEOquery parser in R/Bioconductor].  See also [http://www.warwick.ac.uk/go/peter_cock/r/geo/ this page].&lt;br /&gt;
&lt;br /&gt;
== Enhancement list ==&lt;br /&gt;
&lt;br /&gt;
Maintaining software involves incremental improvements for new format changes and removal of bugs. Please see our [http://bugzilla.open-bio.org/ bugzilla] page for a current list. Post to the developer mailing list if you are interested in tackling any open issues.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-05-20T05:03:14Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the nmatzke branch on [http://github.com/nmatzke/biopython/tree/master GitHub], and you can see a timeline and other info about ongoing development [http://github.com/nmatzke/biopython/tree/master here].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshapefile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: readGBIFrecord&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Function: points2ranges&lt;br /&gt;
::Takes geographic points and determines which region (polygon) each point falls in (via a point-in-polygon algorithm); also outputs points that are unclassified, e.g. records whose coordinates were mistyped in the source database and therefore fall in the middle of the ocean.&lt;br /&gt;
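&lt;br /&gt;
The point-in-polygon step could be sketched roughly as follows. This is only an illustration using the standard ray-casting test; the names point_in_polygon and points2ranges_sketch and the plain dict/list data layout are assumptions for the example, not the module's actual interface, and points exactly on a polygon edge are not treated specially.&lt;br /&gt;
&lt;br /&gt;
```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings that lie to the east of the point.
            if x_cross > lon:
                inside = not inside
    return inside


def points2ranges_sketch(points, regions):
    """Assign each point to the first region containing it.

    points:  {record_name: (lon, lat)}
    regions: {region_name: [(lon, lat), ...]}
    Returns (classified, unclassified): a dict of record_name to
    region_name, and a list of record names that fell in no region.
    """
    classified, unclassified = {}, []
    for name, (lon, lat) in points.items():
        for region_name, polygon in regions.items():
            if point_in_polygon(lon, lat, polygon):
                classified[name] = region_name
                break
        else:
            # No region matched, e.g. a mistyped coordinate in the ocean.
            unclassified.append(name)
    return classified, unclassified
```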
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
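&lt;br /&gt;
As a rough illustration of phylodistance and get_distance_matrix, here is a minimal pure-Python sketch. It assumes a hypothetical tree representation (a dict mapping each node name to a (parent, branch_length) pair, with None as the root's parent) rather than the node objects described above, and it makes no attempt to be fast for large trees.&lt;br /&gt;
&lt;br /&gt;
```python
def distances_to_ancestors(tree, node):
    """Map node and each of its ancestors to their distance from node."""
    depths = {node: 0.0}
    dist = 0.0
    while tree[node][0] is not None:
        parent, length = tree[node]
        dist += length
        depths[parent] = dist
        node = parent
    return depths


def phylodistance_sketch(tree, a, b):
    """Branch-length distance between nodes a and b."""
    up_a = distances_to_ancestors(tree, a)
    up_b = distances_to_ancestors(tree, b)
    # With non-negative branch lengths, the shared ancestor giving the
    # smallest summed path is the most recent common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def get_distance_matrix_sketch(tree, tips):
    """All pairwise tip distances as a dict keyed by (tip, tip)."""
    return {(a, b): phylodistance_sketch(tree, a, b)
            for a in tips for b in tips}


# Hypothetical ultrametric example tree: ((A:2,B:2)AB:1,C:3)root;
example_tree = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 2.0),
    "B": ("AB", 2.0),
    "C": ("root", 3.0),
}
```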
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
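&lt;br /&gt;
The two simplest statistics in this list could look something like the sketch below. It assumes each region is given as a plain collection of taxon names, and it uses the usual Sørensen similarity 2C / (A + B), where C is the number of shared taxa and A and B are the taxon counts of the two regions; the function names are illustrative only, and whether the module reports similarity or dissimilarity (one minus this value) is left open here.&lt;br /&gt;
&lt;br /&gt;
```python
def alphadiversity_sketch(taxa):
    """Alpha diversity: number of distinct taxa recorded in a region."""
    return len(set(taxa))


def sorensen_similarity(taxa_a, taxa_b):
    """Sorensen similarity between the taxon lists of two regions."""
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        raise ValueError("both regions are empty")
    shared = len(a.intersection(b))
    return 2.0 * shared / (len(a) + len(b))
```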
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
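&lt;br /&gt;
For the great circle lines between regions, one possible approach is to interpolate intermediate points along the great circle joining two node locations, so that a display program draws a curved branch. The sketch below uses the standard spherical interpolation formula; the function name and the (lat, lon) argument order are assumptions for this example, and exactly antipodal endpoints are not handled because the great circle between them is not unique.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, n_segments=10):
    """Return n_segments + 1 (lat, lon) points in degrees along the arc."""
    p1, l1, p2, l2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0.0:
        # Identical endpoints: nothing to interpolate.
        return [(lat1, lon1)] * (n_segments + 1)
    points = []
    for i in range(n_segments + 1):
        f = i / n_segments  # fraction of the way along the arc
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        # Weighted sum of the endpoint unit vectors, then back to lat/lon.
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        points.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                       math.degrees(math.atan2(y, x))))
    return points
```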
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
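&lt;br /&gt;
A minimal sketch of the KML side, assuming the biogeographic history has already been reduced to named branches with lists of (lon, lat) points. It builds the document with Python's standard xml.etree.ElementTree; branches_to_kml and its input layout are hypothetical and shown only to illustrate the output format (KML lists coordinates as longitude,latitude,altitude).&lt;br /&gt;
&lt;br /&gt;
```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def branches_to_kml(branches):
    """Serialize branches as KML LineString placemarks.

    branches: list of (name, [(lon, lat), ...]) tuples.
    Returns the KML document as a string.
    """
    # Use the KML namespace as the default namespace in the output.
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for name, coords in branches:
        placemark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = name
        line = ET.SubElement(placemark, "{%s}LineString" % KML_NS)
        # tessellate=1 asks the viewer to drape the line over the terrain.
        ET.SubElement(line, "{%s}tessellate" % KML_NS).text = "1"
        ET.SubElement(line, "{%s}coordinates" % KML_NS).text = " ".join(
            "%f,%f,0" % (lon, lat) for lon, lat in coords)
    return ET.tostring(kml, encoding="unicode")
```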
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-05-20T04:58:36Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Work Plan */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the nmatzke branch on [http://github.com/nmatzke/biopython/tree/master GitHub], and you can see a timeline and other info about ongoing development [http://github.com/nmatzke/biopython/tree/master here].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshapefile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: readGBIFrecord&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Function: points2ranges&lt;br /&gt;
::Takes geographic points and determines which region (polygon) each point falls in (via a point-in-polygon algorithm); also outputs points that are unclassified, e.g. records whose coordinates were mistyped in the source database and therefore fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder.  If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would work even for weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-05-20T04:57:00Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: starter&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
BioGeography is a module under development by [[Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the nmatzke branch on [http://github.com/nmatzke/biopython/tree/master GitHub], and you can see a timeline and other info about ongoing development [http://github.com/nmatzke/biopython/tree/master here].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;br /&gt;
&lt;br /&gt;
==Work Plan==&lt;br /&gt;
&lt;br /&gt;
===May, week 1: Functions to read locality data and place points in geographic regions (Tasks 1-2)===&lt;br /&gt;
* Function: readshapefile&lt;br /&gt;
::Parses polygon, point, and multipoint shapefiles into python objects (storing latitude/longitude coordinates and feature names, e.g. the region name associated with each polygon)&lt;br /&gt;
* Function: readGBIFrecord&lt;br /&gt;
::Parse a manually downloaded GBIF record, extracting latitude/longitude and taxon names&lt;br /&gt;
* Function: points2ranges&lt;br /&gt;
::Takes geographic points and determines which region (polygon) each point falls in (via a point-in-polygon algorithm); also outputs points that are unclassified, e.g. records whose coordinates were mistyped in the source database and therefore fall in the middle of the ocean.&lt;br /&gt;
&lt;br /&gt;
===June, week 1: Functions to search GBIF and download occurrence records===&lt;br /&gt;
&lt;br /&gt;
Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching for and downloading basic occurrence record data.  The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence &lt;br /&gt;
&lt;br /&gt;
* Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned&lt;br /&gt;
* Function: gettaxonconceptkey – user inputs a taxon name and gets the GBIF key back (useful for searching GBIF records and finding e.g. synonyms and daughter taxa).  The GBIF taxon concepts are accessed via the taxon web service: http://data.gbif.org/ws/rest/taxon &lt;br /&gt;
&lt;br /&gt;
===June, week 2: Functions to get GBIF records===&lt;br /&gt;
* Function: getGBIFrecord – retrieves the record (for this project, just the “brief” format of the record) and saves it&lt;br /&gt;
* Function: getGBIFrecords – calls getGBIFrecord for a user-specified list of records (derived from searchGBIFrecords function call)&lt;br /&gt;
* Function: readGBIFrecords – calls readGBIFrecord on a list of saved records&lt;br /&gt;
&lt;br /&gt;
===June, week 3: Functions to read user-specified Newick files (with ages and internal node labels) and generate basic summary information.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: read_ultrametric_Newick – read a Newick file into a tree object (a series of node objects linked to parent and daughter nodes), also reading node ages and node labels if any.&lt;br /&gt;
* Function: treelength – get the total branchlength above a given node&lt;br /&gt;
* Function: phylodistance – get the phylogenetic distance (branch length) between two nodes&lt;br /&gt;
* Function: get_distance_matrix – get a matrix of all of the pairwise distances between the tips of a tree.  &lt;br /&gt;
&lt;br /&gt;
This can be a slow function for large trees; currently I call a Java function from Python, which is probably the way to go.&lt;br /&gt;
&lt;br /&gt;
* Function: subset_tree – given a list of tips and a tree, remove all other tips and resulting redundant nodes to produce a new smaller tree (as in Phylomatic)&lt;br /&gt;
&lt;br /&gt;
===June, week 4: Functions to summarize taxon diversity in regions, given a phylogeny and a list of taxa and the regions they are in.===&lt;br /&gt;
&lt;br /&gt;
(note: I have scripts doing all of these functions already, so the work is integrating them into a Biopython module, testing them, etc.)&lt;br /&gt;
&lt;br /&gt;
* Function: alphadiversity – alpha diversity of a region (number of taxa in the region)&lt;br /&gt;
* Function: betadiversity – beta diversity (Sørensen’s index) between two regions&lt;br /&gt;
* Function: alphaphylodistance – total branchlength of a phylogeny of taxa within a region&lt;br /&gt;
* Function: phylosor – phylogenetic Sørensen’s index between two regions&lt;br /&gt;
* Function: meanphylodistance – average distance between all tips on a region’s phylogeny&lt;br /&gt;
* Function: meanminphylodistance – average distance to nearest neighbor for tips on a region’s phylogeny&lt;br /&gt;
* Function: netrelatednessindex – standardized index of mean phylodistance&lt;br /&gt;
* Function: nearesttaxonindex – standardized index of mean minimum phylodistance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===July, week 1: lagrange input/output handling (Task 6)===&lt;br /&gt;
&lt;br /&gt;
(note: lagrange requires a number of input files, e.g. hypothesized histories of connectivity; the only inputs suitable for automation in this project are the species ranges and phylogeny.)&lt;br /&gt;
&lt;br /&gt;
* Function: make_lagrange_species_range_inputs – convert list of taxa/ranges to input format: http://www.reelab.net/lagrange/configurator/index &lt;br /&gt;
* Function: check_input_lagrange_tree – checks if input phylogeny meets the requirements for lagrange, i.e. has ultrametric branchlengths, tips end at time 0, tip names are in the species/ranges input file&lt;br /&gt;
* Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node&lt;br /&gt;
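&lt;br /&gt;
The checks described for check_input_lagrange_tree can be sketched as below. This does not read or write any actual lagrange file format; the nested-tuple tree, the ranges dictionary, and the function names tip_depths and check_input_tree are assumptions made for the example. The idea is that if every tip sits at the same distance from the root, the tree is ultrametric and all tips end at time 0.&lt;br /&gt;

```python
import math


def tip_depths(node, depth=0.0):
    """Root-to-tip distance for every tip; node is (label, branch_length, children)."""
    label, branch_length, children = node
    depth += branch_length
    if not children:
        return {label: depth}
    depths = {}
    for child in children:
        depths.update(tip_depths(child, depth))
    return depths


def check_input_tree(tree, ranges, rel_tol=1e-6):
    """Return a list of problems; an empty list means the tree passed both checks."""
    depths = tip_depths(tree)
    deepest = max(depths.values())
    problems = []
    for label, depth in sorted(depths.items()):
        # Ultrametric check: every tip must be as far from the root as the deepest tip.
        if not math.isclose(depth, deepest, rel_tol=rel_tol):
            problems.append("tip %s does not end at time 0" % label)
        # Every tip name must appear in the species/ranges table.
        if label not in ranges:
            problems.append("tip %s has no range" % label)
    return problems


# Hypothetical inputs: a three-tip ultrametric tree and a ranges table missing tip C.
good_tree = ("root", 0.0, [
    ("AB", 1.0, [("A", 1.0, []), ("B", 1.0, [])]),
    ("C", 2.0, []),
])
short_tree = ("root", 0.0, [
    ("AB", 1.0, [("A", 1.0, []), ("B", 1.0, [])]),
    ("C", 1.5, []),
])
ranges = {"A": ["east"], "B": ["west"]}
full_ranges = {"A": ["east"], "B": ["west"], "C": ["east", "west"]}

problems_missing = check_input_tree(good_tree, ranges)
problems_short = check_input_tree(short_tree, full_ranges)
problems_none = check_input_tree(good_tree, full_ranges)
```

Returning a list of messages rather than a single True/False makes it easier to report every problem with an input tree at once.&lt;br /&gt;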
&lt;br /&gt;
===July, weeks 2-3: Devise algorithm for representing estimated node histories (location of nodes in categorical regions) as latitude/longitude points, necessary for input into geographic display files.===&lt;br /&gt;
&lt;br /&gt;
* Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder. If there is only one node in a region, the centroid or something similar could be used (i.e. the &amp;quot;root&amp;quot; of the polygon skeleton would handle even weird concave polygons).  &lt;br /&gt;
* If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other.  This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.&lt;br /&gt;
* Function: get_polygon_skeleton – this is a standard operation: http://en.wikipedia.org/wiki/Straight_skeleton &lt;br /&gt;
* Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationship, and ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle&lt;br /&gt;
* Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)&lt;br /&gt;
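&lt;br /&gt;
For branches that cross between regions, the great circle line between two latitude/longitude points can be sampled as in the sketch below. The function name great_circle_points and its parameters are assumptions for illustration; the math is the standard haversine angular distance followed by spherical interpolation on a sphere, ignoring the Earth's flattening.&lt;br /&gt;

```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, n_points=5):
    """Return (lat, lon) pairs in degrees spaced evenly along the great circle."""
    p1, l1, p2, l2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    # Haversine formula for the angular distance between the two endpoints.
    a = (math.sin((p2 - p1) / 2.0) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2.0) ** 2)
    delta = 2.0 * math.asin(min(1.0, math.sqrt(a)))
    if math.isclose(math.sin(delta), 0.0, abs_tol=1e-12):
        # Identical or antipodal endpoints: there is no unique path to interpolate.
        return [(lat1, lon1), (lat2, lon2)]
    steps = max(int(n_points), 2)
    points = []
    for i in range(steps):
        f = i / (steps - 1)
        # Weights of the two endpoint vectors at fraction f of the way along.
        wa = math.sin((1.0 - f) * delta) / math.sin(delta)
        wb = math.sin(f * delta) / math.sin(delta)
        x = wa * math.cos(p1) * math.cos(l1) + wb * math.cos(p2) * math.cos(l2)
        y = wa * math.cos(p1) * math.sin(l1) + wb * math.cos(p2) * math.sin(l2)
        z = wa * math.sin(p1) + wb * math.sin(p2)
        lat = math.degrees(math.atan2(z, math.hypot(x, y)))
        lon = math.degrees(math.atan2(y, x))
        points.append((lat, lon))
    return points


# Example: three points along the equator from longitude 0 to longitude 90.
path = great_circle_points(0.0, 0.0, 0.0, 90.0, n_points=3)
```

The sampled points can be handed directly to whatever writes the display file, so that the branch is drawn as a curve rather than a straight line in latitude/longitude space.&lt;br /&gt;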
&lt;br /&gt;
===July, week 4 and August, week 1: Write functions for converting the output from the above into graphical display formats, e.g. shapefiles for ArcGIS, KML files for Google Earth.===&lt;br /&gt;
* Function: write_history_to_shapefile -- write the biogeographic history to a shapefile&lt;br /&gt;
* Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth&lt;br /&gt;
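&lt;br /&gt;
Writing branches to KML could look roughly like the sketch below, which builds the document with Python's standard xml.etree.ElementTree. The function name history_to_kml and the input layout (a list of branch names with their latitude/longitude points) are assumptions for illustration, and only a bare Placemark/LineString structure is produced. Note that KML coordinates are written in longitude,latitude,altitude order.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def history_to_kml(branches):
    """Build a KML string with one LineString placemark per branch.

    branches is a list of (name, points) pairs, where points is a list of
    (lat, lon) tuples in degrees.
    """
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for name, points in branches:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = name
        line = ET.SubElement(placemark, "LineString")
        # KML expects lon,lat,alt triples separated by whitespace.
        coords = " ".join("%f,%f,0" % (lon, lat) for lat, lon in points)
        ET.SubElement(line, "coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Hypothetical example: one branch from an ancestral node to a tip.
branches = [("node1 to tipA", [(37.87, -122.27), (40.0, -105.27)])]
kml_text = history_to_kml(branches)
```

The resulting string can be saved to a file with a .kml extension and opened in Google Earth.&lt;br /&gt;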
&lt;br /&gt;
===August, week 2: Beta testing===&lt;br /&gt;
&lt;br /&gt;
Make the series of functions available, along with suggested input files; have others run them on various platforms, with various levels of expertise (e.g. Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.&lt;br /&gt;
&lt;br /&gt;
===August, week 3: Wrapup===&lt;br /&gt;
&lt;br /&gt;
Assemble documentation, FAQ, project results writeup for Phyloinformatics Summer of Code.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/BioGeography</id>
		<title>BioGeography</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/BioGeography"/>
				<updated>2009-05-20T04:50:20Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: starter&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;BioGeography is a module under development by [[Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project.  It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the nmatzke branch on [http://github.com/nmatzke/biopython/tree/master GitHub], and you can see a timeline and other info about ongoing development [http://github.com/nmatzke/biopython/tree/master here].&lt;br /&gt;
&lt;br /&gt;
'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/Active_projects</id>
		<title>Active projects</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/Active_projects"/>
				<updated>2009-05-20T04:46:37Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: /* Current projects */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page provides a central location to collect references to active projects. This is a good place to start if you are interested in contributing to Biopython and want to find larger projects in progress. For developers, use this to reference git branches or other projects which you will be working on for an extended period of time. Please keep it up to date as projects are finished and integrated into Biopython.&lt;br /&gt;
&lt;br /&gt;
== Current projects ==&lt;br /&gt;
&lt;br /&gt;
=== Population Genetics development ===&lt;br /&gt;
&lt;br /&gt;
Giovanni and Tiago are working on expanding population genetics code in Biopython. See the [[PopGen_dev|PopGen development page]] for more details.&lt;br /&gt;
&lt;br /&gt;
=== GFF parser ===&lt;br /&gt;
&lt;br /&gt;
Brad is working on a Biopython GFF parser. Source code is available from [http://github.com/chapmanb/bcbb/tree/master/gff GitHub]. See blog posts on the [http://bcbio.wordpress.com/2009/03/08/initial-gff-parser-for-biopython/ initial implementation] and [http://bcbio.wordpress.com/2009/03/22/mapreduce-implementation-of-gff-parsing-for-biopython/ MapReduce parallel version].&lt;br /&gt;
&lt;br /&gt;
=== PhyloXML driver (GSoC) ===&lt;br /&gt;
&lt;br /&gt;
Eric is working on supporting the [http://www.phyloxml.org/ PhyloXML] format, as a [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798969 project] for Google Summer of Code 2009. Brad is mentoring this project. The code lives on a branch in [http://github.com/etal/biopython/tree/phyloxml GitHub], and you can see a timeline and other info about ongoing development [http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML/ here]. The new module is being documented on this wiki as [[PhyloXML]].&lt;br /&gt;
&lt;br /&gt;
=== Biogeography (GSoC) ===&lt;br /&gt;
&lt;br /&gt;
[[Matzke|Nick]] is working on developing a Biogeography module for BioPython.  This work is funded by [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd].  The code currently lives at the nmatzke branch on [http://github.com/nmatzke/biopython/tree/master GitHub], and you can see a timeline and other info about ongoing development [http://github.com/nmatzke/biopython/tree/master here]. The new module is being documented on this wiki as [[BioGeography]].&lt;br /&gt;
&lt;br /&gt;
=== Open Enhancement Bugs ===&lt;br /&gt;
&lt;br /&gt;
This [http://bugzilla.open-bio.org/buglist.cgi?product=Biopython&amp;amp;bug_status=NEW&amp;amp;bug_status=ASSIGNED&amp;amp;bug_status=REOPENED&amp;amp;bug_severity=enhancement Bugzilla Search] will list all open enhancement bugs (any filed by core developers are fairly likely to be integrated, some are just wish list entries).&lt;br /&gt;
&lt;br /&gt;
== Project ideas ==&lt;br /&gt;
&lt;br /&gt;
Please add any ideas or proposals for new additions to Biopython. Bugs and enhancements for current code should be discussed through our bugzilla interface.&lt;br /&gt;
&lt;br /&gt;
* Use SQLAlchemy, an object relational mapper, for BioSQL internals. This would add an additional external dependency to Biopython, but provides ready support for additional databases like SQLite. It also would provide a raw object interface to BioSQL databases when the SeqRecord-like interface is not sufficient. Brad has some initial code for this.&lt;br /&gt;
&lt;br /&gt;
* Revamp the GEO SOFT parser, drawing on the ideas used in [http://www.bioconductor.org/packages/bioc/html/GEOquery.html Sean Davis' GEOquery parser in R/Bioconductor].  See also [http://www.warwick.ac.uk/go/peter_cock/r/geo/ this page].&lt;br /&gt;
&lt;br /&gt;
* Roche 454 SFF support in Bio.SeqIO is being discussed on the mailing lists.&lt;br /&gt;
&lt;br /&gt;
== Enhancement list ==&lt;br /&gt;
&lt;br /&gt;
Maintaining software involves incremental improvements for new format changes and removal of bugs. Please see our [http://bugzilla.open-bio.org/ bugzilla] page for a current list. Post to the developer mailing list if you are interested in tackling any open issues.&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/User:Matzke</id>
		<title>User:Matzke</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/User:Matzke"/>
				<updated>2009-05-20T03:58:39Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Nick Matzke is working on the [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Google/Phyloinformatics Summer of Code 2009] project &amp;quot;[https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009#Biogeographical_Phylogenetics_for_BioPython Biogeographical Phylogenetics for BioPython].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Nick is a Ph.D. candidate in the Department of Integrative Biology at University of California, Berkeley. [http://ib.berkeley.edu/people/students/person_detail.php?person=370 Departmental page] -- [http://fisher.berkeley.edu/cteg/members/matzke.html Lab page].&lt;br /&gt;
&lt;br /&gt;
He has also done some other strange and interesting things.  See [http://en.wikipedia.org/wiki/Nick_Matzke Wikipedia] and [http://www.google.com/search?hl=en&amp;amp;rlz=1B3GGGL_enUS239US239&amp;amp;q=evolution+matzke&amp;amp;btnG=Search Google].&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	<entry>
		<id>http://biopython.org/wiki/User:Matzke</id>
		<title>User:Matzke</title>
		<link rel="alternate" type="text/html" href="http://biopython.org/wiki/User:Matzke"/>
				<updated>2009-05-20T03:57:56Z</updated>
		
		<summary type="html">&lt;p&gt;Matzke: starter&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Nick Matzke is working on the [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Google/Phyloinformatics Summer of Code 2009] project &amp;quot;[https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009#Biogeographical_Phylogenetics_for_BioPython Biogeographical Phylogenetics for BioPython].&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Nick is a Ph.D. candidate in the Department of Integrative Biology at University of California, Berkeley. [http://ib.berkeley.edu/people/students/person_detail.php?person=370 Departmental page] -- [http://fisher.berkeley.edu/cteg/members/matzke.html Lab page].&lt;br /&gt;
&lt;br /&gt;
He has also done some other strange and interesting things.  See [http://en.wikipedia.org/wiki/Nick_Matzke Wikipedia] and [http://www.google.com/search?hl=en&amp;amp;rlz=1B3GGGL_enUS239US239&amp;amp;q=evolution+matzke&amp;amp;btnG=Search Google].&lt;/div&gt;</summary>
		<author><name>Matzke</name></author>	</entry>

	</feed>