Biopython-corba How-To

Brad Chapman (chapmanb@arches.uga.edu)

Last Update--23 September 2001

1   Introduction

This tutorial is designed to get you up and running quickly with biopython-corba. Biopython-corba does all of the following good stuff:

2   Installation

2.1   Requirements

In order to use biopython-corba, you need to install Biopython and a CORBA ORB implementation.

Biopython is available from (http://www.biopython.org). Currently, you need a current version of CVS to run biopython-corba, which you can check out with instructions from http://cvs.biopython.org. Once a new release becomes after 1.00a3 becomes available, you can get one of those from the standard Download place (http://www.biopython.org/Download/).

You will also need an ORB implementation. Currently you can choose from one of the following:

  1. omniORB - Python bindings are available from http://www.uk.research.att.com/omniORB/omniORBpy/ and omniORB itself is available from http://www.uk.research.att.com/omniORB/index.html. omniORBpy is an excellent python ORB under continuous development. It is quite fast and interoperable. It is quite a large C++ program though, so it can take some effort and time to install.

  2. ORBit and ORBit-python -- ORBit (http://orbit-resource.sourceforge.net/) is the standard ORB for the Gnome project. If you run Gnome or a standard Linux distribution, it may already be installed. Try typing orbit-config --version. If you get back something like ORBit 0.5.8, then it's already installed and you just need the ORBit-python bindings, available from http://orbit-python.sault.org/. Currently, you need a CVS version of ORBit-python but 0.3.1 or later should work with biopython-corba. An addtional item that needs to be done is to get ORBit working in an interoperable manner. To do this, you need to create a .orbitrc file in your home directory with the line: ORBIIOPIPv4=1 in it.

  3. Fnorb -- Fnorb is an ORB implementation written almost entirely in python, and is available from http://www.fnorb.org/. It is slower than the other ORBs, but is relatively easy to install since it is all python. One catch is that Fnorb needs some patches to work with biopython-corba and with python versions later than 2.0. You will need to get and apply the patch at http://bioinformatics.org/bradstuff/bc/fnorb.patch for everything to work happily.

2.2   Getting biopython-corba installed

Now that you've got the requirements installed, we're ready to get this installed. Here we go:

  1. Get biopython-corba, which is available by anonymous cvs at biopython (see http://cvs.biopython.orgfor instructions), or via download at http://www.biopython.org/Download/. Presumedly you already have this though, if you are reading this :)

  2. Edit the configuration file. This file is located in BioCorba/biocorbaconfig.py, and contains the options available for building and using biopython-corba. Basically, you will just want to edit the variable orb_implementation to match the python ORB implementation you installed above.

  3. Change into the main biopython-corba directory and excecute, as root, the command python setup.py install. This will do all the necessary magic and install everything.

2.3   Testing the installation

There are a number of tests available in the Tests directory. This describes how to run 'em. Feel free to keep running them until you are satisfied that everything installed smoothly.

3   Getting Started

This section is designed to get you up and running quickly with biopython-corba. After reading this section, the goals are that you should:

  1. know how to make a Biopython Dictionary object (like a GenBank file) available as a python server

  2. Understand how to use a biopython-corba client to connect to a Dictionary-like object and deal with it.
This section assumes some basic familiarity with biopython and CORBA.

3.1   Setting up a BioCorba server from a Biopython Dictionary

The big advantage of the biopython-corba libraries are that they make it easy to take a biopython object and make it available through CORBA. This example shows how to take GenBank information represented as a Biopython Dictionary object, and turn it into a BioCorba BioSequenceCollection.

3.1.1   Setting up the biopython Dictionary

First, we need to create a Biopython Dictionary. Biopython Dictionary objects act like standard python dictionaries, except that they provide biological information. In this example, we are going to create a dictionary to a GenBank file of drought related proteins in Arabidopsis.

We do this in the exact same way it is done in biopython. First we index the GenBank file, then we set up a dictionary that will return SeqRecord objects for each GenBank item in the dictionary:

from Bio import GenBank

gb_file = "a_drought.gb"
index_file = "a_drought.idx"
GenBank.index_file(gb_file, index_file)
feature_parser = GenBank.FeatureParser()
gb_dictionary = GenBank.Dictionary(index_file, feature_parser) 
Now we've got a dictionary that is ready to be fed into a BioCorba server.

3.1.2   Making the Dictionary a BioCorba server

The biopython-corba server code does all of the hard work of taking the dictionary you give it and making it follow the BioCorba standards. All you need to do is follow the right way of presenting your objects. In BioCorba the standard database-like object is called a BioSequenceCollection. You can give a BioSequenceCollection any Dictionary-like object from Biopython (ie. Fasta dictionaries, etc) or any object of your own that behaves like a Dictionary (implements __getitem__ and keys). We give BioSequenceCollection our Dictionary object, the name of the database, a version and the description, and it does the rest.

Next, we need to make a reference to the server available. This reference is analagous to a web URL, except it is more complicated because it needs to specify the entire path to get to the host, port and object itself. In CORBA these references are called Interoperable Object References (IORs). In this example, we will write a stringified version of the IOR for the server to a file so a client can get it. In real life, you might put this IOR on a web site, e-mail it to a collaborator, or somehow get it to someone else.

Once the reference to the server is written out, all we need to do is get the server running so it can sit and wait for clients to connect to it. Whew, all that seemed like a lot of work, but the following small amount of code will get it done:

from BioCorba.Server.Seqcore.CorbaCollection import BioSequenceCollection

server = BioSequenceCollection(gb_dictionary, "Arabidopsis Drought Database", 
           1.0, "Putative Drought Resistance Genes in Arabidopsis")

print "Writing IOR to a file and starting the server..."
server.string_ior_to_file("a_drought.ior")
server.run()
Whoo hoo -- with just this little bit of code we've taken a GenBank file and made it available through the BioCorba interface. With the server you've set up, people can write code in any language to attach to this server and retrieve the information out of it. Isn't CORBA a great thing?

3.2   Using a BioCorba client though a Biopython-like interface

This section describes retrieving information through BioCorba. The recommended method to use the biopython-corba interfaces is through the Bio-like interfaces, which emulate the biopython interfaces. It's better to write your code against these interfaces for a couple of reasons. First, they're easier to learn since they resemble, as close as possible, biopython code. Second, these APIs will be more stable between biopython-corba releases. The underlying BioCorba interfaces are likely to change, and this has been especially true as we've been working to get the CORBA interfaces "right."

We'll demonstrate how to take a BioCorba BioSequenceCollection server and get information from it through a GenBank-like interface. This will show retrieving SeqRecord objects from a Dictionary-like interface, and getting features on these SeqRecords.

3.2.1   Getting the BioCorba server and connecting to it

The trickiest part of learning how to use BioCorba is understanding how to connect to a server. The biopython-corba interfaces try to make this job easier by makine it involve only 2 steps:

  1. Get a ``loader`` object that can be used to get connections with servers of a specific type. In our case, we are going to be creating a loader that will give us a BioSequenceCollection from a python based server.

  2. Use the ``loader`` object to get a client connection from a server, based on a server reference. As we mentioned before, the reference to a server is called an IOR, and can be located in a number of possible places (ie. web site, file). In this case we'll be connecting to our server from a reference in a file.
Now that we've said all of that, the code to get a client reference to a BioSequenceCollection should (hopefully) make sense:

from BioCorba.Client.BiocorbaConnect import PythonCorbaClient
from BioCorba.Client.Seqcore.CorbaCollection import BioSequenceCollection

ior_file = 'a_drought.ior'

client_loader = PythonCorbaClient(BioSequenceCollection)
client = client_loader.from_file_ior(ior_file)
Now we've got a BioSequenceCollection client connected to a server. This client uses the raw BioCorba interfaces, but we'd like to be able to use the Biopython-like interfaces. To do this, we will load up a GenBank Dictionary object in much the same way we do in the Biopython. The major difference is that instead of using a file handle to a GenBank file we will use our connected client. So, we can just think of the client/server connection as a way to get info out of a GenBank file. The code to get the GenBank dictionary should hopefully look familiar (except that the imports come from BioCorba.Bio instead of just Bio):

from BioCorba.Bio import GenBank
feature_parser = GenBank.FeatureParser()
gb_dict = GenBank.Dictionary(client, feature_parser)
Whoo hoo, now we've got our Biopython-like GenBank dictionary. As we'll see in the next section, we can use this exactly like we would a normal Biopython Dictionary object.

3.2.2   Using the Biopython-like interface

Now that we've got our GenBank-like object, we are ready to use it exactly like we would any Dictionary object from Biopython. We don't need to worry about CORBA or anything at all. The following example code, which should be familiar to anyone using the standard SeqRecord and SeqFeature objects from Biopython, will just look at the available accessions in the database, and then retrieve a record and look at it:

accessions = gb_dict.keys()
print "Accessions:", accessions

seq_record = gb_dict['BE038456']
print "name:", seq_record.name
print "id:", seq_record.id
print "description:", seq_record.description
print "First 50 bases:", seq_record.seq[:50]
print "First SeqFeature:"
print "\tType:", seq_record.features[0].type
print "\tLocation:", seq_record.features[0].location
print "\tQualifiers:", seq_record.features[0].qualifiers
This produces the following output:

ccessions: ['BE038008', 'BE038108', 'BE038455', 'BE038456', 'BE038835',
'BE844799']
name: BE038456
id: BE038456.1
description: AA17G06 AA Arabidopsis thaliana cDNA 5' similar to
drought-induced protein di19, mRNA sequence.
First 50 bases:
Seq('GCACGAGCTTCTTCTTCATCTATTCAGGATCGATCCGGCTTGAGATTTAT', RNAAlphabet())
First SeqFeature:
    Type: source
    Location: (0..708)
    Qualifiers: {'clone_lib': ['AA'], 'organism': ['Arabidopsis thaliana'], 
'dev_stage': ['12 weeks'], 'cultivar': ['Columbia'], 
'tissue_type': ['leaves, flowering plants'],
'db_xref': ['taxon:3702'], 'note': ['20 h 200mM NaCl']}
Nice. Now we're set up for both creating server objects and accessing them through a Biopython-like interface. Hopefully this Tutorial gave enough instruction to get you started exploring this further.

4   Other Useful Information

4.1   Learning more about Python and CORBA

Alan Robinson has written a very nice introduction to writing CORBA clients, which has a path on python. You can check this out on line at http://industry.ebi.ac.uk/~alan/CORBA/. The python section is also available in the Doc directory in pdf as IntroPythonClient.pdf.

I also wrote a similar paper which is meant as an introduction to writing python servers. It is available in the Doc directory in pdf format in the file IntroPythonServer.pdf.

4.2   The Standard Python Mapping

The mapping of IDL to python has been standardized, and documentation describing this official mapping is available (http://www.omg.org/cgi-bin/doc?ptc/00-04-08). Due to recent acceptance of this standard, the degree to which different python ORB implementations follow it vary, unfortunately. The biopython-corba package attempts to standardize "non-compliant" ORB implementations so that they can be used through the same interface. This makes it possible to use any supported ORB with the biopython-corba interfaces without needed to change any code.


This document was translated from LATEX by HEVEA.