GSoC2011 mtrellet

From Biopython
Revision as of 20:37, 7 June 2011 by Mtrellet


Author & Mentors

Mikael Trellet


Joao Rodrigues
Eric Talevich


Analysis of protein-protein complex interfaces at the residue level yields significant information on the overall binding process. Such information can be used broadly, for example in binding affinity studies, interface design, and enzymology. To tap into it, there is a need for tools that systematically and automatically analyze protein structures, or that provide the means to do so. PROTORP is an example of such a tool, and the high number of citations the server has received since its publication attests to its importance. However, being a web server, PROTORP is not suited to large-scale analysis, and it leaves the community dependent on its maintainers to keep the service available. Biopython's structural biology module, Bio.PDB, on the other hand, provides the parsing machinery and programmatic structures needed to develop an offline, open-source library for interface analysis. Such a library could easily be used in large-scale analysis of protein-protein interfaces, for example in the CAPRI experiment evaluation or in benchmark statistics. It would also be reasonable, if time permits, to extend this module to deal with protein-DNA or protein-RNA complexes, as Biopython already supports nucleic acids.

Project Schedule

Week 1 [23rd May - 31st May]

Add the new module backbone to the current Bio.PDB code base

  • Evaluate which existing code can be reused and call it from the new module
  • Run simple calculations to check that the different modules (parsing, for example) and functions work together reliably

Define a stable benchmark

  • Select a few PDB files that differ in interface size and protein size
  • Add some basic unit tests

Weeks 2-3 [1st - 13th June]

Extend IUPAC.Data module with residue information

  • Deduce residue weights from Atom objects instead of storing them directly in a dictionary
  • Polar/charged character (from a dictionary, or influenced by pH)
  • Hydrophobicity scale(s)
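
As a rough illustration of the first and third bullets, the sketch below sums element masses over the atoms of a residue and looks up a hydropathy value on the Kyte-Doolittle scale. It uses plain lists of element symbols rather than Bio.PDB objects so that it runs on its own; the names `ELEMENT_MASS`, `KYTE_DOOLITTLE` and `residue_mass` are assumptions made for this example and are not part of Biopython.

```python
# Average atomic masses (Da) for the elements most common in proteins.
ELEMENT_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

# Kyte-Doolittle hydropathy values, keyed by three-letter residue name.
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}


def residue_mass(elements):
    """Sum the masses of the atoms actually present in a residue.

    `elements` is an iterable of element symbols, one per atom.
    Atoms absent from the structure (often hydrogens) are not counted.
    """
    return sum(ELEMENT_MASS[element] for element in elements)


# Heavy atoms of a glycine residue as usually found in a PDB file: N, CA, C, O.
gly_heavy_atoms = ["N", "C", "C", "O"]
gly_mass = residue_mass(gly_heavy_atoms)
gly_hydropathy = KYTE_DOOLITTLE["GLY"]
```

A weight derived this way reflects only the atoms present in the file, so for structures without hydrogens it will be lower than a tabulated residue weight. That trade-off is one reason to compare both approaches in the unit tests.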

Week 4 [14th - 21st June]

Implement Extended Residue class as a subclass of Residue

  • Build Extended Residue on the fly or have it hard-coded (?)
  • Allow regular operations on Residue to be performed seamlessly in Extended Residue (should come with inheritance)
  • Unit tests on PDB files containing particular residues
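
One possible shape for the "on the fly" option is sketched below. To keep the example runnable without Biopython, a minimal stand-in `Residue` class is defined here; in the real module the base class would be Bio.PDB's Residue. The `ExtendedResidue` name, the `from_residue` helper, the `character` property and the three-way classification (with histidine counted as charged) are all assumptions made for illustration.

```python
class Residue:
    """Stand-in for Bio.PDB's Residue, reduced to what this sketch needs."""

    def __init__(self, id, resname, segid):
        self.id = id
        self.resname = resname
        self.segid = segid

    def get_resname(self):
        return self.resname


# One simple classification convention; other conventions exist.
CHARGED = {"ASP", "GLU", "LYS", "ARG", "HIS"}
POLAR = {"SER", "THR", "ASN", "GLN", "TYR", "CYS"}


class ExtendedResidue(Residue):
    """Residue that also exposes physico-chemical information."""

    @classmethod
    def from_residue(cls, residue):
        # Build the extended object on the fly from an existing residue.
        # Copying the child atoms is omitted in this sketch.
        return cls(residue.id, residue.resname, residue.segid)

    @property
    def character(self):
        name = self.get_resname()
        if name in CHARGED:
            return "charged"
        if name in POLAR:
            return "polar"
        return "apolar"


plain = Residue((" ", 10, " "), "LYS", " ")
extended = ExtendedResidue.from_residue(plain)
```

Because `ExtendedResidue` inherits from `Residue`, calls such as `extended.get_resname()` keep working unchanged, which is the behaviour the second bullet asks for.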

Weeks 5-7 [22nd June - 11th July]

Implement InterfaceAnalysis module

  • Develop Interface class as a subclass of Model
  • Develop method to automatically extract Interface from parsed structure upon class instantiation
    • e.g. I = Interface(Structure)
    • Allow threshold for distance
    • Allow chain pairs to ignore (to avoid intra-molecule contacts)
  • Unit tests against results from the scripts commonly used by scientists
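
The extraction step, with its distance threshold and ignored chain pairs, could look roughly like the sketch below. It works on plain `(chain_id, residue_number, coordinates)` tuples so that it runs without Biopython; the function name `find_interface`, the 5.0 Å default and the input layout are assumptions for this example. It requires Python 3.8 or later for `math.dist`.

```python
from itertools import combinations
from math import dist


def find_interface(atoms, threshold=5.0, ignore_pairs=()):
    """Return inter-chain residue contacts within `threshold` angstroms.

    atoms        -- iterable of (chain_id, residue_number, (x, y, z))
    ignore_pairs -- chain pairs to skip, e.g. [("A", "C")]
    """
    ignored = {frozenset(pair) for pair in ignore_pairs}
    contacts = set()
    for (c1, r1, xyz1), (c2, r2, xyz2) in combinations(atoms, 2):
        # Skip atoms of the same chain and explicitly ignored chain pairs.
        if c1 == c2 or frozenset((c1, c2)) in ignored:
            continue
        if dist(xyz1, xyz2) <= threshold:
            # Store each contact once, in a canonical (sorted) order.
            contacts.add(tuple(sorted([(c1, r1), (c2, r2)])))
    return contacts


atoms = [
    ("A", 10, (0.0, 0.0, 0.0)),
    ("A", 11, (3.8, 0.0, 0.0)),
    ("B", 55, (0.0, 4.0, 0.0)),
    ("C", 7, (0.0, 0.0, 4.5)),
    ("B", 90, (30.0, 30.0, 30.0)),
]
all_contacts = find_interface(atoms)
without_ac = find_interface(atoms, ignore_pairs=[("A", "C")])
```

The all-pairs loop is quadratic in the number of atoms and is only meant to show the logic. For real structures, a spatial index such as Bio.PDB's NeighborSearch would be the natural replacement.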

Mid term evaluation

Weeks 7-8 [12th July - 25th July]

Develop functions for interface analysis

  • Calculation of interface polar character statistics (% of polar residues, apolar, etc)
  • Calculation of BSA (buried surface area) by calling MSMS or HSA
  • Calculation of secondary structure (SS) element statistics in the interface through DSSP
  • ...
  • Unit tests and use of results as input for further calculations by other tools and scripts
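
The polar character statistics from the first bullet amount to counting residue classes over the interface residues. A minimal sketch follows, assuming the interface is available as a list of three-letter residue names; the `RESIDUE_CLASS` table (including the choice to count glycine as apolar and histidine as charged) and the function name `polar_statistics` are assumptions for this example.

```python
from collections import Counter

# One possible classification; the final module may use a different one.
RESIDUE_CLASS = {
    "ASP": "charged", "GLU": "charged", "LYS": "charged",
    "ARG": "charged", "HIS": "charged",
    "SER": "polar", "THR": "polar", "ASN": "polar",
    "GLN": "polar", "TYR": "polar", "CYS": "polar",
    "ALA": "apolar", "VAL": "apolar", "LEU": "apolar", "ILE": "apolar",
    "MET": "apolar", "PHE": "apolar", "TRP": "apolar", "PRO": "apolar",
    "GLY": "apolar",
}


def polar_statistics(resnames):
    """Return the percentage of interface residues in each class."""
    counts = Counter(RESIDUE_CLASS.get(name, "unknown") for name in resnames)
    total = sum(counts.values())
    # An empty interface gives an empty dict, so no division by zero occurs.
    return {cls: 100.0 * n / total for cls, n in counts.items()}


interface_residues = ["LYS", "ASP", "SER", "LEU", "VAL", "ALA", "GLY", "THR"]
stats = polar_statistics(interface_residues)
```

Non-standard residue names fall into an "unknown" class rather than raising an error, which keeps batch runs over many PDB files from stopping on a single unusual entry.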

Weeks 9-10 [26th July - 8th August]

Develop functions for Interface comparison

  • Perhaps adapt current RMSD functions to allow usage of Interface Residues
  • Otherwise, it should be called through something like Ia.rmsd_to(Ib), where Ia and Ib are Interface objects
  • Calculation of iRMSD
  • Calculation of FCC (Fraction of Common Contacts)
  • Rough Identity and Similarity percentage
  • ...
  • Unit tests, comparison with specific tools such as ProFit
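
For the FCC bullet, a minimal sketch is given below. It assumes each interface has already been reduced to a set of residue-residue contacts, and it normalizes by the number of contacts in the reference interface; both the function name `fraction_common_contacts` and that choice of normalization are assumptions for this example.

```python
def fraction_common_contacts(reference, other):
    """Fraction of the contacts in `reference` that also occur in `other`.

    Both arguments are sets of contacts, each contact being a pair of
    (chain_id, residue_number) tuples.
    """
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


c1 = (("A", 10), ("B", 55))
c2 = (("A", 12), ("B", 57))
c3 = (("A", 15), ("B", 60))
c4 = (("A", 18), ("B", 61))
c5 = (("A", 20), ("B", 70))

native = {c1, c2, c3, c4}
model = {c1, c2, c5}

fcc_native_vs_model = fraction_common_contacts(native, model)
fcc_model_vs_native = fraction_common_contacts(model, native)
```

With this normalization the measure is asymmetric: here two of the four native contacts are found in the model, while two of the three model contacts are found in the native interface. Which direction to report is a design decision for the Interface comparison functions.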

Week 11 [from 9th August]

Code organization and final testing

Unit tests will be performed throughout the project, so that the end of the project only requires one larger test run gathering all the tests already written.

The aim will then be to optimize, where possible, some parts of the code for efficiency and speed without changes at the algorithmic level. Several days will be set aside to package the code and make sure that everything works together with Biopython.
