Package Entrez

Provides code to access NCBI over the WWW.

The main Entrez web page is available at: http://www.ncbi.nlm.nih.gov/Entrez/

The Entrez Programming Utilities web page is available at: http://www.ncbi.nlm.nih.gov/books/NBK25501/

This module provides a number of functions, such as efetch (short for Entrez Fetch), which return the data as a handle object. A handle is the standard Python interface for reading data from a file or, in this case, from a remote network connection. It provides methods such as .read() and supports iteration over the contents line by line. See also "What the heck is a handle?" in the Biopython Tutorial and Cookbook:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
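As a rough illustration of the handle interface described above, the sketch below uses io.StringIO as a local stand-in for a network handle. This is an assumption made so the example runs offline, and the record text is made up. A handle returned by Bio.Entrez would be read in the same way.

```python
import io

# io.StringIO stands in for the remote handle (illustrative assumption);
# the two lines of text below are placeholder data, not a real record.
handle = io.StringIO("LOCUS       AY851612\nDEFINITION  example record\n")

first_line = handle.readline().rstrip()          # read a single line
remaining = [line.rstrip() for line in handle]   # iterate over the rest
handle.close()

print(first_line)
print(remaining)
```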

Unlike a handle to a file on disk from the open(filename) function, which has a .name attribute giving the filename, the handles from Bio.Entrez all have a .url attribute instead giving the URL used to connect to the NCBI Entrez API.

The Entrez module also provides an XML parser which takes a handle as input.

Functions
 
epost(db, **keywds)
Post a file of identifiers for future use.

efetch(db, **keywords)
Fetches Entrez results which are returned as a handle.

esearch(db, term, **keywds)
ESearch runs an Entrez search and returns a handle to the results.

elink(**keywds)
ELink checks for linked external articles and returns a handle.

einfo(**keywds)
EInfo returns a summary of the Entrez databases as a results handle.

esummary(**keywds)
ESummary retrieves document summaries as a results handle.

egquery(**keywds)
EGQuery provides Entrez database counts for a global search.

espell(**keywds)
ESpell retrieves spelling suggestions, returned in a results handle.

_update_ecitmatch_variables(keywds)

ecitmatch(**keywds)
ECitMatch retrieves the PubMed IDs (PMIDs) that match a set of citations.

read(handle, validate=True)
Parses an XML file from the NCBI Entrez Utilities into Python objects.

parse(handle, validate=True)
Parses an XML file from the NCBI Entrez Utilities into Python objects.

_open(cgi, params=None, post=None, ecitmatch=False)
Helper function to build the URL and open a handle to it (PRIVATE).

_construct_params(params)

_encode_options(ecitmatch, params)

_construct_cgi(cgi, post, options)

_test()
Run the module's doctests (PRIVATE).

Variables
  email = None
  tool = 'biopython'
  __package__ = 'Bio.Entrez'

Function Details

epost(db, **keywds)

Post a file of identifiers for future use.

Posts a file containing a list of UIs (unique identifiers) to the user's environment, for use with subsequent search strategies.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EPost

Return a handle to the results.

Raises an IOError exception if there's a network error.

efetch(db, **keywords)

Fetches Entrez results which are returned as a handle.

EFetch retrieves records in the requested format from a list of one or more UIs or from the user's environment.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Return a handle to the results.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.efetch(db="nucleotide", id="57240072", rettype="gb", retmode="text")
>>> print(handle.readline().strip())
LOCUS       AY851612                 892 bp    DNA     linear   PLN 10-APR-2007
>>> handle.close()

This will automatically use an HTTP POST rather than HTTP GET if there are over 200 identifiers as recommended by the NCBI.
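The switch described above can be pictured with a small sketch. The helper name should_use_post and its threshold argument are hypothetical and are not part of Bio.Entrez. The sketch only shows the kind of check that "over 200 identifiers" implies.

```python
def should_use_post(ids, threshold=200):
    """Return True when there are more than `threshold` identifiers.

    Hypothetical helper for illustration: `ids` may be a list of
    identifiers or a single comma-separated string.
    """
    if isinstance(ids, str):
        ids = [part for part in ids.split(",") if part.strip()]
    return len(ids) > threshold


print(should_use_post("57240072"))
print(should_use_post([str(n) for n in range(201)]))
```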

Warning: The NCBI changed the default retmode in Feb 2012, so many databases which previously returned text output now give XML.

esearch(db, term, **keywds)

ESearch runs an Entrez search and returns a handle to the results.

ESearch searches and retrieves primary IDs (for use in EFetch, ELink and ESummary) and term translations, and optionally retains results for future use in the user's environment.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch

Return a handle to the results which are always in XML format.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.esearch(db="nucleotide", retmax=10, term="opuntia[ORGN] accD")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> int(record["Count"]) >= 2
True
>>> "156535671" in record["IdList"]
True
>>> "156535673" in record["IdList"]
True

elink(**keywds)

ELink checks for linked external articles and returns a handle.

ELink checks for the existence of an external or Related Articles link from a list of one or more primary IDs; retrieves IDs and relevancy scores for links to Entrez databases or Related Articles; creates a hyperlink to the primary LinkOut provider for a specific ID and database, or lists LinkOut URLs and attributes for multiple IDs.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ELink

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

This example finds articles related to the Biopython application note's entry in the PubMed database:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> pmid = "19304878"
>>> handle = Entrez.elink(dbfrom="pubmed", id=pmid, linkname="pubmed_pubmed")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> print(record[0]["LinkSetDb"][0]["LinkName"])
pubmed_pubmed
>>> linked = [link["Id"] for link in record[0]["LinkSetDb"][0]["Link"]]
>>> "17121776" in linked
True

This is explained in much more detail in the Biopython Tutorial.

einfo(**keywds)

EInfo returns a summary of the Entrez databases as a results handle.

EInfo provides field names, index term counts, last update, and available links for each Entrez database.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EInfo

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> record = Entrez.read(Entrez.einfo())
>>> 'pubmed' in record['DbList']
True

esummary(**keywds)

ESummary retrieves document summaries as a results handle.

ESummary retrieves document summaries from a list of primary IDs or from the user's environment.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESummary

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

This example discovers more about entry 30367 in the journals database:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.esummary(db="journals", id="30367")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> print(record[0]["Id"])
30367
>>> print(record[0]["Title"])
Computational biology and chemistry

egquery(**keywds)

EGQuery provides Entrez database counts for a global search.

EGQuery provides Entrez database counts in XML for a single search using Global Query.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EGQuery

Return a handle to the results in XML format.

Raises an IOError exception if there's a network error.

This quick example, based on a longer version in the Biopython Tutorial, just checks that there are over 60 matches for 'Biopython' in PubMed Central:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.egquery(term="biopython")
>>> record = Entrez.read(handle)
>>> handle.close()
>>> for row in record["eGQueryResult"]:
...     if "pmc" in row["DbName"]:
...         print(int(row["Count"]) > 60)
True

espell(**keywds)

ESpell retrieves spelling suggestions, returned in a results handle.

ESpell retrieves spelling suggestions, if available.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESpell

Return a handle to the results, by default in XML format.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> record = Entrez.read(Entrez.espell(term="biopythooon"))
>>> print(record["Query"])
biopythooon
>>> print(record["CorrectedQuery"])
biopython

ecitmatch(**keywds)

ECitMatch retrieves the PubMed IDs (PMIDs) that match a set of citations.

ECitMatch retrieves PubMed IDs (PMIDs) that correspond to a set of input citation strings.

See the online documentation for an explanation of the parameters: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ECitMatch

Return a handle to the results, by default in plain text.

Raises an IOError exception if there's a network error.

Short example:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> citation_1 = {
...    "journal_title": "proc natl acad sci u s a",
...    "year": "1991", "volume": "88", "first_page": "3248",
...    "author_name": "mann bj", "key": "citation_1"}
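>>> handle = Entrez.ecitmatch(db="pubmed", bdata=[citation_1])
>>> print(handle.read().strip().split("|"))
['proc natl acad sci u s a', '1991', '88', '3248', 'mann bj', 'citation_1', '2014248']
>>> handle.close()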
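The NCBI documentation describes the bdata parameter as pipe-delimited citation strings of the form journal_title|year|volume|first_page|author_name|your_key|, with multiple citations separated by a carriage return. The sketch below shows how a citation dictionary like the one above could be turned into such a string. The helper format_bdata is hypothetical and is not claimed to be how the module does this internally.

```python
# Field order of an ECitMatch citation string, per the NCBI documentation.
CITATION_KEYS = (
    "journal_title", "year", "volume", "first_page", "author_name", "key",
)


def format_bdata(citations):
    """Join citation dictionaries into a single bdata string.

    Hypothetical helper: each citation becomes 'a|b|c|d|e|f|' and the
    citations are separated by a carriage return.
    """
    return "\r".join(
        "|".join(citation[key] for key in CITATION_KEYS) + "|"
        for citation in citations
    )


citation_1 = {
    "journal_title": "proc natl acad sci u s a",
    "year": "1991", "volume": "88", "first_page": "3248",
    "author_name": "mann bj", "key": "citation_1",
}
print(format_bdata([citation_1]))
```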

read(handle, validate=True)

Parses an XML file from the NCBI Entrez Utilities into Python objects.

This function parses an XML file created by NCBI's Entrez Utilities, returning a multilevel data structure of Python lists and dictionaries. Most XML files returned by NCBI's Entrez Utilities can be parsed by this function, provided the corresponding DTD is available. Biopython includes the DTDs for the most commonly used Entrez Utilities.

If validate is True (default), the parser will validate the XML file against the DTD, and raise an error if the XML file contains tags that are not represented in the DTD. If validate is False, the parser will simply skip such tags.
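The effect of the validate flag can be sketched with the standard library alone. In the sketch below, a plain set of tag names stands in for the DTD, and the function collect_known is hypothetical. Real DTD-based validation is considerably more involved.

```python
import xml.etree.ElementTree as ET

# Assumption: this set of tag names plays the role of the DTD.
KNOWN_TAGS = {"Query", "CorrectedQuery"}

# Made-up input containing one tag that the "DTD" does not know about.
SAMPLE = (
    "<eSpellResult><Query>biopythooon</Query>"
    "<Unknown>x</Unknown></eSpellResult>"
)


def collect_known(xml_text, validate=True):
    """Collect child elements, raising or skipping on unknown tags."""
    result = {}
    for child in ET.fromstring(xml_text):
        if child.tag not in KNOWN_TAGS:
            if validate:
                raise ValueError("unexpected tag: %s" % child.tag)
            continue  # validate=False: silently skip the unknown tag
        result[child.tag] = child.text
    return result


print(collect_known(SAMPLE, validate=False))
```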

Whereas the data structure seems to consist of generic Python lists, dictionaries, strings, and so on, each of these is actually a class derived from the base type. This allows us to store the attributes (if any) of each element in a dictionary my_element.attributes, and the tag name in my_element.tag.
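A minimal sketch of this idea follows. The class names TaggedStr and TaggedList are hypothetical and are not the parser's actual classes. They only show how a subclass of a built-in type can behave like the base type while carrying .tag and .attributes.

```python
class TaggedStr(str):
    """A str that also remembers its XML tag name and attributes."""

    def __new__(cls, value, tag=None, attributes=None):
        # str is immutable, so the value must be set in __new__.
        obj = super().__new__(cls, value)
        obj.tag = tag
        obj.attributes = dict(attributes or {})
        return obj


class TaggedList(list):
    """A list that also remembers its XML tag name and attributes."""

    def __init__(self, items=(), tag=None, attributes=None):
        super().__init__(items)
        self.tag = tag
        self.attributes = dict(attributes or {})


my_element = TaggedStr("19304878", tag="Id", attributes={"IdType": "pubmed"})
print(my_element, my_element.tag, my_element.attributes)
```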

parse(handle, validate=True)

Parses an XML file from the NCBI Entrez Utilities into Python objects.

This function parses an XML file created by NCBI's Entrez Utilities, returning a multilevel data structure of Python lists and dictionaries. This function is suitable for XML files that (in Python) can be represented as a list of individual records. Whereas 'read' reads the complete file and returns a single Python list, 'parse' is a generator function that returns the records one by one. This function is therefore particularly useful for parsing large files.
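The difference between the two styles can be sketched with the standard library. The functions iter_records and read_all below are hypothetical stand-ins that use xml.etree.ElementTree rather than the Entrez parser, and the XML is made up. The point is only that a generator yields each record as soon as it is complete, whereas the list version holds everything in memory.

```python
import io
import xml.etree.ElementTree as ET

# Made-up XML that can be represented as a list of records.
XML = b"""<RecordSet>
  <Record><Id>1</Id></Record>
  <Record><Id>2</Id></Record>
  <Record><Id>3</Id></Record>
</RecordSet>"""


def iter_records(handle, record_tag="Record"):
    """Yield one dict per record as soon as its end tag has been read."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the finished record's children


def read_all(handle, record_tag="Record"):
    """Return every record at once, as a single list."""
    return list(iter_records(handle, record_tag))


for record in iter_records(io.BytesIO(XML)):
    print(record["Id"])
```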

Most XML files returned by NCBI's Entrez Utilities can be parsed by this function, provided the corresponding DTD is available. Biopython includes the DTDs for the most commonly used Entrez Utilities.

If validate is True (default), the parser will validate the XML file against the DTD, and raise an error if the XML file contains tags that are not represented in the DTD. If validate is False, the parser will simply skip such tags.

Whereas the data structure seems to consist of generic Python lists, dictionaries, strings, and so on, each of these is actually a class derived from the base type. This allows us to store the attributes (if any) of each element in a dictionary my_element.attributes, and the tag name in my_element.tag.

_open(cgi, params=None, post=None, ecitmatch=False)

Helper function to build the URL and open a handle to it (PRIVATE).

Opens a handle to Entrez. cgi is the URL of the CGI script to access, and params is a dictionary of the options to pass to it. The function does some simple error checking and raises an IOError if it encounters an error.

The argument post should be a boolean to explicitly control whether an HTTP POST is used rather than an HTTP GET. By default (post=None), the choice is based on the query length: POST is used if the URL-encoded parameters would be over 1000 characters long.
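The length-based default can be sketched as follows. The function choose_method is a hypothetical illustration built on urllib.parse.urlencode and is not the module's private code. It assumes the 1000-character limit applies to the encoded parameter string.

```python
from urllib.parse import urlencode


def choose_method(params, post=None, limit=1000):
    """Return (method, encoded_options) for a request.

    Hypothetical helper: an explicit True/False for `post` is honoured,
    while post=None falls back to the length of the encoded parameters.
    """
    options = urlencode(params, doseq=True)
    if post is None:
        post = len(options) > limit
    return ("POST" if post else "GET"), options


method, options = choose_method({"db": "pubmed", "term": "biopython"})
print(method, options)
```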

This function also enforces the "up to three queries per second" rule to avoid abusing the NCBI servers.
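One simple way to respect such a limit is to keep a minimum interval between consecutive requests. The class MinIntervalLimiter below is a hypothetical sketch of that approach and is not claimed to match the module's own implementation. The clock and sleep arguments are injectable only so the behaviour can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.delay = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinIntervalLimiter(max_per_second=3)
limiter.wait()  # the first call never sleeps; later calls may
```

Calling limiter.wait() immediately before each request would keep the request rate at or below three per second.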