Bio.Blast.NCBIXML module

Code to work with the BLAST XML output.

The BLAST XML DTD file is on the NCBI FTP site at: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/xml/NCBI_BlastOutput.dtd

class Bio.Blast.NCBIXML.BlastParser(debug=0)

Bases: Bio.Blast.NCBIXML._XMLparser

Parse XML BLAST data into a Record.Blast object.

Parses XML output from BLAST (direct use discouraged). This now returns a list of Blast records; historically it returned a single Blast record. You are expected to use this via the parse or read functions.

All XML ‘action’ methods are private and take one of two forms:

  • _start_TAG called when the start tag is found

  • _end_TAG called when the end tag is found
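The naming convention above can be illustrated with a small, self-contained sketch built on the standard library's expat parser. This is not the BlastParser implementation: the TagDispatcher class, its methods and the collect_hit_ids helper are hypothetical names used only to show how start and end events might be routed to methods named after the tag.

```python
from xml.parsers import expat


class TagDispatcher:
    """Toy parser routing XML events to _start_TAG / _end_TAG methods.

    Hypothetical illustration only, not Biopython's BlastParser.
    """

    def __init__(self):
        self.hit_ids = []  # values collected from <Hit_id> elements
        self._text = ""    # character data of the element being read

    def start_element(self, name, attrs):
        self._text = ""
        # Look up an optional handler named after the tag, e.g. _start_Hit.
        handler = getattr(self, "_start_" + name.replace("-", "_"), None)
        if handler is not None:
            handler()

    def end_element(self, name):
        # Look up an optional handler named after the tag, e.g. _end_Hit_id.
        handler = getattr(self, "_end_" + name.replace("-", "_"), None)
        if handler is not None:
            handler()
        self._text = ""

    def characters(self, data):
        self._text += data

    def _end_Hit_id(self):
        # Called when </Hit_id> is seen; store the accumulated text.
        self.hit_ids.append(self._text.strip())


def collect_hit_ids(xml_text):
    """Return the text of every <Hit_id> element in xml_text."""
    dispatcher = TagDispatcher()
    parser = expat.ParserCreate()
    parser.StartElementHandler = dispatcher.start_element
    parser.EndElementHandler = dispatcher.end_element
    parser.CharacterDataHandler = dispatcher.characters
    parser.Parse(xml_text, True)
    return dispatcher.hit_ids


ids = collect_hit_ids("<Hit><Hit_id>gi|1</Hit_id><Hit_len>42</Hit_len></Hit>")
```

Tags without a matching method, such as Hit_len here, are simply ignored, so new handlers can be added one method at a time.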

__init__(debug=0)

Initialize the parser.

Arguments:
  • debug - integer, amount of debug information to print

reset()

Reset all the data allowing reuse of the BlastParser() object.

set_hit_id()

Record the identifier of the database sequence (PRIVATE).

set_hit_def()

Record the definition line of the database sequence (PRIVATE).

set_hit_accession()

Record the accession value of the database sequence (PRIVATE).

set_hit_len()

Record the length of the hit.

Bio.Blast.NCBIXML.read(handle, debug=0)

Return a single Blast record (assumes just one query).

Uses the BlastParser internally.

This function is for use when there is one and only one BLAST result in your XML file.

Use the Bio.Blast.NCBIXML.parse() function if you expect more than one BLAST record (i.e. if you have more than one query sequence).

Bio.Blast.NCBIXML.parse(handle, debug=0)

Return an iterator yielding a Blast record for each query.

This incremental parser is an iterator that returns Blast records. It uses the BlastParser internally.

Arguments:
  • handle - file handle to an XML file to parse
  • debug - integer, amount of debug information to print

This is a generator function that yields multiple Blast record objects, one for each query sequence given to BLAST. The file is read incrementally, returning complete records as they are read in.

Should cope with BLAST 2.2.14 and later, which give a single XML file for multiple query records.

Should also cope with XML output from older versions of BLAST, which gave multiple XML files concatenated together (giving a single file which, strictly speaking, wasn’t valid XML).