Bio.SeqIO.SeqXmlIO module

Bio.SeqIO support for the “seqxml” file format, SeqXML.

This module is for reading and writing SeqXML format files as SeqRecord objects, and is expected to be used via the Bio.SeqIO API.

SeqXML is a lightweight XML format intended as an alternative to FASTA files. For more information see http://www.seqXML.org and Schmitt et al. (2011), https://doi.org/10.1093/bib/bbr025
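As a minimal sketch of the intended usage through Bio.SeqIO, the following round trip writes two records to an in-memory binary buffer and reads them back. The record ids, sequences, and descriptions are invented for illustration. The example assumes a Biopython release in which the "seqxml" format works on binary streams and requires a molecule_type annotation on each record.

```python
from io import BytesIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative records; the ids and sequences are made up for this sketch.
records = [
    SeqRecord(
        Seq("ACGTTGCA"),
        id="demo1",
        description="first demo entry",
        annotations={"molecule_type": "DNA"},
    ),
    SeqRecord(
        Seq("MKVLAA"),
        id="demo2",
        description="second demo entry",
        annotations={"molecule_type": "protein"},
    ),
]

# SeqXML is written to and read from binary streams.
buffer = BytesIO()
count = SeqIO.write(records, buffer, "seqxml")

# Rewind the buffer and parse the document back into SeqRecord objects.
buffer.seek(0)
parsed = list(SeqIO.parse(buffer, "seqxml"))

for record in parsed:
    print(record.id, record.seq, record.annotations.get("molecule_type"))
```

When working with files on disk, the same calls can be given a path, or a handle opened in binary mode ("rb" or "wb").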

class Bio.SeqIO.SeqXmlIO.ContentHandler

Bases: xml.sax.handler.ContentHandler

Handles XML events generated by the parser (PRIVATE).

__init__()

Create a handler to handle XML events.

startDocument()

Set XML handlers when an XML declaration is found.

startSeqXMLElement(name, qname, attrs)

Handle start of a seqXML element.

endSeqXMLElement(name, qname)

Handle end of the seqXML element.

startEntryElement(name, qname, attrs)

Set new entry with id and the optional entry source (PRIVATE).

endEntryElement(name, qname)

Handle end of an entry element.

startEntryFieldElement(name, qname, attrs)

Receive a field of an entry element and forward it.

startSpeciesElement(attrs)

Parse the species information.

endSpeciesElement(name, qname)

Handle end of a species element.

startDescriptionElement(attrs)

Parse the description.

endDescriptionElement(name, qname)

Handle the end of a description element.

startSequenceElement(attrs)

Parse DNA, RNA, or protein sequence.

endSequenceElement(name, qname)

Handle the end of a sequence element.

startDBRefElement(attrs)

Parse a database cross reference.

endDBRefElement(name, qname)

Handle the end of a DBRef element.

startPropertyElement(attrs)

Handle the start of a property element.

endPropertyElement(name, qname)

Handle the end of a property element.

characters(data)

Handle character data.
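The event-driven style used by this handler can be illustrated with the standard library alone. The sketch below is not the module's implementation: EntryCollector and the sample document are hypothetical, the element names are assumed to follow the seqXML layout, and the simpler non-namespace callbacks (startElement/endElement) are used instead of the (name, qname, attrs) form listed above.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hand-written sample, assumed to follow the seqXML element layout.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<seqXML seqXMLversion="0.4">
  <entry id="demo1">
    <description>first demo entry</description>
    <DNAseq>ACGTTGCA</DNAseq>
  </entry>
  <entry id="demo2">
    <AAseq>MKVLAA</AAseq>
  </entry>
</seqXML>
"""


class EntryCollector(ContentHandler):
    """Hypothetical handler collecting id, description, and sequence per entry."""

    SEQUENCE_TAGS = {"DNAseq": "DNA", "RNAseq": "RNA", "AAseq": "protein"}

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None  # dict for the entry being parsed, if any
        self._text = []       # character data seen since the last tag

    def startElement(self, name, attrs):
        self._text = []
        if name == "entry":
            self._current = {"id": attrs["id"]}

    def characters(self, content):
        # Character data may arrive in several pieces, so accumulate it.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        self._text = []
        if self._current is None:
            return
        if name == "description":
            self._current["description"] = text
        elif name in self.SEQUENCE_TAGS:
            self._current["sequence"] = text
            self._current["molecule_type"] = self.SEQUENCE_TAGS[name]
        elif name == "entry":
            self.entries.append(self._current)
            self._current = None


handler = EntryCollector()
xml.sax.parseString(SAMPLE, handler)
for entry in handler.entries:
    print(entry)
```

Pairing a start callback with an end callback per element, as in the method list above, keeps the state for each element type local to the two methods that handle it.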

class Bio.SeqIO.SeqXmlIO.SeqXmlIterator(stream_or_path, namespace=None)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for seqXML files.

Parses seqXML files and creates SeqRecords. Assumes valid seqXML; please validate the input beforehand. It is assumed that all information for one record can be found within a record element or above. Two types of methods are called when the start tag of an element is reached. To receive only the attributes of an element before its end tag is reached, implement _attr_TAGNAME. To get an element and its children as a DOM tree, implement _elem_TAGNAME. Anything that is part of such a DOM tree will not trigger further method calls.

BLOCK = 1024

__init__(stream_or_path, namespace=None)

Create the object and initialize the XML parser.

parse(handle)

Start parsing the file, and return a SeqRecord generator.

iterate(handle)

Iterate over the records in the XML file.
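The BLOCK attribute and the generator-returning parse method suggest that input is consumed in fixed-size chunks and records are yielded as they complete. The following standard-library sketch shows that general pattern under stated assumptions; it is not the iterator's actual code. IdCollector, iter_entry_ids, and the sample document are hypothetical, and the block size is deliberately tiny so that several chunks are needed.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

BLOCK = 16  # tiny on purpose; the class attribute above is 1024

# Hand-written sample, assumed to follow the seqXML element layout.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<seqXML seqXMLversion="0.4">
  <entry id="demo1"><DNAseq>ACGTTGCA</DNAseq></entry>
  <entry id="demo2"><AAseq>MKVLAA</AAseq></entry>
</seqXML>
"""


class IdCollector(ContentHandler):
    """Hypothetical handler that queues the id of each completed entry."""

    def __init__(self):
        super().__init__()
        self.completed = []
        self._id = None

    def startElement(self, name, attrs):
        if name == "entry":
            self._id = attrs.get("id")

    def endElement(self, name):
        if name == "entry":
            self.completed.append(self._id)


def iter_entry_ids(stream, block=BLOCK):
    """Yield entry ids while feeding the parser one chunk at a time."""
    handler = IdCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(block)
        if not chunk:
            break
        parser.feed(chunk)
        # Hand over whatever entries this chunk completed.
        while handler.completed:
            yield handler.completed.pop(0)
    parser.close()
    # Yield anything that was only completed when the parser was closed.
    yield from handler.completed


ids = list(iter_entry_ids(io.BytesIO(SAMPLE)))
print(ids)
```

Feeding the parser incrementally keeps memory use bounded by the chunk size plus the record in progress, which matters for large sequence files.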

__abstractmethods__ = frozenset({})

class Bio.SeqIO.SeqXmlIO.SeqXmlWriter(target, source=None, source_version=None, species=None, ncbiTaxId=None)

Bases: Bio.SeqIO.Interfaces.SequenceWriter

Writes SeqRecords into a seqXML file.

SeqXML requires the SeqRecord annotations to specify the molecule_type; the molecule type is required to contain the term “DNA”, “RNA”, or “protein”.
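The requirement above amounts to a substring test on the molecule_type annotation, so a value such as "genomic DNA" qualifies. The helper below is a hypothetical illustration of that rule, not the writer's code. It assumes the match is case-sensitive and that the corresponding seqXML sequence elements are named DNAseq, RNAseq, and AAseq.

```python
def seqxml_sequence_tag(molecule_type):
    """Return the assumed seqXML sequence element name for a molecule type.

    Hypothetical helper: mirrors the documented rule that the molecule type
    must contain "DNA", "RNA", or "protein".
    """
    if molecule_type is None:
        raise ValueError("molecule_type is not defined")
    for term, tag in (("DNA", "DNAseq"), ("RNA", "RNAseq"), ("protein", "AAseq")):
        if term in molecule_type:
            return tag
    raise ValueError(f"unknown molecule_type {molecule_type!r}")


for value in ("DNA", "genomic DNA", "mRNA", "protein"):
    print(value, "->", seqxml_sequence_tag(value))
```

In practice the annotation is set on each record before writing, for example record.annotations["molecule_type"] = "DNA".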

__init__(target, source=None, source_version=None, species=None, ncbiTaxId=None)

Create the object and start the XML generator.

Arguments:
  • target - Output stream opened in binary mode, or a path to a file.

  • source - The source program/database of the file, for example UniProt.

  • source_version - The version or release number of the source program or database from which the data originated.

  • species - The scientific name of the species of origin of all entries in the file.

  • ncbiTaxId - The NCBI taxonomy identifier of the species of origin.

write_header()

Write root node with document metadata.

write_record(record)

Write one record.

write_footer()

Close the root node and finish the XML document.
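Although the writer is normally reached through Bio.SeqIO.write, it can also be driven directly when document-level metadata is needed. The sketch below assumes the constructor signature shown above and calls the header, record, and footer methods in order. The record, the source_version value, and the taxonomy id are illustrative; the taxonomy id is passed as a string here as a conservative choice.

```python
from io import BytesIO

from Bio.Seq import Seq
from Bio.SeqIO.SeqXmlIO import SeqXmlWriter
from Bio.SeqRecord import SeqRecord

# Illustrative record; id and sequence are made up for this sketch.
record = SeqRecord(
    Seq("ACGTTGCA"),
    id="demo1",
    description="first demo entry",
    annotations={"molecule_type": "DNA"},
)

# The target must be a binary stream or a path.
buffer = BytesIO()
writer = SeqXmlWriter(
    buffer,
    source="UniProt",
    source_version="2024_01",  # illustrative value
    species="Homo sapiens",
    ncbiTaxId="9606",
)

writer.write_header()        # root node with the document metadata
writer.write_record(record)  # one entry element per record
writer.write_footer()        # closes the root node

xml_text = buffer.getvalue().decode("utf-8")
print(xml_text)
```

Metadata given to the constructor applies to the whole file, so it suits collections in which every entry shares one source and species.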