Bio.SeqIO.SeqXmlIO module

Bio.SeqIO support for the “seqxml” file format, SeqXML.

This module is for reading and writing SeqXML format files as SeqRecord objects, and is expected to be used via the Bio.SeqIO API.

SeqXML is a lightweight XML format intended as an alternative to FASTA files. For more information, see http://www.seqXML.org and Schmitt et al. (2011), https://doi.org/10.1093/bib/bbr025
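
As a rough illustration of using this module through the Bio.SeqIO API, the sketch below writes a small, made-up seqXML document to a temporary file and reads it back with the "seqxml" format name. The entry ids, descriptions and sequences are hypothetical, and the example assumes Biopython is installed.

```python
import os
import tempfile

from Bio import SeqIO

# Hypothetical two-entry seqXML document, invented for this example.
SEQXML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<seqXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="http://www.seqxml.org/0.4/seqxml.xsd"
        seqXMLversion="0.4">
  <entry id="demo1">
    <description>hypothetical DNA entry</description>
    <DNAseq>ACGTACGT</DNAseq>
  </entry>
  <entry id="demo2">
    <description>hypothetical protein entry</description>
    <AAseq>MKV</AAseq>
  </entry>
</seqXML>
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SEQXML_TEXT)
    # Passing a path lets Bio.SeqIO open the file in whichever mode it needs.
    records = list(SeqIO.parse(path, "seqxml"))

for record in records:
    print(record.id, record.description, record.seq)
```

Passing a file name rather than an open handle sidesteps the question of whether the parser expects text or binary mode, which may differ between Biopython releases.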

class Bio.SeqIO.SeqXmlIO.ContentHandler

Bases: xml.sax.handler.ContentHandler

Handles XML events generated by the parser (PRIVATE).

__init__(self)

Create a handler to handle XML events.

startDocument(self)

Set XML handlers when an XML declaration is found.

startSeqXMLElement(self, name, qname, attrs)

Handle start of a seqXML element.

endSeqXMLElement(self, name, qname)

Handle end of the seqXML element.

startEntryElement(self, name, qname, attrs)

Set new entry with id and the optional entry source (PRIVATE).

endEntryElement(self, name, qname)

Handle end of an entry element.

startEntryFieldElement(self, name, qname, attrs)

Receive a field of an entry element and forward it.

startSpeciesElement(self, attrs)

Parse the species information.

endSpeciesElement(self, name, qname)

Handle end of a species element.

startDescriptionElement(self, attrs)

Parse the description.

endDescriptionElement(self, name, qname)

Handle the end of a description element.

startSequenceElement(self, attrs)

Parse DNA, RNA, or protein sequence.

endSequenceElement(self, name, qname)

Handle the end of a sequence element.

startDBRefElement(self, attrs)

Parse a database cross reference.

endDBRefElement(self, name, qname)

Handle the end of a DBRef element.

startPropertyElement(self, attrs)

Handle the start of a property element.

endPropertyElement(self, name, qname)

Handle the end of a property element.

characters(self, data)

Handle character data.
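
The start/end/characters pattern listed above is the standard SAX callback style. The following is a much simplified, standard-library-only sketch of that pattern and is not Biopython's private handler: the class name EntryCollector, the helper collect_entries and the dictionary layout are all hypothetical, and only the entry id, description and sequence are collected.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

SEQUENCE_TAGS = {"DNAseq", "RNAseq", "AAseq"}


class EntryCollector(ContentHandler):
    """Hypothetical handler that gathers seqXML entries as plain dicts."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._entry = None  # dict for the entry currently being parsed
        self._text = None   # list of text chunks, or None when not collecting

    def startElementNS(self, name, qname, attrs):
        _uri, local = name
        if local == "entry":
            # In namespace mode, attribute keys are (uri, localname) pairs.
            self._entry = {"id": attrs.get((None, "id"))}
        elif local == "description" or local in SEQUENCE_TAGS:
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._text is not None:
            self._text.append(content)

    def endElementNS(self, name, qname):
        _uri, local = name
        if local == "description":
            self._entry["description"] = "".join(self._text)
            self._text = None
        elif local in SEQUENCE_TAGS:
            self._entry["sequence"] = "".join(self._text)
            self._text = None
        elif local == "entry":
            self.entries.append(self._entry)
            self._entry = None


def collect_entries(xml_text):
    """Parse xml_text with a namespace-aware SAX parser and return the entries."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = EntryCollector()
    parser.setContentHandler(handler)
    parser.feed(xml_text)
    parser.close()
    return handler.entries


demo_entries = collect_entries(
    '<seqXML seqXMLversion="0.4"><entry id="demo1">'
    "<description>hypothetical</description><DNAseq>ACGT</DNAseq>"
    "</entry></seqXML>"
)
print(demo_entries)
```

Enabling feature_namespaces is what makes the parser call the (name, qname, attrs) style callbacks, matching the signatures documented for this class.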

class Bio.SeqIO.SeqXmlIO.SeqXmlIterator(stream_or_path, namespace=None)

Bases: object

Breaks a seqXML file into SeqRecords.

Assumes valid seqXML; please validate beforehand. It is assumed that all information for one record can be found within a record element or above. Two types of methods are called when the start tag of an element is reached. To receive only the attributes of an element before its end tag is reached, implement _attr_TAGNAME. To get an element and its children as a DOM tree, implement _elem_TAGNAME. Everything that is part of the DOM tree will not trigger any further method calls.
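
The _attr_TAGNAME / _elem_TAGNAME dispatch described above can be sketched with the standard library's xml.dom.pulldom. This is only an assumption about how such a dispatch might be built, not the class's actual code; TagDispatcher, DemoDispatcher and the log attribute are hypothetical names.

```python
from xml.dom import pulldom


class TagDispatcher:
    """Hypothetical base class that dispatches on element start tags."""

    def __init__(self, xml_text):
        self._events = pulldom.parseString(xml_text)
        self.log = []

    def run(self):
        for event, node in self._events:
            if event != pulldom.START_ELEMENT:
                continue
            # _attr_TAGNAME sees the element before its children are parsed.
            attr_method = getattr(self, "_attr_" + node.tagName, None)
            if attr_method is not None:
                attr_method(node)
            # _elem_TAGNAME receives the element expanded into a DOM subtree.
            elem_method = getattr(self, "_elem_" + node.tagName, None)
            if elem_method is not None:
                # expandNode consumes the child events, so the children
                # trigger no further method calls.
                self._events.expandNode(node)
                elem_method(node)


class DemoDispatcher(TagDispatcher):
    def _attr_entry(self, node):
        self.log.append(("entry", node.getAttribute("id")))

    def _elem_DNAseq(self, node):
        node.normalize()  # merge adjacent text nodes
        self.log.append(("DNAseq", node.firstChild.data))


demo = DemoDispatcher(
    '<seqXML><entry id="demo1"><DNAseq>ACGT</DNAseq></entry></seqXML>'
)
demo.run()
print(demo.log)
```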

BLOCK = 1024

__init__(self, stream_or_path, namespace=None)

Create the object and initialize the XML parser.

__iter__(self)
__next__(self)

Iterate over the records in the XML file.
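
The text does not say what BLOCK is used for. One plausible reading, which is an assumption here, is that it is the number of bytes read from the stream per step while records are yielded as soon as they are complete. A minimal standard-library sketch of that chunked iteration follows; iter_entry_ids and the handler class are hypothetical, and only entry ids are yielded.

```python
import xml.sax
from io import BytesIO
from xml.sax.handler import ContentHandler

BLOCK = 1024  # assumed meaning: bytes read from the stream per step


class _EntryIdHandler(ContentHandler):
    """Hypothetical handler that queues the id of each finished entry."""

    def __init__(self):
        super().__init__()
        self.completed = []
        self._current_id = None

    def startElement(self, name, attrs):
        if name == "entry":
            self._current_id = attrs.get("id")

    def endElement(self, name):
        if name == "entry":
            self.completed.append(self._current_id)


def iter_entry_ids(stream, block=BLOCK):
    """Yield entry ids while feeding the parser one block at a time."""
    parser = xml.sax.make_parser()
    handler = _EntryIdHandler()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(block)
        if not chunk:
            break
        parser.feed(chunk)
        # Hand over every entry that was completed by this chunk.
        while handler.completed:
            yield handler.completed.pop(0)
    parser.close()
    while handler.completed:
        yield handler.completed.pop(0)


DATA = (
    b'<seqXML><entry id="a"><DNAseq>ACGT</DNAseq></entry>'
    b'<entry id="b"><AAseq>MKV</AAseq></entry></seqXML>'
)
# A deliberately tiny block size forces several feed() calls.
entry_ids = list(iter_entry_ids(BytesIO(DATA), block=16))
print(entry_ids)
```

Reading in fixed-size blocks keeps memory use bounded for large files, since the whole document never has to be loaded at once.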

class Bio.SeqIO.SeqXmlIO.SeqXmlWriter(handle, source=None, source_version=None, species=None, ncbiTaxId=None)

Bases: Bio.SeqIO.Interfaces.SequentialSequenceWriter

Writes SeqRecords into a seqXML file.

SeqXML requires the sequence alphabet to be explicitly RNA, DNA or protein, i.e. an instance or subclass of Bio.Alphabet.RNAAlphabet, Bio.Alphabet.DNAAlphabet or Bio.Alphabet.ProteinAlphabet.

__init__(self, handle, source=None, source_version=None, species=None, ncbiTaxId=None)

Create the object and start the XML generator.

write_header(self)

Write root node with document metadata.

write_record(self, record)

Write one record.

write_footer(self)

Close the root node and finish the XML document.
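
The header / record / footer sequence of a sequential writer can be sketched with the standard library's XMLGenerator. This is a simplified illustration rather than the SeqXmlWriter implementation: TinySeqXmlWriter is a hypothetical class, it takes plain strings instead of SeqRecord objects, and the caller picks the sequence tag directly instead of it being derived from an alphabet.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


class TinySeqXmlWriter:
    """Hypothetical writer showing the header, record, footer order."""

    def __init__(self, handle, source=None):
        self.source = source
        self.xml = XMLGenerator(handle, encoding="utf-8")
        self.xml.startDocument()  # emits the XML declaration

    def write_header(self):
        # Root node carrying the document metadata as attributes.
        attrs = {"seqXMLversion": "0.4"}
        if self.source is not None:
            attrs["source"] = self.source
        self.xml.startElement("seqXML", attrs)

    def write_record(self, record_id, sequence, tag="DNAseq"):
        # tag is expected to be one of DNAseq, RNAseq or AAseq.
        self.xml.startElement("entry", {"id": record_id})
        self.xml.startElement(tag, {})
        self.xml.characters(sequence)
        self.xml.endElement(tag)
        self.xml.endElement("entry")

    def write_footer(self):
        # Close the root node and finish the document.
        self.xml.endElement("seqXML")
        self.xml.endDocument()


out = StringIO()
writer = TinySeqXmlWriter(out, source="demo")
writer.write_header()
writer.write_record("demo1", "ACGT")
writer.write_record("demo2", "MKV", tag="AAseq")
writer.write_footer()
xml_output = out.getvalue()
print(xml_output)
```

With the real module, the usual entry point is Bio.SeqIO.write with the "seqxml" format name, which drives these steps for the caller.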