Bio.SeqIO.TwoBitIO module

Bio.SeqIO support for UCSC’s “twoBit” (.2bit) file format.

This parser reads the index stored in the twoBit file, as well as the masked regions and the N’s for each sequence. It also creates sequence data objects (_TwoBitSequenceData objects), which support only two methods: __len__ and __getitem__. The former returns the length of the sequence; the latter returns the requested region of the sequence as a bytes object.

Using the information in the index, the __getitem__ method calculates the file position at which the requested region starts, and only reads the requested sequence region. Note that the full sequence of a record is loaded only if specifically requested, making the parser memory-efficient.
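The offset arithmetic behind such a partial read can be sketched in plain Python. This is an illustration only, not the code used by _TwoBitSequenceData: the helper names pack and read_region are hypothetical, and the sketch ignores N blocks and masked regions. It relies only on the packing defined by the twoBit format (four bases per byte, T=00, C=01, A=10, G=11, first base in the most significant bits).

```python
import io

BASES = b"TCAG"  # twoBit codes: T=00, C=01, A=10, G=11


def pack(sequence):
    """Pack a bytes sequence of A/C/G/T into 2 bits per base (hypothetical helper)."""
    packed = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASES.index(base)
        byte <<= 2 * (4 - len(chunk))  # pad an incomplete final byte with zero bits
        packed.append(byte)
    return bytes(packed)


def read_region(stream, data_offset, start, end):
    """Return bases start:end, reading only the bytes that cover that region.

    data_offset is the (assumed known) file position of the packed DNA.
    """
    first_byte = start // 4        # byte holding the first requested base
    last_byte = (end + 3) // 4     # one past the byte holding the last base
    stream.seek(data_offset + first_byte)
    raw = stream.read(last_byte - first_byte)
    bases = bytearray()
    for byte in raw:
        for shift in (6, 4, 2, 0):  # most significant pair comes first
            bases.append(BASES[(byte >> shift) & 3])
    skip = start - 4 * first_byte   # unwanted bases at the front of the first byte
    return bytes(bases[skip:skip + (end - start)])


# Simulate a file: 16 placeholder bytes followed by the packed sequence.
sequence = b"ACGTACGTTGCA"
stream = io.BytesIO(b"\x00" * 16 + pack(sequence))
region = read_region(stream, 16, 3, 9)
```

Only the bytes spanning the requested coordinates are read, so the cost of a slice depends on its length rather than on the length of the whole sequence.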

The TwoBitIterator object implements the __getitem__, keys, and __len__ methods, which allow it to be used like a dictionary keyed by sequence name.
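To illustrate how a file index can back these three methods, here is a minimal stand-alone sketch. The class TwoBitIndex and the helper build_index are hypothetical and are not part of Biopython. Unlike TwoBitIterator, whose __getitem__ returns a SeqRecord, this stand-in returns only the file offset recorded for each name. The header layout (signature 0x1A412743, version, sequence count, reserved word, then one name/offset entry per sequence) follows the UCSC twoBit format description.

```python
import io
import struct

SIGNATURE = 0x1A412743  # magic number at the start of a twoBit file


class TwoBitIndex:
    """Hypothetical stand-in mapping sequence names to file offsets."""

    def __init__(self, stream):
        header = stream.read(16)
        # The signature tells us whether the file is little- or big-endian.
        for byteorder in ("<", ">"):
            signature, version, count, reserved = struct.unpack(
                byteorder + "4I", header
            )
            if signature == SIGNATURE:
                break
        else:
            raise ValueError("not a twoBit file")
        self._offsets = {}
        for _ in range(count):
            (name_size,) = struct.unpack("B", stream.read(1))
            name = stream.read(name_size).decode("ascii")
            (offset,) = struct.unpack(byteorder + "I", stream.read(4))
            self._offsets[name] = offset

    def __getitem__(self, name):
        return self._offsets[name]

    def keys(self):
        return list(self._offsets)

    def __len__(self):
        return len(self._offsets)


def build_index(entries, byteorder="<"):
    """Serialize a header and index for (name, offset) pairs (hypothetical helper)."""
    data = struct.pack(byteorder + "4I", SIGNATURE, 0, len(entries), 0)
    for name, offset in entries:
        encoded = name.encode("ascii")
        data += struct.pack("B", len(encoded)) + encoded
        data += struct.pack(byteorder + "I", offset)
    return data


# Made-up names and offsets, used purely for demonstration.
index = TwoBitIndex(io.BytesIO(build_index([("chr1", 34), ("chr2", 1000)])))
names = index.keys()
```

Because only the index is read up front, looking up a name is a dictionary access; the sequence data itself stays on disk until it is requested.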

class Bio.SeqIO.TwoBitIO.TwoBitIterator(source)

Bases: SequenceIterator

Parser for UCSC twoBit (.2bit) files.

__init__(source)

Read the file index.

parse(stream)

Iterate over the sequences in the file.

__getitem__(name)

Return the sequence associated with the given name as a SeqRecord object.

keys()

Return a list with the names of the sequences in the file.

__len__()

Return number of sequences.

__abstractmethods__ = frozenset({})
__parameters__ = ()