Bio.Align.bigbed module

Bio.Align support for alignment files in the bigBed format.

The bigBed format stores a series of pairwise alignments in a single indexed binary file. It is typically used for transcript-to-genome alignments. As in the BED format, the alignment positions and alignment scores are stored, but the aligned sequences are not.

See http://genome.ucsc.edu/goldenPath/help/bigBed.html for more information.

You are expected to use this module via the Bio.Align functions.
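A minimal sketch of that usage, assuming Biopython is installed and that path names an existing bigBed file. The helper name summarize_bigbed is hypothetical; the only library call it relies on is Bio.Align.parse with the format name "bigbed". The import sits inside the function so that defining the helper does not itself require Biopython.

```python
def summarize_bigbed(path):
    """Return one (chromosome, start, end) tuple per alignment in a bigBed file.

    `path` is assumed to point to an existing bigBed file.
    """
    # Imported lazily so this sketch can be defined without Biopython present.
    from Bio import Align

    summary = []
    # Align.parse selects Bio.Align.bigbed.AlignmentIterator for fmt="bigbed".
    for alignment in Align.parse(path, "bigbed"):
        target = alignment.sequences[0]
        # Row 0 of the coordinates array holds the positions on the target.
        start = int(alignment.coordinates[0, 0])
        end = int(alignment.coordinates[0, -1])
        summary.append((target.id, start, end))
    return summary
```

Only positions and identifiers are read here, since the format stores no aligned sequences.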

class Bio.Align.bigbed.Field(as_type, name, comment)

Bases: tuple

__getnewargs__(): Return self as a plain tuple. Used by copy and pickle.

__match_args__ = ('as_type', 'name', 'comment')

static __new__(_cls, as_type, name, comment): Create new instance of Field(as_type, name, comment)

__repr__(): Return a nicely formatted representation string

__slots__ = ()

as_type: Alias for field number 0

comment: Alias for field number 2

name: Alias for field number 1

class Bio.Align.bigbed.AutoSQLTable(name, comment, fields)

Bases: list

AutoSQL table describing the columns of a (possibly extended) BED format.

default: AutoSQLTable = [('string', 'chrom', 'Reference sequence chromosome or scaffold'), ('uint', 'chromStart', 'Start position in chromosome'), ('uint', 'chromEnd', 'End position in chromosome'), ('string', 'name', 'Name of item.'), ('uint', 'score', 'Score (0-1000)'), ('char[1]', 'strand', '+ or - for strand'), ('uint', 'thickStart', 'Start of where display should be thick (start codon)'), ('uint', 'thickEnd', 'End of where display should be thick (stop codon)'), ('uint', 'reserved', 'Used as itemRgb as of 2004-11-22'), ('int', 'blockCount', 'Number of blocks'), ('int[blockCount]', 'blockSizes', 'Comma separated list of block sizes'), ('int[blockCount]', 'chromStarts', 'Start positions relative to chromStart')]
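The last three default fields together describe the aligned blocks. As a rough illustration of how they combine, the sketch below turns the blockSizes and chromStarts values of one BED row into chromosome intervals. The helper name block_intervals is hypothetical, and accepting a trailing comma is an assumption based on how BED files are commonly written.

```python
def block_intervals(chrom_start, block_sizes, chrom_starts):
    """Convert BED block columns to (start, end) intervals on the chromosome.

    block_sizes and chrom_starts are the comma-separated strings found in
    the blockSizes and chromStarts columns; a trailing comma is tolerated.
    """
    sizes = [int(s) for s in block_sizes.rstrip(",").split(",")]
    offsets = [int(s) for s in chrom_starts.rstrip(",").split(",")]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    # chromStarts are offsets relative to chromStart, so add it back.
    return [
        (chrom_start + offset, chrom_start + offset + size)
        for size, offset in zip(sizes, offsets)
    ]


blocks = block_intervals(1000, "20,30,", "0,100,")
print(blocks)  # [(1000, 1020), (1100, 1130)]
```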

__init__(name, comment, fields): Create an AutoSQL table describing the columns of an (extended) BED format.

classmethod from_bytes(data): Return an AutoSQLTable initialized using the bytes object data.

classmethod from_string(data): Return an AutoSQLTable initialized using the string object data.

__str__(): Return str(self).

__bytes__()

__getitem__(i): x.__getitem__(y) <==> x[y]

__annotations__ = {'default': 'AutoSQLTable'}

class Bio.Align.bigbed.AlignmentWriter(target, bedN=12, declaration=None, targets=None, compress=True, itemsPerSlot=512, blockSize=256, extraIndex=())

Bases: AlignmentWriter

Alignment file writer for the bigBed file format.

fmt: str | None = 'bigBed'

mode = 'b'

__init__(target, bedN=12, declaration=None, targets=None, compress=True, itemsPerSlot=512, blockSize=256, extraIndex=())

Create an AlignmentWriter object.

Arguments:

target - output stream or file name.
bedN - number of columns in the BED file. This must be between 3 and 12; default value is 12.
declaration - an AutoSQLTable object declaring the fields in the BED file. Required only if the BED file contains extra (custom) fields. Default value is None.
targets - a list of SeqRecord objects with the chromosomes in the order in which they appear in the alignments. The sequence contents in each SeqRecord may be undefined, but the sequence length must be defined, as in this example:

SeqRecord(Seq(None, length=248956422), id="chr1")

If targets is None (the default value), the alignments must have an attribute .targets providing the list of SeqRecord objects.
compress - If True (default), compress data using zlib. If False, do not compress data. Use compress=False for faster searching.
blockSize - Number of items to bundle in r-tree. See UCSC's bedToBigBed program for more information. Default value is 256.
itemsPerSlot - Number of data points bundled at lowest level. See UCSC's bedToBigBed program for more information. Use itemsPerSlot=1 for faster searching. Default value is 512.
extraIndex - Sequence of strings with the names of extra columns to be indexed. Default value is an empty tuple.
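A hedged sketch of how these arguments might be passed through Bio.Align.write, assuming Biopython is installed and that write forwards extra keyword arguments to this AlignmentWriter. The helper names make_targets and write_bigbed and the fast_search flag are hypothetical; the imports are placed inside the functions so the definitions alone do not require Biopython.

```python
def make_targets(chrom_sizes):
    """Build SeqRecord objects with a defined length but undefined contents.

    chrom_sizes maps chromosome names to lengths, listed in the order in
    which the chromosomes appear in the alignments.
    """
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord

    return [
        SeqRecord(Seq(None, length=size), id=name)
        for name, size in chrom_sizes.items()
    ]


def write_bigbed(alignments, path, targets=None, fast_search=False):
    """Write alignments to a bigBed file at `path` (hypothetical helper)."""
    from Bio import Align

    options = {"bedN": 12, "targets": targets}
    if fast_search:
        # Settings the argument list above suggests for faster searching.
        options["compress"] = False
        options["itemsPerSlot"] = 1
    # Assumption: keyword arguments are handed to the bigBed AlignmentWriter.
    return Align.write(alignments, path, "bigbed", **options)
```

A call might look like write_bigbed(alignments, "out.bb", targets=make_targets({"chr1": 248956422})), where alignments is a list or iterator of Alignment objects ordered as the targets are.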

write_file(stream, alignments)

Write the alignments to the file stream, and return the number of alignments.

alignments - A list or iterator returning Alignment objects
stream - Output file stream.

write_alignments(alignments, output, reductions, extra_indices)

Write alignments to the output file, and return the number of alignments.

alignments - A list or iterator returning Alignment objects
output - Output file stream.

__abstractmethods__ = frozenset({})

__annotations__ = {'fmt': 'str | None'}

class Bio.Align.bigbed.AlignmentIterator(source)

Bases: AlignmentIterator

Alignment iterator for bigBed files.

The pairwise alignments stored in the bigBed file are loaded and returned incrementally. Additional alignment information is stored as attributes of each alignment.

fmt: str | None = 'bigBed'

mode = 'b'

__len__()

Return the number of alignments.

The number of alignments is cached. If not yet calculated, the iterator is rewound to the beginning, and the number of alignments is calculated by iterating over the alignments. The iterator is then returned to its original position in the file.

search(chromosome=None, start=None, end=None)

Iterate over alignments overlapping the specified chromosome region.

This method searches the index to find alignments to the specified chromosome that fully or partially overlap the chromosome region between start and end.

Arguments:

chromosome - chromosome name. If None (default value), include all alignments.
start - starting position on the chromosome. If None (default value), use 0 as the starting position.
end - end position on the chromosome. If None (default value), use the length of the chromosome as the end position.
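A minimal sketch of a region query, assuming Biopython is installed and path names an existing bigBed file. The helper names alignments_in_region and overlaps are hypothetical. The overlaps function only illustrates what "fully or partially overlap" means under BED-style 0-based, half-open coordinates; that boundary convention is an assumption and is not taken from the library's source.

```python
def overlaps(item_start, item_end, start, end):
    """Return True if [item_start, item_end) intersects [start, end).

    Illustrative only: assumes half-open, BED-style intervals.
    """
    return item_start < end and start < item_end


def alignments_in_region(path, chromosome, start=None, end=None):
    """Collect the alignments that search() yields for a chromosome region."""
    # Imported lazily so this sketch can be defined without Biopython present.
    from Bio import Align

    alignments = Align.parse(path, "bigbed")
    # search() iterates over matching alignments; list() gathers them.
    return list(alignments.search(chromosome, start, end))


print(overlaps(100, 200, 150, 300))  # True: partial overlap
print(overlaps(100, 200, 200, 300))  # False: intervals only touch
```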

__abstractmethods__ = frozenset({})

__annotations__ = {'fmt': 'str | None'}