Package Bio :: Module bgzf :: Class BgzfReader

Class BgzfReader

object --+
         |
        BgzfReader

BGZF reader; acts like a read-only handle, but seek/tell differ.

Let's use the BgzfBlocks function to have a peek at the BGZF blocks in an example BAM file:

>>> try:
...     from __builtin__ import open # Python 2
... except ImportError:
...     from builtins import open # Python 3
>>> handle = open("SamBam/ex1.bam", "rb")
>>> for values in BgzfBlocks(handle):
...     print("Raw start %i, raw length %i; data start %i, data length %i" % values)
Raw start 0, raw length 18239; data start 0, data length 65536
Raw start 18239, raw length 18223; data start 65536, data length 65536
Raw start 36462, raw length 18017; data start 131072, data length 65536
Raw start 54479, raw length 17342; data start 196608, data length 65536
Raw start 71821, raw length 17715; data start 262144, data length 65536
Raw start 89536, raw length 17728; data start 327680, data length 65536
Raw start 107264, raw length 17292; data start 393216, data length 63398
Raw start 124556, raw length 28; data start 456614, data length 0
>>> handle.close()
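
The raw and data lengths reported above are stored in each block itself. As a minimal sketch, assuming the block layout described in the BGZF section of the SAM/BAM specification (a gzip member whose extra field carries a "BC" subfield holding the total block size minus one, and whose last four bytes hold the uncompressed size), the two lengths can be recovered by hand. The helper name bgzf_block_lengths is hypothetical and is not part of Bio.bgzf. The bytes used here are the standard 28-byte empty block that marks the end of a BGZF file, which matches the raw length 28 and data length 0 reported for the final block above.

```python
import struct

# The standard empty BGZF block used as an end-of-file marker, spelled out
# field by field (all multi-byte integers are little-endian).
BGZF_EOF = bytes.fromhex(
    "1f8b0804"  # gzip magic, deflate method, FEXTRA flag set
    "00000000"  # modification time
    "00ff"      # extra flags, operating system = unknown
    "0600"      # XLEN: the extra field is 6 bytes long
    "4243"      # subfield identifier "BC"
    "0200"      # subfield payload length: 2 bytes
    "1b00"      # BSIZE: total block size minus one (27)
    "0300"      # an empty deflate stream
    "00000000"  # CRC32 of the (empty) uncompressed data
    "00000000"  # ISIZE: uncompressed length (0)
)


def bgzf_block_lengths(block):
    """Return (raw_length, data_length) for one BGZF block held in memory.

    Hypothetical helper for illustration; not the Bio.bgzf implementation.
    """
    if block[:4] != b"\x1f\x8b\x08\x04":
        raise ValueError("not a BGZF block (unexpected gzip header)")
    (xlen,) = struct.unpack("<H", block[10:12])
    pos, end = 12, 12 + xlen
    raw_length = None
    # Walk the gzip extra subfields looking for the "BC" entry.
    while pos + 4 <= end:
        subfield_id = block[pos:pos + 2]
        (slen,) = struct.unpack("<H", block[pos + 2:pos + 4])
        if subfield_id == b"BC" and slen == 2:
            (bsize,) = struct.unpack("<H", block[pos + 4:pos + 6])
            raw_length = bsize + 1
        pos += 4 + slen
    if raw_length is None:
        raise ValueError("missing BC subfield")
    # The final four bytes of the block store the uncompressed length.
    (data_length,) = struct.unpack("<I", block[raw_length - 4:raw_length])
    return raw_length, data_length


print(bgzf_block_lengths(BGZF_EOF))  # (28, 0)
```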

Now let's see how to use this block information to jump to specific parts of the decompressed BAM file:

>>> handle = BgzfReader("SamBam/ex1.bam", "rb")
>>> assert 0 == handle.tell()
>>> magic = handle.read(4)
>>> assert 4 == handle.tell()

So far nothing strange: we got the magic marker used at the start of a decompressed BAM file, and the handle position makes sense. Now, however, let's jump to the end of this block and 4 bytes into the next block by reading 65536 bytes:

>>> data = handle.read(65536)
>>> len(data)
65536
>>> assert 1195311108 == handle.tell()

Were you expecting 4 + 65536 = 65540? This is a BGZF 64-bit virtual offset, which means:

>>> split_virtual_offset(1195311108)
(18239, 4)

You should spot 18239 as the start of the second BGZF block, while the 4 is the offset into this block. See also make_virtual_offset:

>>> make_virtual_offset(18239, 4)
1195311108
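
The arithmetic behind these values can be sketched in a few lines. This assumes the usual BGZF convention that the upper 48 bits of a virtual offset hold the raw start of the block in the compressed file and the lower 16 bits hold the position within the decompressed block. The names pack_offset and unpack_offset are hypothetical stand-ins for illustration, not the Bio.bgzf functions themselves.

```python
def pack_offset(block_start, within_block):
    """Combine a block start and a within-block offset into one integer."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    # Shift the block start into the upper 48 bits, keep the low 16 for the offset.
    return (block_start << 16) | within_block


def unpack_offset(virtual_offset):
    """Split a virtual offset back into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


print(pack_offset(18239, 4))      # 1195311108
print(unpack_offset(1195311108))  # (18239, 4)
```

Because the block start occupies the high bits, virtual offsets still sort in file order even though they are not plain byte positions.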

Let's jump back to almost the start of the file:

>>> make_virtual_offset(0, 2)
2
>>> handle.seek(2)
2
>>> handle.close()

Note that you can use the max_cache argument to limit the number of BGZF blocks cached in memory. The default is 100, and since each block can hold up to 64 KiB of decompressed data, the default cache could take up to about 6.25 MiB of RAM. The cache is not important for reading through the file in one pass, but it is important for improving the performance of random access.
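
As a rough sizing aid, the worst-case memory used by cached blocks is the block count multiplied by the maximum decompressed block size. The sketch below only does that arithmetic; the helper names are hypothetical, and it ignores any per-object overhead of the Python interpreter.

```python
BLOCK_SIZE = 65536  # maximum decompressed size of one BGZF block (64 KiB)


def max_cache_bytes(max_cache=100):
    """Upper bound on decompressed data held for max_cache cached blocks."""
    return max_cache * BLOCK_SIZE


def blocks_for_budget(budget_bytes):
    """Largest max_cache whose worst case fits in budget_bytes."""
    return budget_bytes // BLOCK_SIZE


print(max_cache_bytes())                   # 6553600 bytes, i.e. 6.25 MiB
print(blocks_for_budget(1024 * 1024))      # 16 blocks fit in 1 MiB
```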

Instance Methods
__init__(self, filename=None, mode='r', fileobj=None, max_cache=100)
Initialize the class.
_load_block(self, start_offset=None)
tell(self)
Return a 64-bit unsigned BGZF virtual offset.
seek(self, virtual_offset)
Seek to a 64-bit unsigned BGZF virtual offset.
read(self, size=-1)
Read method for the BGZF module.
readline(self)
Read a single line for the BGZF file.
__next__(self)
Return the next line.
next(self)
Python 2 style alias for Python 3 style __next__ method.
__iter__(self)
Iterate over the lines in the BGZF file.
close(self)
Close BGZF file.
seekable(self)
Return True indicating the BGZF supports random access.
isatty(self)
Return True if connected to a TTY device.
fileno(self)
Return integer file descriptor.
__enter__(self)
Open a file operable with WITH statement.
__exit__(self, type, value, traceback)
Close a file with WITH statement.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, filename=None, mode='r', fileobj=None, max_cache=100)

Initialize the class.
Overrides: object.__init__