
Module _index

Dictionary-like indexing of sequence files (PRIVATE).

You are not expected to access this module, or any of its code, directly. This is all handled internally by the Bio.SeqIO.index(...) and index_db(...) functions, which are the public interface for this functionality.

The basic idea is that we scan over a sequence file, looking for new record markers. We then try to extract the string that Bio.SeqIO.parse/read would use as the record id, ideally without actually parsing the full record. We then use a subclassed Python dictionary to map each record id to the file offset of the record start.
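The scanning idea can be illustrated with a minimal, standard-library-only sketch. It is not the module's actual code: the class name OffsetIndex, its get_raw method, and the FASTA-style ">" record marker are assumptions chosen for illustration.

```python
import io


class OffsetIndex(dict):
    """Hypothetical dict subclass: record id -> byte offset of the record start."""

    def __init__(self, handle):
        super().__init__()
        self._handle = handle
        handle.seek(0)
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break  # end of file
            if line.startswith(b">"):
                # Take the first word after ">" as the id; the rest of the
                # record is deliberately left unparsed.
                words = line[1:].split(None, 1)
                record_id = words[0].decode() if words else ""
                if record_id in self:
                    raise ValueError("Duplicate key %r" % record_id)
                self[record_id] = offset

    def get_raw(self, key):
        """Return the raw bytes of one record, read from its stored offset."""
        self._handle.seek(self[key])
        lines = [self._handle.readline()]  # the header line
        for line in iter(self._handle.readline, b""):
            if line.startswith(b">"):
                break  # start of the next record
            lines.append(line)
        return b"".join(lines)


data = b">alpha first\nACGT\nGG\n>beta\nTTTT\n"
index = OffsetIndex(io.BytesIO(data))
print(dict(index))
print(index.get_raw("beta"))
```

Only the ids and offsets are held in memory here; the record bodies stay in the file until get_raw seeks back to them.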

Note that this means full parsing is on demand, so any invalid or problem record may not trigger an exception until it is accessed. This is by design.
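A small sketch of this on-demand behaviour, under stated assumptions: build_offsets and parse_at are hypothetical helpers, the input is FASTA-like, every header is assumed to carry an id, and the "validation" (only A, C, G, T, N allowed) is a stand-in for whatever a real parser would check.

```python
import io


def build_offsets(handle):
    """Scan for '>' markers only; no record content is validated here."""
    offsets = {}
    handle.seek(0)
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # Assumes a non-empty id follows the ">" marker.
            offsets[line[1:].split()[0].decode()] = pos
    return offsets


def parse_at(handle, offset):
    """Fully parse (and validate) the single record starting at offset."""
    handle.seek(offset)
    header = handle.readline()
    seq = b""
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        seq += line.strip()
    # Toy validation: reject empty sequences and unexpected letters.
    if not seq or set(seq) - set(b"ACGTN"):
        raise ValueError("Invalid sequence in record at offset %d" % offset)
    return header[1:].split()[0].decode(), seq.decode()


handle = io.BytesIO(b">good\nACGT\n>bad\nAC!!\n")
offsets = build_offsets(handle)            # succeeds despite the bad record
print(parse_at(handle, offsets["good"]))   # parsed only when requested
try:
    parse_at(handle, offsets["bad"])       # the problem surfaces only now
except ValueError as err:
    print("Deferred error:", err)
```

Indexing succeeds even though one record is malformed; the exception is raised only when that particular record is requested.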

This means our dictionary-like objects hold ALL the keys (all the record identifiers) in memory, which shouldn't be a problem even with second generation sequencing. If memory is an issue, the index_db(...) interface stores the keys and offsets in an SQLite database, which can be reused later to avoid re-indexing the file.
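To illustrate keeping keys and offsets in SQLite rather than in memory, here is a rough sketch using the standard sqlite3 module. The table name, column names, and helper functions are hypothetical and are not the schema that index_db(...) actually uses.

```python
import sqlite3


def save_offsets(con, offsets):
    """Store a {record id: file offset} mapping in a (hypothetical) table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS offset_index ("
        "record_id TEXT PRIMARY KEY, file_offset INTEGER NOT NULL)"
    )
    con.executemany(
        "INSERT INTO offset_index (record_id, file_offset) VALUES (?, ?)",
        offsets.items(),
    )
    con.commit()


def lookup_offset(con, record_id):
    """Fetch one offset without loading every key into memory."""
    row = con.execute(
        "SELECT file_offset FROM offset_index WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    return row[0]


# An in-memory database keeps the example self-contained; passing a file
# path to sqlite3.connect instead would let a later run reuse the index.
con = sqlite3.connect(":memory:")
save_offsets(con, {"alpha": 0, "beta": 21})
print(lookup_offset(con, "beta"))
```

The PRIMARY KEY constraint also rejects duplicate record ids at insert time, which a dictionary-style index needs in any case.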

Classes
  SeqFileRandomAccess
  SffRandomAccess
Random access to a Standard Flowgram Format (SFF) file.
  SffTrimedRandomAccess
  SequentialSeqFileRandomAccess
  GenBankRandomAccess
Indexed dictionary-like access to a GenBank file.
  EmblRandomAccess
Indexed dictionary-like access to an EMBL file.
  SwissRandomAccess
Random access to a SwissProt file.
  UniprotRandomAccess
Random access to a UniProt XML file.
  IntelliGeneticsRandomAccess
Random access to an IntelliGenetics file.
  TabRandomAccess
Random access to a simple tabbed file.
  FastqRandomAccess
Random access to a FASTQ file (any supported variant).
Variables
  _FormatToRandomAccess = {'ace': <class 'Bio.SeqIO._index.Seque...
  __package__ = 'Bio.SeqIO'
Variables Details

_FormatToRandomAccess

Value:
{'ace': <class 'Bio.SeqIO._index.SequentialSeqFileRandomAccess'>,
 'embl': <class 'Bio.SeqIO._index.EmblRandomAccess'>,
 'fasta': <class 'Bio.SeqIO._index.SequentialSeqFileRandomAccess'>,
 'fastq': <class 'Bio.SeqIO._index.FastqRandomAccess'>,
 'fastq-illumina': <class 'Bio.SeqIO._index.FastqRandomAccess'>,
 'fastq-sanger': <class 'Bio.SeqIO._index.FastqRandomAccess'>,
 'fastq-solexa': <class 'Bio.SeqIO._index.FastqRandomAccess'>,
 'gb': <class 'Bio.SeqIO._index.GenBankRandomAccess'>,
...