
Class _IndexedSeqFileDict

UserDict.DictMixin --+
                     |
                    _IndexedSeqFileDict

Read-only dictionary interface to a sequential record file.

This code is used in Bio.SeqIO for indexing SeqRecord objects, and in Bio.SearchIO for indexing QueryResult objects.

Keeps the keys and associated file offsets in memory, and reads the file to access entries, parsing them into objects on demand. This approach is limited by the memory needed to hold the keys and offsets, but will work even with millions of records.

Note that duplicate keys are not allowed. If a duplicate key is found, a ValueError exception is raised.
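
The offset-based approach and the duplicate key rule described above can be illustrated with a minimal sketch. This is not the class's actual implementation: it assumes a simple FASTA layout, and the helper names build_offset_index and fetch_record are hypothetical.

```python
import io


def build_offset_index(handle):
    """Map each FASTA record id to the byte offset of its '>' line."""
    offsets = {}
    while True:
        start = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # Assumption: the first word after '>' is the record id.
            words = line[1:].split(None, 1)
            key = words[0].decode() if words else ""
            if key in offsets:
                raise ValueError("Duplicate key %r" % key)
            offsets[key] = start
    return offsets


def fetch_record(handle, offsets, key):
    """Seek to the stored offset and parse a single record on demand."""
    handle.seek(offsets[key])
    title = handle.readline()[1:].strip().decode()
    seq_lines = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the start of the next record
        seq_lines.append(line.strip().decode())
    return title, "".join(seq_lines)


data = io.BytesIO(b">alpha first\nACGT\nAC\n>beta\nGGCC\n")
index = build_offset_index(data)
record = fetch_record(data, index, "beta")
```

Only the small keys-to-offsets dictionary stays in memory; each record is read and parsed when it is requested.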

As used in Bio.SeqIO, by default the SeqRecord's id string is used as the dictionary key. In Bio.SearchIO, the query's id string is used. This can be changed by supplying an optional key_function, a callback function which will be given the record id and must return the desired key. For example, this allows you to parse NCBI style FASTA identifiers and extract the GI number to use as the dictionary key.
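
As a sketch of such a callback, the function below extracts the GI number from a pipe-delimited NCBI style identifier. The name get_gi_number and the sample identifier are illustrative assumptions, not part of this class.

```python
def get_gi_number(record_id):
    """Return the GI number from an NCBI style identifier (hypothetical helper)."""
    parts = record_id.split("|")
    if len(parts) < 2 or parts[0] != "gi":
        raise ValueError("Not a GI style identifier: %r" % record_id)
    return parts[1]


# Illustrative identifier of the form gi|<number>|<db>|<accession>|<locus>
key = get_gi_number("gi|6273291|gb|AF191665.1|AF191665")
```

A function like this would be passed as the key_function argument when the index is created, so that records are looked up by GI number rather than by the full identifier.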

Note that this dictionary is essentially read only. You cannot add or change values, pop values, nor clear the dictionary.
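
The read-only behaviour can be sketched as follows. The class name ReadOnlyIndex is hypothetical, and raising NotImplementedError from the mutating methods is an assumption made for this illustration.

```python
class ReadOnlyIndex:
    """Hypothetical read-only mapping over a pre-built dictionary."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def _read_only(self, *args, **kwargs):
        # Assumption: mutation attempts fail with NotImplementedError.
        raise NotImplementedError("This index is read only.")

    # Every mutating method shares the same refusing implementation.
    __setitem__ = update = pop = popitem = clear = _read_only


index = ReadOnlyIndex({"alpha": "ACGT"})
try:
    index["beta"] = "GGCC"
    outcome = "stored"
except NotImplementedError:
    outcome = "rejected"
```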

Instance Methods
 
__init__(self, random_access_proxy, key_function, repr, obj_repr)
Initialize the class.
 
__repr__(self)
Return a string representation of the File object.
 
__str__(self)
Create a string representation of the File object.
 
__contains__(self, key)
Return True if the key is in the offsets dictionary.
 
__len__(self)
Return the number of records.
 
items(self)
Iterate over the (key, SeqRecord) items.
 
values(self)
Iterate over the SeqRecord items.
 
keys(self)
Iterate over the keys.
 
itervalues(self)
Iterate over the SeqRecord items.
 
iteritems(self)
Iterate over the (key, SeqRecord) items.
 
iterkeys(self)
Iterate over the keys.
 
__iter__(self)
Iterate over the keys.
 
__getitem__(self, key)
Return record for the specified key.
 
get(self, k, d=None)
Return the value in the dictionary.
 
get_raw(self, key)
Return the raw record from the file as a bytes string.
 
__setitem__(self, key, value)
Would allow setting or replacing records, but not implemented.
 
update(self, *args, **kwargs)
Would allow adding more values, but not implemented.
 
pop(self, key, default=None)
Would remove specified record, but not implemented.
 
popitem(self)
Would remove and return a SeqRecord, but not implemented.
 
clear(self)
Would clear dictionary, but not implemented.
 
fromkeys(self, keys, value=None)
Would return a new dictionary with keys and values, but not implemented.
 
copy(self)
Would copy a dictionary, but not implemented.
 
close(self)
Close the file handle being used to read the data.

Inherited from UserDict.DictMixin: __cmp__, has_key, setdefault

Method Details

__repr__(self)
(Representation operator)

Return a string representation of the File object.
Overrides: UserDict.DictMixin.__repr__

__contains__(self, key)
(In operator)

Return True if the key is in the offsets dictionary.
Overrides: UserDict.DictMixin.__contains__

__len__(self)
(Length operator)

Return the number of records.
Overrides: UserDict.DictMixin.__len__

items(self)

Iterate over the (key, SeqRecord) items.

This tries to act like a Python 3 dictionary, and does not return a list of (key, value) pairs due to memory concerns.

Overrides: UserDict.DictMixin.items

values(self)

Iterate over the SeqRecord items.

This tries to act like a Python 3 dictionary, and does not return a list of values due to memory concerns.

Overrides: UserDict.DictMixin.values

keys(self)

Iterate over the keys.

This tries to act like a Python 3 dictionary, and does not return a list of keys due to memory concerns.
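
The iterator-style behaviour of items, values and keys can be sketched with generators. The class LazyIndex and the loader callback are hypothetical stand-ins used only to show that records are produced one at a time rather than collected into a list.

```python
class LazyIndex:
    """Hypothetical index whose views load records one at a time."""

    def __init__(self, offsets, loader):
        self._offsets = offsets  # key -> file offset
        self._loader = loader    # callable turning an offset into a record

    def __getitem__(self, key):
        return self._loader(self._offsets[key])

    def keys(self):
        return iter(self._offsets)

    def values(self):
        for key in self._offsets:
            yield self[key]

    def items(self):
        for key in self._offsets:
            yield key, self[key]


loaded = []


def fake_loader(offset):
    """Record which offsets were actually read."""
    loaded.append(offset)
    return "record@%d" % offset


index = LazyIndex({"alpha": 0, "beta": 21}, fake_loader)
items = index.items()
first = next(items)  # only the first record is loaded at this point
```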

itervalues(self)

Iterate over the SeqRecord items.
Overrides: UserDict.DictMixin.itervalues

iteritems(self)

Iterate over the (key, SeqRecord) items.
Overrides: UserDict.DictMixin.iteritems

iterkeys(self)

Iterate over the keys.
Overrides: UserDict.DictMixin.iterkeys

__iter__(self)

Iterate over the keys.
Overrides: UserDict.DictMixin.__iter__

get(self, k, d=None)

Return the value in the dictionary.

If the key (k) is not found, this returns None unless a default (d) is specified.

Overrides: UserDict.DictMixin.get
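
A minimal sketch of this fallback behaviour, assuming get is built on top of __getitem__ (the class name Lookup is hypothetical):

```python
class Lookup:
    """Hypothetical mapping showing get() with an optional default."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]

    def get(self, k, d=None):
        try:
            return self[k]
        except KeyError:
            return d


lookup = Lookup({"alpha": "ACGT"})
found = lookup.get("alpha")
missing = lookup.get("gamma")
fallback = lookup.get("gamma", "N/A")
```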

get_raw(self, key)

Return the raw record from the file as a bytes string.

If the key is not found, a KeyError exception is raised.
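
The idea of returning unparsed bytes can be sketched as below. Storing an (offset, length) pair per key is an assumption made for this illustration; the real class may locate the end of a record differently.

```python
import io


def get_raw(handle, spans, key):
    """Return the unparsed bytes of one record (hypothetical helper)."""
    offset, length = spans[key]  # raises KeyError for an unknown key
    handle.seek(offset)
    return handle.read(length)


data = io.BytesIO(b">alpha first\nACGT\nAC\n>beta\nGGCC\n")
# Assumed layout: key -> (byte offset, record length in bytes)
spans = {"alpha": (0, 21), "beta": (21, 11)}
raw = get_raw(data, spans, "beta")
```

Because the bytes are returned exactly as stored, they can be written to another file to extract a subset of records without reformatting them.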

__setitem__(self, key, value)
(Index assignment operator)

Would allow setting or replacing records, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

update(self, *args, **kwargs)

Would allow adding more values, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

Overrides: UserDict.DictMixin.update

pop(self, key, default=None)

Would remove specified record, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

Overrides: UserDict.DictMixin.pop

popitem(self)

Would remove and return a SeqRecord, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

Overrides: UserDict.DictMixin.popitem

clear(self)

Would clear dictionary, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

Overrides: UserDict.DictMixin.clear

fromkeys(self, keys, value=None)

Would return a new dictionary with keys and values, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

copy(self)

Would copy a dictionary, but not implemented.

Python dictionaries provide this method for modifying data in the dictionary. This class mimics the dictionary interface but is read only.

close(self)

Close the file handle being used to read the data.

Once called, further use of the index won't work. The sole purpose of this method is to allow explicit handle closure. For example, if you wish to delete the file, on Windows you must first close all open handles to that file.
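
The close-before-delete pattern can be sketched with a plain file handle standing in for the index's internal handle. The temporary file and its contents are illustrative only; with a real index you would call its close method at the same point.

```python
import os
import tempfile

# Create a small temporary file to stand in for an indexed sequence file.
fd, path = tempfile.mkstemp(suffix=".fasta")
with os.fdopen(fd, "wb") as out:
    out.write(b">alpha\nACGT\n")

handle = open(path, "rb")  # stands in for the handle held by the index
first_line = handle.readline()

handle.close()             # release the handle before deleting the file
os.remove(path)
still_exists = os.path.exists(path)
```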