Bio.GenBank package

Module contents

Code to work with GenBank formatted files.

Rather than using Bio.GenBank, you are now encouraged to use Bio.SeqIO with the “genbank” or “embl” format names to parse GenBank or EMBL files into SeqRecord and SeqFeature objects (see the Biopython tutorial for details).

Using Bio.GenBank directly to parse GenBank files is only useful if you want to obtain GenBank-specific Record objects, which are a much closer representation of the raw file contents than the SeqRecord objects produced by the FeatureParser (used in Bio.SeqIO).
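As a rough illustration of the difference, the following sketch parses the same small GenBank entry with both APIs. The entry is a made-up example held in a string, so no file is needed. Its TEST0001 identifiers are hypothetical and do not refer to a real accession. The attribute names used (SeqRecord.id, Record.locus, Record.accession, Record.sequence) assume a recent Biopython release.

```python
from io import StringIO

from Bio import GenBank, SeqIO

# A minimal, hypothetical GenBank entry (TEST0001 is not a real accession).
GENBANK_TEXT = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical test record.
ACCESSION   TEST0001
VERSION     TEST0001.1
ORIGIN
        1 atgaaatttg gg
//
"""

# Recommended route: a generic SeqRecord via Bio.SeqIO.
seq_record = SeqIO.read(StringIO(GENBANK_TEXT), "genbank")
print(seq_record.id, seq_record.name, seq_record.seq)

# GenBank-specific route: a Record whose attributes mirror the file's fields.
gb_record = GenBank.read(StringIO(GENBANK_TEXT))
print(gb_record.locus, gb_record.accession, gb_record.sequence)
```

The SeqRecord takes its id from the VERSION line. The Record, by contrast, keeps the LOCUS name, the accession list and the sequence string as separate attributes.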

To use the Bio.GenBank parser, there are two helper functions:

  • read - Parse a handle containing a single GenBank record as a Bio.GenBank-specific Record object.

  • parse - Iterate over a handle containing multiple GenBank records as Bio.GenBank-specific Record objects.
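The sketch below shows the two helpers side by side and assumes a recent Biopython release. Use parse when a handle may contain several entries. Use read when it should contain exactly one: read raises ValueError if it finds more or fewer. Both entries are hypothetical.

```python
from io import StringIO

from Bio import GenBank

# Two minimal, hypothetical entries in a single handle.
TWO_RECORDS = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  First hypothetical record.
ACCESSION   TEST0001
VERSION     TEST0001.1
ORIGIN
        1 atgaaatttg gg
//
LOCUS       TEST0002                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Second hypothetical record.
ACCESSION   TEST0002
VERSION     TEST0002.1
ORIGIN
        1 cccgggaaat tt
//
"""

# parse() yields one Record per entry in the handle.
accessions = [rec.accession for rec in GenBank.parse(StringIO(TWO_RECORDS))]
print(accessions)

# read() expects exactly one entry; given two, it raises ValueError.
try:
    GenBank.read(StringIO(TWO_RECORDS))
    read_failed = False
except ValueError as err:
    read_failed = True
    print("read() refused:", err)
```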

The following internal classes are not intended for direct use and may be deprecated in a future release.

Classes:
  • Iterator - Iterate through a file of GenBank entries.

  • ErrorFeatureParser - Catch errors caused during parsing.

  • FeatureParser - Parse GenBank data into SeqRecord and SeqFeature objects.

  • RecordParser - Parse GenBank data into a Record object.

Exceptions:
  • ParserFailureError - Exception indicating a failure in the parser (i.e. the scanner or consumer).

  • LocationParserError - Exception indicating a problem with the spark-based location parser.

class Bio.GenBank.Iterator(handle, parser=None)

Bases: object

Iterator interface to move over a file of GenBank entries one at a time (OBSOLETE).

This class is likely to be deprecated in a future release of Biopython. Please use Bio.SeqIO.parse(…, format="gb") or Bio.GenBank.parse(…) instead, to obtain SeqRecord or GenBank-specific Record objects respectively.

__init__(self, handle, parser=None)

Initialize the iterator.

Arguments:
  • handle - A handle with GenBank entries to iterate through.

  • parser - An optional parser to pass the entries through before returning them. If None, then the raw entry will be returned.
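The sketch below shows both modes on two hypothetical entries held in a string. It is only meant to help when reading existing code that still uses this obsolete class. New code should call Bio.GenBank.parse instead.

```python
from io import StringIO

from Bio.GenBank import Iterator, RecordParser

# Two minimal, hypothetical entries in a single handle.
TWO_RECORDS = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  First hypothetical record.
ACCESSION   TEST0001
VERSION     TEST0001.1
ORIGIN
        1 atgaaatttg gg
//
LOCUS       TEST0002                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Second hypothetical record.
ACCESSION   TEST0002
VERSION     TEST0002.1
ORIGIN
        1 cccgggaaat tt
//
"""

# parser=None: each item is the raw text of one entry, up to and including "//".
raw_entries = list(Iterator(StringIO(TWO_RECORDS)))
print(len(raw_entries), raw_entries[0].splitlines()[0])

# With a parser, each entry is converted (here into a Record) before being returned.
records = list(Iterator(StringIO(TWO_RECORDS), parser=RecordParser()))
print([rec.locus for rec in records])
```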

__next__(self)

Return the next GenBank record from the handle.

Returns None when there are no more records.

__iter__(self)

Iterate over the records.

exception Bio.GenBank.ParserFailureError

Bases: Exception

Failure caused by some kind of problem in the parser.

exception Bio.GenBank.LocationParserError

Bases: Exception

Could not properly parse a location from a GenBank file.

class Bio.GenBank.FeatureParser(debug_level=0, use_fuzziness=1, feature_cleaner=<Bio.GenBank.utils.FeatureValueCleaner object>)

Bases: object

Parse GenBank files into Seq + Feature objects (OBSOLETE).

Direct use of this class is discouraged, and may be deprecated in a future release of Biopython.

Please use Bio.SeqIO.parse(…) or Bio.SeqIO.read(…) instead.

__init__(self, debug_level=0, use_fuzziness=1, feature_cleaner=<Bio.GenBank.utils.FeatureValueCleaner object>)

Initialize a GenBank parser and Feature consumer.

Arguments:
  • debug_level - An optional argument that specifies how much debugging information the parser should output. By default no debugging information is produced (the fastest option), but you can set this as high as 2 to see exactly where a parse fails.

  • use_fuzziness - Specify whether or not to use fuzzy representations. The default is 1 (use fuzziness).

  • feature_cleaner - A cleaner object used to clean up feature values. It must implement a clean_value method. Bio.GenBank.utils provides a “standard” cleaner class, FeatureValueCleaner, an instance of which is used by default.

parse(self, handle)

Parse the specified handle.
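The sketch below is for existing code that still calls FeatureParser directly. It parses a hypothetical entry and assumes a recent Biopython release. The result is a SeqRecord comparable to the one returned by the recommended Bio.SeqIO.read.

```python
from io import StringIO

from Bio import SeqIO
from Bio.GenBank import FeatureParser

# A minimal, hypothetical GenBank entry.
GENBANK_TEXT = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical test record.
ACCESSION   TEST0001
VERSION     TEST0001.1
ORIGIN
        1 atgaaatttg gg
//
"""

# Direct (discouraged) use: parse() returns a single SeqRecord.
direct = FeatureParser().parse(StringIO(GENBANK_TEXT))

# Recommended equivalent through Bio.SeqIO.
via_seqio = SeqIO.read(StringIO(GENBANK_TEXT), "genbank")

print(direct.id, via_seqio.id)
```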

class Bio.GenBank.RecordParser(debug_level=0)

Bases: object

Parse GenBank files into Record objects (OBSOLETE).

Direct use of this class is discouraged, and may be deprecated in a future release of Biopython.

Please use the Bio.GenBank.parse(…) or Bio.GenBank.read(…) functions instead.

__init__(self, debug_level=0)

Initialize the parser.

Arguments:
  • debug_level - An optional argument that specifies how much debugging information the parser should output. By default no debugging information is produced (the fastest option), but you can set this as high as 2 to see exactly where a parse fails.

parse(self, handle)

Parse the specified handle into a GenBank record.
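A minimal sketch, again on a hypothetical entry and assuming a recent Biopython release. It shows that calling RecordParser directly yields the same kind of Record as the recommended Bio.GenBank.read function.

```python
from io import StringIO

from Bio import GenBank
from Bio.GenBank import RecordParser

# A minimal, hypothetical GenBank entry.
GENBANK_TEXT = """\
LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical test record.
ACCESSION   TEST0001
VERSION     TEST0001.1
ORIGIN
        1 atgaaatttg gg
//
"""

# Direct (discouraged) use of the parser class.
direct = RecordParser().parse(StringIO(GENBANK_TEXT))

# Recommended equivalent.
via_read = GenBank.read(StringIO(GENBANK_TEXT))

print(direct.locus, direct.accession, direct.sequence)
```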

Bio.GenBank.parse(handle)

Iterate over GenBank formatted entries as Record objects.

>>> from Bio import GenBank
>>> with open("GenBank/NC_000932.gb") as handle:
...     for record in GenBank.parse(handle):
...         print(record.accession)
['NC_000932']

To get SeqRecord objects, use Bio.SeqIO.parse(…, format="gb") instead.

Bio.GenBank.read(handle)

Read a handle containing a single GenBank entry as a Record object.

>>> from Bio import GenBank
>>> with open("GenBank/NC_000932.gb") as handle:
...     record = GenBank.read(handle)
...     print(record.accession)
['NC_000932']

To get a SeqRecord object, use Bio.SeqIO.read(…, format="gb") instead.