Bio.GenBank.Record module

Hold GenBank data in a straightforward format.

Classes:
  • Record - All of the information in a GenBank record.

  • Reference - Hold reference data for a record.

  • Feature - Hold the information in a Feature Table.

  • Qualifier - Qualifiers on a Feature.

17-MAR-2009: added support for WGS and WGS_SCAFLD lines. Ying Huang & Iddo Friedberg

class Bio.GenBank.Record.Record

Bases: object

Hold GenBank information in a format similar to the original record.

The Record class is meant to make data easy to get to when you are just interested in looking at GenBank data.

Attributes:
  • locus - The name specified after the LOCUS keyword in the GenBank record. This may be the accession number, or a clone id or something else.

  • size - The size of the record.

  • residue_type - The type of residues making up the sequence in this record. Normally something like RNA, DNA or PROTEIN, but may be as esoteric as ‘ss-RNA circular’.

  • data_file_division - The division this record is stored under in GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria…).

  • date - The date of submission of the record, in a form like ‘28-JUL-1998’

  • accession - list of all accession numbers for the sequence.

  • nid - Nucleotide identifier number.

  • pid - Protein identifier number.

  • version - The accession number plus version (e.g. AB01234.2).

  • db_source - Information about the database the record came from

  • gi - The NCBI gi identifier for the record.

  • keywords - A list of keywords related to the record.

  • segment - If the record is one of a series, this is info about which segment this record is (something like ‘1 of 6’).

  • source - The source of material where the sequence came from.

  • organism - The genus and species of the organism (e.g. ‘Homo sapiens’).

  • taxonomy - A listing of the taxonomic classification of the organism, starting general and getting more specific.

  • references - A list of Reference objects.

  • comment - Text with any kind of comment about the record.

  • features - A listing of Features making up the feature table.

  • base_counts - A string with the counts of bases for the sequence.

  • origin - A string specifying info about the origin of the sequence.

  • sequence - A string with the sequence itself.

  • contig - A string of location information for a CONTIG in a RefSeq file

  • project - The genome sequencing project numbers (will be replaced by the dblink cross-references in 2009).

  • dblinks - The genome sequencing project number(s) and other links (will replace the project information in 2009).

GB_LINE_LENGTH = 79
GB_BASE_INDENT = 12
GB_FEATURE_INDENT = 21
GB_INTERNAL_INDENT = 2
GB_OTHER_INTERNAL_INDENT = 3
GB_FEATURE_INTERNAL_INDENT = 5
GB_SEQUENCE_INDENT = 9
BASE_FORMAT = '%-12s'
INTERNAL_FORMAT = '  %-10s'
OTHER_INTERNAL_FORMAT = '   %-9s'
BASE_FEATURE_FORMAT = '%-21s'
INTERNAL_FEATURE_FORMAT = '     %-16s'
SEQUENCE_FORMAT = '%9s'
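
The format templates above are ordinary printf-style strings, and their widths line up with the indent constants. The sketch below rebuilds three of them in plain Python to show the resulting column layout. It assumes each internal template is the indent written as spaces followed by a left-justified field that fills the rest of the 12-column keyword area. The lowercase names are illustrative and are not part of the module.

```python
# Indent widths as documented for the Record class.
GB_BASE_INDENT = 12
GB_INTERNAL_INDENT = 2
GB_SEQUENCE_INDENT = 9

# Illustrative reconstructions (assumption: indent spaces + padded field).
base_format = "%-" + str(GB_BASE_INDENT) + "s"
internal_format = (
    " " * GB_INTERNAL_INDENT
    + "%-" + str(GB_BASE_INDENT - GB_INTERNAL_INDENT) + "s"
)
sequence_format = "%" + str(GB_SEQUENCE_INDENT) + "s"

# Top-level keyword: left-aligned and padded to 12 columns.
locus_cell = base_format % "LOCUS"
# Sub-keyword: 2 leading spaces, then padded so the total is still 12 columns.
authors_cell = internal_format % "AUTHORS"
# Sequence offset: right-aligned in a 9-column field.
offset_cell = sequence_format % 61

for cell in (locus_cell, authors_cell, offset_cell):
    print(repr(cell), len(cell))
```

Because every keyword cell has the same width, the text that follows always starts in the same column, which is what lets wrapped continuation lines be indented by a fixed number of spaces.
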
__init__(self)

Initialize.

__str__(self)

Provide a GenBank formatted output option for a Record.

The objective of this is to provide an easy way to read in a GenBank record, modify it somehow, and then output it in ‘GenBank format.’ We are striving to make this work so that a parsed Record that is output using this function will look exactly like the original record.

Much of the output is based on format description info at:

ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
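
As a minimal sketch of the modify-then-output workflow described above, the following builds a Record by hand and formats it with str(). It assumes Biopython is installed and that the attributes can be assigned directly as plain strings and lists. All values are placeholders chosen for illustration, not data from a real GenBank entry.

```python
from Bio.GenBank.Record import Record

# Populate a record by hand; every value here is a made-up placeholder.
record = Record()
record.locus = "AB01234"
record.size = "12"
record.residue_type = "DNA"
record.data_file_division = "PLN"
record.date = "28-JUL-1998"
record.accession = ["AB01234"]
record.organism = "Dictyostelium discoideum"
record.sequence = "atgcatgcatgc"

# str() delegates to Record.__str__, which lays the fields out as GenBank text.
record_text = str(record)
print(record_text)
```

In practice a Record would more often come from a parser than be built by hand; the hand-built version is used here only to keep the example self-contained.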

class Bio.GenBank.Record.Reference

Bases: object

Hold information from a GenBank reference.

Attributes:
  • number - The number of the reference in the listing of references.

  • bases - The bases in the sequence the reference refers to.

  • authors - String with all of the authors.

  • consrtm - Consortium the authors belong to.

  • title - The title of the reference.

  • journal - Information about the journal where the reference appeared.

  • medline_id - The medline id for the reference.

  • pubmed_id - The pubmed_id for the reference.

  • remark - Free-form remarks about the reference.

__init__(self)

Initialize.

__str__(self)

Convert the reference to a GenBank format string.
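
The sketch below fills in a Reference and converts it to text. It assumes Biopython is installed and that the attributes listed above are plain strings that can be set after construction. The author names, title, and base range are hypothetical.

```python
from Bio.GenBank.Record import Reference

# All field values are hypothetical placeholders.
ref = Reference()
ref.number = "1"
ref.bases = "(bases 1 to 12)"
ref.authors = "Doe,J. and Roe,R."
ref.title = "A hypothetical title"
ref.journal = "Unpublished"

# str() delegates to Reference.__str__ and yields the REFERENCE block.
ref_text = str(ref)
print(ref_text)
```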

class Bio.GenBank.Record.Feature(key='', location='')

Bases: object

Hold information about a Feature in the Feature Table of a GenBank record.

Attributes:
  • key - The key name of the feature (e.g. source).

  • location - The string specifying the location of the feature.

  • qualifiers - A list of Qualifier objects in the feature.

__init__(self, key='', location='')

Initialize.

__repr__(self)

Representation of the object for debugging or logging.

__str__(self)

Return feature as a GenBank format string.

class Bio.GenBank.Record.Qualifier(key='', value='')

Bases: object

Hold information about a qualifier in a GenBank feature.

Attributes:
  • key - The key name of the qualifier (e.g. /organism=).

  • value - The value of the qualifier (e.g. “Dictyostelium discoideum”).

__init__(self, key='', value='')

Initialize.

__repr__(self)

Representation of the object for debugging or logging.

__str__(self)

Return feature qualifier as a GenBank format string.
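
The Feature and Qualifier classes described above can be combined as in the following sketch. It assumes Biopython is installed and that a Feature exposes its qualifier list as `qualifiers`. The location string is a placeholder, and the qualifier value includes its own double quotes because the key and value appear to be written out as given.

```python
from Bio.GenBank.Record import Feature, Qualifier

# A "source" feature spanning a placeholder location.
feature = Feature(key="source", location="1..12")

# The key carries the leading slash and trailing "=", as in the attribute list.
organism = Qualifier(key="/organism=", value='"Dictyostelium discoideum"')
feature.qualifiers.append(organism)

# repr() is meant for debugging; str() gives the feature table text.
print(repr(feature))
feature_text = str(feature)
print(feature_text)
```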