Bio.GenBank.Record module
Hold GenBank data in a straightforward format.
- Classes:
Record - All of the information in a GenBank record.
Reference - Hold reference data for a record.
Feature - Hold the information in a Feature Table.
Qualifier - Qualifiers on a Feature.
17-MAR-2009: added support for WGS and WGS_SCAFLD lines. Ying Huang & Iddo Friedberg
-
class Bio.GenBank.Record.Record
Bases: object
Hold GenBank information in a format similar to the original record.
The Record class is meant to make data easy to get to when you are just interested in looking at GenBank data.
- Attributes:
locus - The name specified after the LOCUS keyword in the GenBank record. This may be the accession number, or a clone id or something else.
size - The size of the record.
residue_type - The type of residues making up the sequence in this record. Normally something like RNA, DNA or PROTEIN, but may be as esoteric as ‘ss-RNA circular’.
data_file_division - The division this record is stored under in GenBank (e.g. PLN -> plants; PRI -> humans, primates; BCT -> bacteria…).
date - The date of submission of the record, in a form like ‘28-JUL-1998’.
accession - A list of all accession numbers for the sequence.
nid - Nucleotide identifier number.
pid - Protein identifier number.
version - The accession number plus version (e.g. AB01234.2).
db_source - Information about the database the record came from.
gi - The NCBI gi identifier for the record.
keywords - A list of keywords related to the record.
segment - If the record is one of a series, this is info about which segment this record is (something like ‘1 of 6’).
source - The source of material where the sequence came from.
organism - The genus and species of the organism (e.g. ‘Homo sapiens’).
taxonomy - A listing of the taxonomic classification of the organism, starting general and getting more specific.
references - A list of Reference objects.
comment - Text with any kind of comment about the record.
features - A listing of Features making up the feature table.
base_counts - A string with the counts of bases for the sequence.
origin - A string specifying info about the origin of the sequence.
sequence - A string with the sequence itself.
contig - A string of location information for a CONTIG in a RefSeq file.
project - The genome sequencing project numbers (will be replaced by the dblink cross-references in 2009).
dblinks - The genome sequencing project number(s) and other links (will replace the project information in 2009).
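As a rough illustration of the attributes listed above, the sketch below fills in a few fields on an empty Record. All values are made up for the example. The fallback class is a hypothetical stand-in, not part of Biopython, and exists only so the snippet still runs where Biopython is not installed.

```python
try:
    from Bio.GenBank.Record import Record
except ImportError:
    # Hypothetical stand-in (an assumption for illustration only): it mirrors
    # a handful of the documented attributes and nothing else.
    class Record:
        def __init__(self):
            self.locus = ""
            self.residue_type = ""
            self.data_file_division = ""
            self.date = ""
            self.accession = []
            self.organism = ""
            self.taxonomy = []

record = Record()

# Plain string fields (illustrative values, not from a real entry).
record.locus = "AB01234"
record.residue_type = "DNA"
record.data_file_division = "PLN"
record.date = "28-JUL-1998"
record.organism = "Arabidopsis thaliana"

# List-valued fields: accession numbers, and the taxonomy ordered from
# general to specific.
record.accession.append("AB01234")
record.taxonomy = ["Eukaryota", "Viridiplantae", "Streptophyta"]

print(record.locus, record.accession, record.taxonomy[-1])
```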
-
GB_LINE_LENGTH = 79
-
GB_BASE_INDENT = 12
-
GB_FEATURE_INDENT = 21
-
GB_INTERNAL_INDENT = 2
-
GB_OTHER_INTERNAL_INDENT = 3
-
GB_FEATURE_INTERNAL_INDENT = 5
-
GB_SEQUENCE_INDENT = 9
-
BASE_FORMAT = '%-12s'
-
INTERNAL_FORMAT = '  %-10s'
-
OTHER_INTERNAL_FORMAT = '   %-9s'
-
BASE_FEATURE_FORMAT = '%-21s'
-
INTERNAL_FEATURE_FORMAT = '     %-16s'
-
SEQUENCE_FORMAT = '%9s'
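To make the layout constants more concrete, here is a minimal standard-library sketch of what two of the format strings produce and how a long field could be wrapped to the line length. The helper `wrap_field` is hypothetical and is not Biopython's own wrapping code. The sketch assumes the constants are used the way their names suggest: a left-justified keyword column of GB_BASE_INDENT characters and lines no longer than GB_LINE_LENGTH.

```python
import textwrap

# Values copied from the class attributes documented for Record.
GB_LINE_LENGTH = 79
GB_BASE_INDENT = 12
BASE_FORMAT = "%-12s"
SEQUENCE_FORMAT = "%9s"

# '%-12s' left-justifies a keyword and pads it to 12 characters.
keyword_column = BASE_FORMAT % "LOCUS"

# '%9s' right-justifies a value (here a base position) in 9 characters.
position_column = SEQUENCE_FORMAT % 61


def wrap_field(keyword, text):
    """Hypothetical helper: keyword column first, then wrapped text.

    Continuation lines are indented by GB_BASE_INDENT spaces so that the
    text lines up under the first line's value.
    """
    body_width = GB_LINE_LENGTH - GB_BASE_INDENT
    chunks = textwrap.wrap(text, width=body_width) or [""]
    lines = [BASE_FORMAT % keyword + chunks[0]]
    lines += [" " * GB_BASE_INDENT + chunk for chunk in chunks[1:]]
    return "\n".join(lines)


wrapped = wrap_field(
    "DEFINITION",
    "Hypothetical example gene used only to demonstrate wrapping of a long "
    "DEFINITION value across several lines of output.",
)
print(repr(keyword_column))
print(repr(position_column))
print(wrapped)
```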
-
__init__(self)
Initialize.
-
__str__(self)
Provide a GenBank formatted output option for a Record.
The objective of this is to provide an easy way to read in a GenBank record, modify it somehow, and then output it in ‘GenBank format.’ We are striving to make this work so that a parsed Record that is output using this function will look exactly like the original record.
Much of the output is based on format description info at:
-
class Bio.GenBank.Record.Reference
Bases: object
Hold information from a GenBank reference.
- Attributes:
number - The number of the reference in the listing of references.
bases - The bases in the sequence the reference refers to.
authors - String with all of the authors.
consrtm - Consortium the authors belong to.
title - The title of the reference.
journal - Information about the journal where the reference appeared.
medline_id - The MEDLINE ID for the reference.
pubmed_id - The PubMed ID for the reference.
remark - Free-form remarks about the reference.
-
__init__(self)
Initialize.
-
__str__(self)
Convert the reference to a GenBank format string.
-
class Bio.GenBank.Record.Feature(key='', location='')
Bases: object
Hold information about a Feature in the Feature Table of a GenBank record.
- Attributes:
key - The key name of the feature (e.g. source).
location - The string specifying the location of the feature.
qualifiers - A list of Qualifier objects in the feature.
-
__init__(self, key='', location='')
Initialize.
-
__repr__(self)
Representation of the object for debugging or logging.
-
__str__(self)
Return feature as a GenBank format string.
-
class Bio.GenBank.Record.Qualifier(key='', value='')
Bases: object
Hold information about a qualifier in a GenBank feature.
- Attributes:
key - The key name of the qualifier (e.g. /organism=).
value - The value of the qualifier (e.g. “Dictyostelium discoideum”).
-
__init__(self, key='', value='')
Initialize.
-
__repr__(self)
Representation of the object for debugging or logging.
-
__str__(self)
Return feature qualifier as a GenBank format string.
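The following sketch shows one way Feature and Qualifier objects might be combined, using only the constructor signatures and attributes documented above. The location string and the second qualifier are made-up illustrative values. The fallback classes are hypothetical stand-ins, not part of Biopython, included only so the snippet runs without Biopython installed.

```python
try:
    from Bio.GenBank.Record import Feature, Qualifier
except ImportError:
    # Hypothetical stand-ins (assumptions for illustration only) with the
    # same constructor signatures and attributes as documented.
    class Qualifier:
        def __init__(self, key="", value=""):
            self.key = key
            self.value = value

    class Feature:
        def __init__(self, key="", location=""):
            self.key = key
            self.location = location
            self.qualifiers = []

# A feature is identified by its key and a location string.
feature = Feature(key="source", location="1..1200")

# Qualifier keys include the leading slash and trailing equals sign;
# values carry their own quotes.
feature.qualifiers.append(
    Qualifier(key="/organism=", value='"Dictyostelium discoideum"')
)
feature.qualifiers.append(Qualifier(key="/mol_type=", value='"genomic DNA"'))

# Joining key and value gives the text of each qualifier.
qualifier_text = [q.key + q.value for q in feature.qualifiers]
print(feature.key, feature.location)
print(qualifier_text)
```

Keeping the slash, equals sign, and quotes inside the stored strings follows the attribute descriptions above, so key and value can be concatenated directly.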