Module InsdcIO

Bio.SeqIO support for the "genbank" and "embl" file formats.

You are expected to use this module via the Bio.SeqIO functions.
Note that internally this module calls Bio.GenBank to do the actual
parsing of GenBank, EMBL and IMGT files.

See also:

International Nucleotide Sequence Database Collaboration
http://www.insdc.org/

GenBank
http://www.ncbi.nlm.nih.gov/Genbank/

EMBL Nucleotide Sequence Database
http://www.ebi.ac.uk/embl/

DDBJ (DNA Data Bank of Japan)
http://www.ddbj.nig.ac.jp/

IMGT (uses a variant of EMBL format with longer feature indents)
http://imgt.cines.fr/download/LIGM-DB/userman_doc.html
http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html
http://www.ebi.ac.uk/imgt/hla/docs/manual.html

Classes
  _InsdcWriter
Base class for GenBank and EMBL writers (PRIVATE).
  GenBankWriter
  EmblWriter
  ImgtWriter
Functions

GenBankIterator(handle)
Breaks up a GenBank file into SeqRecord objects.

EmblIterator(handle)
Breaks up an EMBL file into SeqRecord objects.

ImgtIterator(handle)
Breaks up an IMGT file into SeqRecord objects.

GenBankCdsFeatureIterator(handle, alphabet=ProteinAlphabet())
Breaks up a GenBank file into SeqRecord objects for each CDS feature.

EmblCdsFeatureIterator(handle, alphabet=ProteinAlphabet())
Breaks up an EMBL file into SeqRecord objects for each CDS feature.

_insdc_feature_position_string(pos, offset=0)
Build a GenBank/EMBL position string (PRIVATE).

_insdc_location_string_ignoring_strand_and_subfeatures(location, rec_length)

_insdc_location_string(location, rec_length)
Build a GenBank/EMBL location from a (Compound) FeatureLocation (PRIVATE).

_insdc_feature_location_string(feature, rec_length)
Build a GenBank/EMBL location string from a SeqFeature (PRIVATE, OBSOLETE).
Variables
  __package__ = 'Bio.SeqIO'
Function Details

GenBankIterator(handle)

Breaks up a GenBank file into SeqRecord objects.

Every section from the LOCUS line to the terminating // becomes
a single SeqRecord with associated annotation and features.

Note that for genomes or chromosomes, there is typically only
one record.
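
The record boundaries described above can be illustrated with a minimal pure-Python sketch. It is not the Bio.GenBank parser: the helper name split_genbank_records and the two tiny records are hypothetical, and the sketch only slices the text into LOCUS-to-// chunks without building SeqRecord objects.

```python
from io import StringIO


def split_genbank_records(handle):
    """Yield the text of each record, from its LOCUS line to the closing //.

    Hypothetical helper for illustration only; no annotation is parsed.
    """
    lines = []
    for line in handle:
        if not lines and not line.startswith("LOCUS"):
            continue  # ignore anything before the next LOCUS line
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []


# Two made-up, heavily abbreviated records.
example = StringIO(
    "LOCUS       TEST1   12 bp    DNA\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
    "LOCUS       TEST2   4 bp    DNA\n"
    "ORIGIN\n"
    "        1 acgt\n"
    "//\n"
)
records = list(split_genbank_records(example))
print(len(records))
```

In normal use the same iteration is reached through the Bio.SeqIO functions, as noted at the top of this page, rather than through a helper like this one.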

EmblIterator(handle)

Breaks up an EMBL file into SeqRecord objects.

Every section from the ID line to the terminating // becomes
a single SeqRecord with associated annotation and features.

Note that for genomes or chromosomes, there is typically only
one record.

ImgtIterator(handle)

Breaks up an IMGT file into SeqRecord objects.

Every section from the ID line to the terminating // becomes
a single SeqRecord with associated annotation and features.

Note that for genomes or chromosomes, there is typically only
one record.

GenBankCdsFeatureIterator(handle, alphabet=ProteinAlphabet())

Breaks up a GenBank file into SeqRecord objects for each CDS feature.

Every section from the LOCUS line to the terminating // can contain
many CDS features.  Each is returned as a SeqRecord holding the stated
amino acid translation sequence (if given).
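
As a rough illustration of where those translations come from, the sketch below pulls the /translation qualifier out of each CDS feature in a made-up feature table. The function name cds_translations and the sample lines are hypothetical, the sketch assumes the usual layout (feature keys starting in column 6, qualifiers indented further), and it simply skips CDS features without a translation, which may differ from what the library does.

```python
def cds_translations(feature_lines):
    """Yield the /translation value of each CDS feature (hypothetical helper)."""
    in_cds = False
    current = None  # partial /translation value being collected, if any
    for line in feature_lines:
        if line[:5] == "     " and line[5:6].strip():
            # A new feature key starts in column 6.
            in_cds = line.split()[0] == "CDS"
            current = None
            continue
        text = line.strip()
        if not in_cds:
            continue
        if current is None and text.startswith('/translation="'):
            current = text[len('/translation="'):]
        elif current is not None:
            current += text  # continuation line of a long translation
        else:
            continue  # some other qualifier of the CDS
        if current.endswith('"'):
            yield current[:-1]
            current = None


# Made-up feature table lines for illustration.
sample = [
    "     source          1..48",
    '                     /organism="Example"',
    "     CDS             1..48",
    '                     /product="demo"',
    '                     /translation="MKTAYIAKQR',
    '                     QISFVK"',
    "     gene            1..48",
]
translations = list(cds_translations(sample))
print(translations)
```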

EmblCdsFeatureIterator(handle, alphabet=ProteinAlphabet())

Breaks up an EMBL file into SeqRecord objects for each CDS feature.

Every section from the ID line to the terminating // can contain
many CDS features.  Each is returned as a SeqRecord holding the stated
amino acid translation sequence (if given).

_insdc_feature_position_string(pos, offset=0)

Build a GenBank/EMBL position string (PRIVATE).

Use offset=1 to add one when converting a start position from Python's
zero-based counting to the one-based counting used in GenBank/EMBL files.
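
The role of the offset can be shown with a small sketch. The helpers position_string and span_string are hypothetical stand-ins that handle plain integer positions only; they are not the private function documented here, which presumably also deals with position objects. A Python-style span start:end is zero-based and half-open, so only the start needs one added to give the one-based, inclusive span written in the file.

```python
def position_string(pos, offset=0):
    """Return pos + offset as text (hypothetical helper, integer positions only)."""
    return str(pos + offset)


def span_string(start, end):
    """Format a Python-style half-open span as a one-based inclusive span."""
    # Only the start is shifted: Python's exclusive end already equals
    # the one-based inclusive end.
    return position_string(start, offset=1) + ".." + position_string(end)


print(span_string(0, 10))    # 1..10
print(span_string(19, 100))  # 20..100
```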

_insdc_location_string(location, rec_length)

Build a GenBank/EMBL location from a (Compound) FeatureLocation (PRIVATE).

There is a choice of how to show joins on the reverse complement strand.
GenBank uses "complement(join(1..10,20..100))" while EMBL used to use
"join(complement(20..100),complement(1..10))" instead (but appears to have
now adopted the GenBank convention). Notice that the order of the entries
is reversed! This function therefore uses the first form. In this situation
we expect the CompoundFeatureLocation and its parts to all be marked as
strand == -1, and to be in the order 19:100 then 0:10.
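
The two conventions can be compared with a short sketch, written in full location syntax with ".." between the endpoints. It assumes the parts arrive as plain (start, end) tuples in Python counting, all on the reverse strand and in the order 19:100 then 0:10; the function names are hypothetical and this is not the module's own code.

```python
def genbank_style(parts):
    """complement(join(...)) form: one complement around parts in ascending order."""
    spans = [f"{start + 1}..{end}" for start, end in reversed(parts)]
    if len(spans) == 1:
        return f"complement({spans[0]})"
    return "complement(join(" + ",".join(spans) + "))"


def old_embl_style(parts):
    """join(complement(...),...) form: each part complemented, order kept as given."""
    pieces = [f"complement({start + 1}..{end})" for start, end in parts]
    return "join(" + ",".join(pieces) + ")"


# Reverse-strand parts in the expected order: 19:100 then 0:10.
parts = [(19, 100), (0, 10)]
print(genbank_style(parts))   # complement(join(1..10,20..100))
print(old_embl_style(parts))  # join(complement(20..100),complement(1..10))
```

The reversal inside genbank_style is what accounts for the entries appearing in opposite orders in the two forms.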

_insdc_feature_location_string(feature, rec_length)

Build a GenBank/EMBL location string from a SeqFeature (PRIVATE, OBSOLETE).

There is a choice of how to show joins on the reverse complement strand.
GenBank uses "complement(join(1..10,20..100))" while EMBL used to use
"join(complement(20..100),complement(1..10))" instead (but appears to have
now adopted the GenBank convention). Notice that the order of the entries
is reversed! This function therefore uses the first form. In this situation
we expect the parent feature and the two children to all be marked as
strand == -1, and in the order 0:10 then 19:100.

Dual-strand examples also need to be considered, such as this one from the
Arabidopsis thaliana chloroplast NC_000932:
join(complement(69611..69724),139856..140650)
for gene ArthCp047 (GeneID:844801), or its CDS (protein NP_051038.1,
GI:7525057), which is further complicated by a splice:
join(complement(69611..69724),139856..140087,140625..140650)

For this mixed strand feature, the parent SeqFeature should have
no strand (either 0 or None) while the child features should have either
strand +1 or -1 as appropriate, and be listed in the order given here.
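
A minimal sketch of the mixed-strand case, assuming each child is given as a plain (start, end, strand) tuple in Python counting and listed in the order it should appear. The function name mixed_strand_join is hypothetical, and the coordinates are the NC_000932 CDS example above converted to zero-based starts.

```python
def mixed_strand_join(parts):
    """Build a join(...) string, wrapping only reverse-strand parts in complement()."""
    pieces = []
    for start, end, strand in parts:
        span = f"{start + 1}..{end}"
        pieces.append(f"complement({span})" if strand == -1 else span)
    return "join(" + ",".join(pieces) + ")"


# Child features of the spliced CDS, in the order listed in the record.
cds_parts = [
    (69610, 69724, -1),
    (139855, 140087, +1),
    (140624, 140650, +1),
]
print(mixed_strand_join(cds_parts))
```

Because the strands differ, no single complement() can wrap the whole join, which is why the parent carries no strand of its own.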