Bio.SeqIO.InsdcIO module
Bio.SeqIO support for the “genbank” and “embl” file formats.
You are expected to use this module via the Bio.SeqIO functions. Note that internally this module calls Bio.GenBank to do the actual parsing of GenBank, EMBL and IMGT files.
See Also: International Nucleotide Sequence Database Collaboration http://www.insdc.org/
GenBank http://www.ncbi.nlm.nih.gov/Genbank/
EMBL Nucleotide Sequence Database http://www.ebi.ac.uk/embl/
DDBJ (DNA Data Bank of Japan) http://www.ddbj.nig.ac.jp/
IMGT (uses a variant of the EMBL format with longer feature indents) http://imgt.cines.fr/download/LIGM-DB/userman_doc.html http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html http://www.ebi.ac.uk/imgt/hla/docs/manual.html
Bio.SeqIO.InsdcIO.GenBankIterator(handle)
Break up a GenBank file into SeqRecord objects.
Every section from the LOCUS line to the terminating // becomes a single SeqRecord with associated annotation and features.
Note that for genomes or chromosomes, there is typically only one record.
This gets called internally by Bio.SeqIO for the GenBank file format:
>>> from Bio import SeqIO
>>> for record in SeqIO.parse("GenBank/cor6_6.gb", "gb"):
...     print(record.id)
...
X55053.1
X62281.1
M81224.1
AJ237582.1
L31939.1
AF297471.1
Equivalently,
>>> from Bio.SeqIO.InsdcIO import GenBankIterator
>>> with open("GenBank/cor6_6.gb") as handle:
...     for record in GenBankIterator(handle):
...         print(record.id)
...
X55053.1
X62281.1
M81224.1
AJ237582.1
L31939.1
AF297471.1
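The record boundaries described above can be illustrated with a minimal, standard-library-only sketch. It is not how Bio.GenBank actually parses files: the helpers `split_genbank_records` and `record_id` are hypothetical, the sample text is synthetic, and the sketch assumes the printed ID is the accession.version taken from the VERSION line.

```python
from typing import Iterator, List


def split_genbank_records(text: str) -> Iterator[List[str]]:
    """Yield each record's lines, from its LOCUS line through the closing '//'."""
    record: List[str] = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            record = [line]  # a LOCUS line always opens a new record
        elif record:
            record.append(line)
            if line.rstrip() == "//":  # the terminator closes the record
                yield record
                record = []


def record_id(lines: List[str]) -> str:
    """Return the accession.version from the VERSION line (hypothetical helper)."""
    for line in lines:
        if line.startswith("VERSION"):
            return line.split()[1]
    raise ValueError("record has no VERSION line")


# Synthetic two-record input, for illustration only
SAMPLE = """\
LOCUS       TEST1                     10 bp    DNA
VERSION     X55053.1
ORIGIN
        1 acgtacgtac
//
LOCUS       TEST2                     10 bp    DNA
VERSION     X62281.1
ORIGIN
        1 ggggcccctt
//
"""

for rec in split_genbank_records(SAMPLE):
    print(record_id(rec))
```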
Bio.SeqIO.InsdcIO.EmblIterator(handle)
Break up an EMBL file into SeqRecord objects.
Every section from the ID line to the terminating // becomes a single SeqRecord with associated annotation and features.
Note that for genomes or chromosomes, there is typically only one record.
This gets called internally by Bio.SeqIO for the EMBL file format:
>>> from Bio import SeqIO
>>> for record in SeqIO.parse("EMBL/epo_prt_selection.embl", "embl"):
...     print(record.id)
...
A00022.1
A00028.1
A00031.1
A00034.1
A00060.1
A00071.1
A00072.1
A00078.1
CQ797900.1
Equivalently,
>>> from Bio.SeqIO.InsdcIO import EmblIterator
>>> with open("EMBL/epo_prt_selection.embl") as handle:
...     for record in EmblIterator(handle):
...         print(record.id)
...
A00022.1
A00028.1
A00031.1
A00034.1
A00060.1
A00071.1
A00072.1
A00078.1
CQ797900.1
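EMBL records are built from lines that begin with a two-letter line code (ID, AC, SQ and so on), with the content starting at column 6. The sketch below groups one record's lines by that code and derives an accession.version identifier like those printed above. It assumes the newer ID line layout (`accession; SV version; ...`); the helper names are hypothetical and the record is synthetic.

```python
from collections import defaultdict
from typing import Dict, List


def group_embl_lines(record_text: str) -> Dict[str, List[str]]:
    """Group one EMBL record's lines by their two-letter line code."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in record_text.splitlines():
        if line.startswith("//"):  # record terminator
            break
        # Columns 1-2 hold the line code; the content starts at column 6.
        # Sequence lines have a blank code, so they are grouped under "  ".
        groups[line[:2]].append(line[5:])
    return dict(groups)


def embl_record_id(groups: Dict[str, List[str]]) -> str:
    """Build accession.version from an ID line such as 'A00022; SV 1; ...'."""
    fields = [field.strip() for field in groups["ID"][0].split(";")]
    accession, version = fields[0], fields[1].split()[1]  # "SV 1" -> "1"
    return f"{accession}.{version}"


# Synthetic single-record input, for illustration only
SAMPLE = """\
ID   A00022; SV 1; linear; DNA; STD; SYN; 12 BP.
XX
AC   A00022;
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     acgtacgtac gt                                                      12
//
"""

groups = group_embl_lines(SAMPLE)
print(embl_record_id(groups))
```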
Bio.SeqIO.InsdcIO.ImgtIterator(handle)
Break up an IMGT file into SeqRecord objects.
Every section from the ID line to the terminating // becomes a single SeqRecord with associated annotation and features.
Note that for genomes or chromosomes, there is typically only one record.
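The longer feature indents of the IMGT variant show up as a location column further to the right than the one used by standard EMBL, where the location starts at index 21 (compare `QUALIFIER_INDENT` on the EMBL writer). The hypothetical helper below measures that column on an FT feature key line. The 25-column "wide" line is invented for illustration and is not taken from the IMGT documentation.

```python
import re

# "FT", the spaces before the key, the feature key, then padding up to the location
_KEY_LINE = re.compile(r"FT +\S+ +")

EMBL_LOCATION_COLUMN = 21  # same value as EmblWriter.QUALIFIER_INDENT


def location_column(ft_key_line: str) -> int:
    """Return the 0-based column at which the location starts on an FT key line."""
    match = _KEY_LINE.match(ft_key_line)
    if match is None:
        raise ValueError(f"not an FT feature key line: {ft_key_line!r}")
    return match.end()


def uses_wider_indent(ft_key_line: str) -> bool:
    """True if the location sits further right than in standard EMBL."""
    return location_column(ft_key_line) > EMBL_LOCATION_COLUMN


embl_line = "FT   " + "CDS".ljust(16) + "1..300"
wide_line = "FT   " + "V_region".ljust(20) + "1..300"  # hypothetical wider layout
print(location_column(embl_line), location_column(wide_line))  # 21 25
```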
Bio.SeqIO.InsdcIO.GenBankCdsFeatureIterator(handle, alphabet=ProteinAlphabet())
Break up a GenBank file into SeqRecord objects for each CDS feature.
Every section from the LOCUS line to the terminating // can contain many CDS features. These are returned as SeqRecord objects holding the stated amino acid translation sequence (if given).
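To make "the stated amino acid translation" concrete, here is a minimal sketch that pulls the /translation qualifier out of the CDS features of a GenBank feature table. It only handles the simple fixed-column layout (feature keys at column 6, qualifiers at column 22), the helper names are hypothetical, and it does not reproduce Biopython's handling of CDS features without a translation; it just skips them.

```python
from typing import Iterator, List, Optional, Tuple

Feature = Tuple[str, List[str]]  # (feature key, qualifier strings)


def parse_features(lines: List[str]) -> List[Feature]:
    """Collect feature keys and their qualifiers from a GenBank feature table."""
    features: List[Feature] = []
    for line in lines:
        if line.startswith("ORIGIN"):  # the sequence follows; stop here
            break
        if not line.startswith("     ") or len(line) <= 5:
            continue  # e.g. the FEATURES header line
        if line[5] != " ":  # a feature key starts in column 6
            features.append((line[5:21].strip(), []))
        elif features:
            text = line[21:]  # qualifiers start in column 22
            qualifiers = features[-1][1]
            if text.startswith("/"):
                qualifiers.append(text)
            elif qualifiers:
                qualifiers[-1] += " " + text  # continuation of the previous qualifier


    return features


def qualifier_value(qualifiers: List[str], name: str) -> Optional[str]:
    """Return the unquoted value of /name=..., or None if absent."""
    prefix = f"/{name}="
    for qualifier in qualifiers:
        if qualifier.startswith(prefix):
            return qualifier[len(prefix):].strip('"')
    return None


def cds_translations(lines: List[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (protein_id, translation) for each CDS that states a /translation."""
    for key, qualifiers in parse_features(lines):
        if key != "CDS":
            continue
        translation = qualifier_value(qualifiers, "translation")
        if translation is not None:  # this sketch skips CDS features without one
            # Multi-line translations are joined with spaces above; drop them
            yield qualifier_value(qualifiers, "protein_id"), translation.replace(" ", "")


def key_line(key: str, location: str) -> str:
    return " " * 5 + key.ljust(16) + location


def qualifier_line(text: str) -> str:
    return " " * 21 + text


# Synthetic feature table, for illustration only
SAMPLE = [
    "FEATURES             Location/Qualifiers",
    key_line("source", "1..24"),
    qualifier_line('/organism="synthetic construct"'),
    key_line("CDS", "1..12"),
    qualifier_line('/protein_id="TEST0001.1"'),
    qualifier_line('/translation="MK'),
    qualifier_line('LV"'),
    key_line("CDS", "13..24"),
    qualifier_line('/note="no translation given"'),
    "ORIGIN",
    "        1 atgaaactgg tgatgaaact ggtg",
]

print(list(cds_translations(SAMPLE)))  # [('TEST0001.1', 'MKLV')]
```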
Bio.SeqIO.InsdcIO.EmblCdsFeatureIterator(handle, alphabet=ProteinAlphabet())
Break up an EMBL file into SeqRecord objects for each CDS feature.
Every section from the ID line to the terminating // can contain many CDS features. These are returned as SeqRecord objects holding the stated amino acid translation sequence (if given).
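Apart from the `FT` line prefix, an EMBL feature table uses the same column layout as GenBank: in both, the feature key starts at column 6 and the location and qualifiers at column 22, consistent with both writer classes setting `QUALIFIER_INDENT = 21`. A minimal sketch of that correspondence, using a hypothetical helper that is not part of Bio.SeqIO.InsdcIO:

```python
from typing import List


def embl_feature_table_as_genbank(lines: List[str]) -> List[str]:
    """Rewrite EMBL 'FT' lines with GenBank's blank prefix.

    Non-FT lines (the FH header, SQ and sequence lines) are dropped. Only the
    two-letter code changes; every later column stays where it was.
    """
    return ["  " + line[2:] for line in lines if line.startswith("FT")]


# Synthetic EMBL feature table lines, for illustration only
embl = [
    "FH   Key             Location/Qualifiers",
    "FT   " + "CDS".ljust(16) + "1..12",
    "FT" + " " * 19 + '/translation="MKLV"',
]
for line in embl_feature_table_as_genbank(embl):
    print(line)
```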
class Bio.SeqIO.InsdcIO.GenBankWriter(handle)
Bases: Bio.SeqIO.InsdcIO._InsdcWriter
GenBank writer.
HEADER_WIDTH = 12
QUALIFIER_INDENT = 21
STRUCTURED_COMMENT_START = '-START##'
STRUCTURED_COMMENT_END = '-END##'
STRUCTURED_COMMENT_DELIM = ' :: '
LETTERS_PER_LINE = 60
SEQUENCE_INDENT = 9
write_record(self, record)
Write a single record to the output file.
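The constants above fix the column layout of the written file. This is a minimal sketch of how they might be applied, not the code of GenBankWriter.write_record. It assumes that residues are grouped in blocks of ten (as in standard GenBank ORIGIN lines) and that structured-comment keys are padded to a common width. The function names are hypothetical.

```python
from typing import Dict, List

HEADER_WIDTH = 12
LETTERS_PER_LINE = 60
SEQUENCE_INDENT = 9
STRUCTURED_COMMENT_START = "-START##"
STRUCTURED_COMMENT_END = "-END##"
STRUCTURED_COMMENT_DELIM = " :: "
BLOCK_SIZE = 10  # assumption: residues grouped in blocks of 10


def header_line(keyword: str, value: str) -> str:
    """Pad the keyword so the value starts in column HEADER_WIDTH + 1."""
    return keyword.ljust(HEADER_WIDTH) + value


def origin_lines(seq: str) -> List[str]:
    """Format a sequence as numbered ORIGIN lines of up to 60 lower-case letters."""
    seq = seq.lower()
    lines = []
    for start in range(0, len(seq), LETTERS_PER_LINE):
        chunk = seq[start:start + LETTERS_PER_LINE]
        blocks = [chunk[i:i + BLOCK_SIZE] for i in range(0, len(chunk), BLOCK_SIZE)]
        # 1-based position of the line's first letter, right-aligned in 9 columns
        lines.append(str(start + 1).rjust(SEQUENCE_INDENT) + " " + " ".join(blocks))
    return lines


def structured_comment(name: str, data: Dict[str, str]) -> List[str]:
    """Wrap key/value pairs in ##name-START## ... ##name-END## marker lines."""
    width = max(len(key) for key in data)  # assumption: keys padded to align delimiters
    lines = [f"##{name}{STRUCTURED_COMMENT_START}"]
    lines += [f"{key.ljust(width)}{STRUCTURED_COMMENT_DELIM}{value}" for key, value in data.items()]
    lines.append(f"##{name}{STRUCTURED_COMMENT_END}")
    return lines


print(header_line("VERSION", "X55053.1"))
for line in origin_lines("ACGT" * 20):
    print(line)
for line in structured_comment("Assembly-Data", {"Assembly Method": "example", "Coverage": "10x"}):
    print(line)
```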
class Bio.SeqIO.InsdcIO.EmblWriter(handle)
Bases: Bio.SeqIO.InsdcIO._InsdcWriter
EMBL writer.
HEADER_WIDTH = 5
QUALIFIER_INDENT = 21
QUALIFIER_INDENT_STR = 'FT                   '
QUALIFIER_INDENT_TMP = 'FT   %s                '
FEATURE_HEADER = 'FH   Key             Location/Qualifiers\nFH\n'
LETTERS_PER_BLOCK = 10
BLOCKS_PER_LINE = 6
LETTERS_PER_LINE = 60
POSITION_PADDING = 10
write_record(self, record)
Write a single record to the output file.
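The EMBL constants can be combined into an 80-character sequence line. This minimal sketch assumes the usual EMBL layout: four blank columns, six space-separated blocks of ten letters (short or missing blocks padded with spaces), and the position of the line's last letter right-aligned in a 10-character field. The function names are hypothetical, and this is not EmblWriter's implementation.

```python
from typing import List

HEADER_WIDTH = 5
LETTERS_PER_BLOCK = 10
BLOCKS_PER_LINE = 6
LETTERS_PER_LINE = LETTERS_PER_BLOCK * BLOCKS_PER_LINE  # 60
POSITION_PADDING = 10


def embl_header_line(code: str, value: str) -> str:
    """Pad the two-letter line code so the content starts in column 6."""
    return code.ljust(HEADER_WIDTH) + value


def embl_sequence_lines(seq: str) -> List[str]:
    """Format a sequence as EMBL sequence lines, each 80 characters wide."""
    seq = seq.lower()
    lines = []
    for start in range(0, len(seq), LETTERS_PER_LINE):
        chunk = seq[start:start + LETTERS_PER_LINE]
        body = " " * 4  # each block below contributes its own leading space
        for b in range(BLOCKS_PER_LINE):
            block = chunk[b * LETTERS_PER_BLOCK:(b + 1) * LETTERS_PER_BLOCK]
            # Pad short or empty blocks so the position column stays aligned
            body += (" " + block).ljust(LETTERS_PER_BLOCK + 1)
        # Position of the last letter on this line, right-aligned
        lines.append(body + str(start + len(chunk)).rjust(POSITION_PADDING))
    return lines


print(embl_header_line("AC", "A00022;"))
for line in embl_sequence_lines("ACGT" * 20):
    print(line)
```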