Bio.SeqIO.InsdcIO module

Bio.SeqIO support for the “genbank” and “embl” file formats.

You are expected to use this module via the Bio.SeqIO functions. Note that internally this module calls Bio.GenBank to do the actual parsing of GenBank, EMBL and IMGT files.

See Also: International Nucleotide Sequence Database Collaboration http://www.insdc.org/

GenBank http://www.ncbi.nlm.nih.gov/Genbank/

EMBL Nucleotide Sequence Database http://www.ebi.ac.uk/embl/

DDBJ (DNA Data Bank of Japan) http://www.ddbj.nig.ac.jp/

IMGT (uses a variant of EMBL format with longer feature indents) http://imgt.cines.fr/download/LIGM-DB/userman_doc.html http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html http://www.ebi.ac.uk/imgt/hla/docs/manual.html

class Bio.SeqIO.InsdcIO.GenBankIterator(source)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for GenBank files.

__init__(source)

Break up a GenBank file into SeqRecord objects.

Argument source is a file-like object opened in text mode or a path to a file. Every section from the LOCUS line to the terminating // becomes a single SeqRecord with associated annotation and features.

Note that for genomes or chromosomes, there is typically only one record.

This gets called internally by Bio.SeqIO for the GenBank file format:

>>> from Bio import SeqIO
>>> for record in SeqIO.parse("GenBank/cor6_6.gb", "gb"):
...     print(record.id)
...
X55053.1
X62281.1
M81224.1
AJ237582.1
L31939.1
AF297471.1

Equivalently,

>>> with open("GenBank/cor6_6.gb") as handle:
...     for record in GenBankIterator(handle):
...         print(record.id)
...
X55053.1
X62281.1
M81224.1
AJ237582.1
L31939.1
AF297471.1

parse(handle)

Start parsing the file, and return a SeqRecord generator.

__abstractmethods__ = frozenset({})

class Bio.SeqIO.InsdcIO.EmblIterator(source)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for EMBL files.

__init__(source)

Break up an EMBL file into SeqRecord objects.

Argument source is a file-like object opened in text mode or a path to a file. Every section from the ID line to the terminating // becomes a single SeqRecord with associated annotation and features.

Note that for genomes or chromosomes, there is typically only one record.

This gets called internally by Bio.SeqIO for the EMBL file format:

>>> from Bio import SeqIO
>>> for record in SeqIO.parse("EMBL/epo_prt_selection.embl", "embl"):
...     print(record.id)
...
A00022.1
A00028.1
A00031.1
A00034.1
A00060.1
A00071.1
A00072.1
A00078.1
CQ797900.1

Equivalently,

>>> with open("EMBL/epo_prt_selection.embl") as handle:
...     for record in EmblIterator(handle):
...         print(record.id)
...
A00022.1
A00028.1
A00031.1
A00034.1
A00060.1
A00071.1
A00072.1
A00078.1
CQ797900.1

parse(handle)

Start parsing the file, and return a SeqRecord generator.

__abstractmethods__ = frozenset({})

class Bio.SeqIO.InsdcIO.ImgtIterator(source)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for IMGT files.

__init__(source)

Break up an IMGT file into SeqRecord objects.

Argument source is a file-like object opened in text mode or a path to a file. Every section from the ID line to the terminating // becomes a single SeqRecord with associated annotation and features.

Note that for genomes or chromosomes, there is typically only one record.
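
As with the other iterators, this one is normally reached through Bio.SeqIO by passing the "imgt" format name. The following is a minimal sketch rather than an excerpt from the Biopython test suite: the record, its identifier and its sequence are hypothetical, and the data is round-tripped through an in-memory handle so that no example file is assumed.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record used only for illustration.
record = SeqRecord(
    Seq("ATGAAAGTTTAA"),
    id="HYP00001.1",
    name="HYP00001",
    description="hypothetical example record",
    annotations={"molecule_type": "DNA"},  # required by the INSDC writers
)

# Write the record in IMGT format to an in-memory handle ...
handle = StringIO()
SeqIO.write(record, handle, "imgt")
handle.seek(0)

# ... then parse it back; the "imgt" format name selects the IMGT parser.
parsed = list(SeqIO.parse(handle, "imgt"))
for rec in parsed:
    print(rec.id, len(rec.seq))
```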

parse(handle)

Start parsing the file, and return a SeqRecord generator.

__abstractmethods__ = frozenset({})

class Bio.SeqIO.InsdcIO.GenBankCdsFeatureIterator(source)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for GenBank files, creating a SeqRecord for each CDS feature.

__init__(source)

Break up a GenBank file into SeqRecord objects for each CDS feature.

Argument source is a file-like object opened in text mode or a path to a file.

Every section from the LOCUS line to the terminating // can contain many CDS features. These are returned as SeqRecord objects with the stated amino acid translation sequence (if given).
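
Through Bio.SeqIO this behaviour is selected with the "genbank-cds" format name. The sketch below is illustrative only: the nucleotide record, the protein_id value and the translation are hypothetical, and the GenBank text is generated in memory so that no example file is assumed.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical CDS feature carrying its own /translation qualifier.
cds = SeqFeature(
    FeatureLocation(0, 12, strand=+1),
    type="CDS",
    qualifiers={"protein_id": ["HYP00001.1"], "translation": ["MKV"]},
)

# Hypothetical nucleotide record holding that single CDS.
record = SeqRecord(
    Seq("ATGAAAGTTTAA"),
    id="HYP_NUC.1",
    name="HYP_NUC",
    description="hypothetical example record",
    annotations={"molecule_type": "DNA"},
    features=[cds],
)

# Produce GenBank text in memory, then rewind for reading.
handle = StringIO()
SeqIO.write(record, handle, "genbank")
handle.seek(0)

# One protein SeqRecord is produced per CDS feature, not per LOCUS entry.
proteins = list(SeqIO.parse(handle, "genbank-cds"))
for protein in proteins:
    print(protein.id, protein.seq)
```

Note that the sequence of each returned record is taken from the /translation qualifier as stated in the file; the nucleotide sequence is not translated again.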

parse(handle)

Start parsing the file, and return a SeqRecord generator.

__abstractmethods__ = frozenset({})

class Bio.SeqIO.InsdcIO.EmblCdsFeatureIterator(source)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for EMBL files, creating a SeqRecord for each CDS feature.

__init__(source)

Break up an EMBL file into SeqRecord objects for each CDS feature.

Argument source is a file-like object opened in text mode or a path to a file.

Every section from the ID line to the terminating // can contain many CDS features. These are returned as SeqRecord objects with the stated amino acid translation sequence (if given).

parse(handle)

Start parsing the file, and return a SeqRecord generator.

__abstractmethods__ = frozenset({})

class Bio.SeqIO.InsdcIO.GenBankWriter(target, mode='w')

Bases: Bio.SeqIO.InsdcIO._InsdcWriter

GenBank writer.

HEADER_WIDTH = 12
QUALIFIER_INDENT = 21
STRUCTURED_COMMENT_START = '-START##'
STRUCTURED_COMMENT_END = '-END##'
STRUCTURED_COMMENT_DELIM = ' :: '
LETTERS_PER_LINE = 60
SEQUENCE_INDENT = 9

write_record(record)

Write a single record to the output file.
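
The layout constants above can be seen in the sequence block of the output. The following sketch goes through SeqIO.write rather than calling the writer class directly; the record and its identifiers are hypothetical, and the 80-letter sequence is chosen only so that the block spans two lines.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record; 80 letters, so the sequence block needs two lines.
record = SeqRecord(
    Seq("ACGT" * 20),
    id="HYP00002.1",
    name="HYP00002",
    description="hypothetical example record",
    annotations={"molecule_type": "DNA"},  # required by the GenBank writer
)

handle = StringIO()
SeqIO.write(record, handle, "genbank")
lines = handle.getvalue().splitlines()

# Sequence lines sit between the ORIGIN line and the terminating "//".
start = lines.index("ORIGIN") + 1
sequence_lines = lines[start : lines.index("//")]
for line in sequence_lines:
    print(line)
```

Each full line holds LETTERS_PER_LINE letters in blocks of ten, preceded by the position of its first letter right-justified in SEQUENCE_INDENT columns.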

class Bio.SeqIO.InsdcIO.EmblWriter(target, mode='w')

Bases: Bio.SeqIO.InsdcIO._InsdcWriter

EMBL writer.

HEADER_WIDTH = 5
QUALIFIER_INDENT = 21
QUALIFIER_INDENT_STR = 'FT                   '
QUALIFIER_INDENT_TMP = 'FT   %s                '
FEATURE_HEADER = 'FH   Key             Location/Qualifiers\nFH\n'
LETTERS_PER_BLOCK = 10
BLOCKS_PER_LINE = 6
LETTERS_PER_LINE = 60
POSITION_PADDING = 10

write_record(record)

Write a single record to the output file.
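
The EMBL sequence layout differs from the GenBank one in that the running position is written at the end of each line. A minimal sketch, again via SeqIO.write and with a hypothetical record:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record; 80 letters, so the sequence block needs two lines.
record = SeqRecord(
    Seq("ACGT" * 20),
    id="HYP00003.1",
    name="HYP00003",
    description="hypothetical example record",
    annotations={"molecule_type": "DNA"},  # required by the EMBL writer
)

handle = StringIO()
SeqIO.write(record, handle, "embl")
lines = handle.getvalue().splitlines()

# Sequence lines follow the "SQ" header line and end at the "//" terminator.
start = next(i for i, line in enumerate(lines) if line.startswith("SQ   ")) + 1
sequence_lines = lines[start : lines.index("//")]
for line in sequence_lines:
    print(line)
```

Each line holds up to BLOCKS_PER_LINE blocks of LETTERS_PER_BLOCK letters, followed by the position of the last letter on that line, right-justified in POSITION_PADDING columns.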

class Bio.SeqIO.InsdcIO.ImgtWriter(target, mode='w')

Bases: Bio.SeqIO.InsdcIO.EmblWriter

IMGT writer (EMBL format variant).

HEADER_WIDTH = 5
QUALIFIER_INDENT = 25
QUALIFIER_INDENT_STR = 'FT                       '
QUALIFIER_INDENT_TMP = 'FT   %s                    '
FEATURE_HEADER = 'FH   Key                 Location/Qualifiers\nFH\n'