Module FastaIO

Bio.SeqIO support for the "fasta" (aka FastA or Pearson) file format.

You are expected to use this module via the Bio.SeqIO functions.

Classes
  FastaWriter
Class to write Fasta format files (OBSOLETE).
  FastaTwoLineWriter
Class to write 2-line per record Fasta format files (OBSOLETE).
Functions
 
SimpleFastaParser(handle)
Iterate over Fasta records as string tuples.
 
FastaTwoLineParser(handle)
Iterate over no-wrapping Fasta records as string tuples.
 
FastaIterator(handle, alphabet=SingleLetterAlphabet(), title2ids=None)
Iterate over Fasta records as SeqRecord objects.
 
FastaTwoLineIterator(handle, alphabet=SingleLetterAlphabet())
Iterate over two-line Fasta records (as SeqRecord objects).
 
as_fasta(record)
Turn a SeqRecord into a FASTA formatted string.
 
as_fasta_2line(record)
Turn a SeqRecord into a two-line FASTA formatted string.
Variables
  __package__ = 'Bio.SeqIO'
Function Details

SimpleFastaParser(handle)

Iterate over Fasta records as string tuples.

For each record a tuple of two strings is returned: the FASTA title line (without the leading '>' character) and the sequence (with any whitespace removed). The title line is not split into an identifier (the first word) and a comment or description.

>>> with open("Fasta/dups.fasta") as handle:
...     for values in SimpleFastaParser(handle):
...         print(values)
...
('alpha', 'ACGTA')
('beta', 'CGTC')
('gamma', 'CCGCC')
('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
('delta', 'CGCGC')
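
The behaviour described above can be approximated in a few lines of plain Python. The following is only a sketch of the idea, not the Biopython implementation: the helper name iter_title_and_sequence and the in-memory example data are made up for illustration.

```python
import io


def iter_title_and_sequence(handle):
    """Yield (title, sequence) string tuples from a FASTA handle.

    Hypothetical helper illustrating the documented behaviour;
    it is not the Biopython implementation.
    """
    title = None
    chunks = []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            # Drop the leading '>' and any trailing whitespace.
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            # Remove all whitespace, including spaces within the line.
            chunks.append("".join(line.split()))
    if title is not None:
        yield title, "".join(chunks)


# Made-up example data: one wrapped record and one single-line record.
example = io.StringIO(">alpha first record\nACG TA\nCC\n>beta\nCGTC\n")
records = list(iter_title_and_sequence(example))
print(records)
```

Note that the title of the first record keeps its full text ("alpha first record"); splitting it into an identifier and a description is left to the caller.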

FastaTwoLineParser(handle)

Iterate over no-wrapping Fasta records as string tuples.

Functionally the same as SimpleFastaParser but with a strict interpretation of the FASTA format as exactly two lines per record: the title line (a greater-than sign followed by the identifier and description), and the sequence with no line wrapping.

Any line wrapping will raise an exception, as will excess blank lines (other than the special case of a zero-length sequence as the second line of a record).

Examples

This file uses two lines per FASTA record:

>>> with open("Fasta/aster_no_wrap.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
gi|3298468|dbj|BAA31520.1| SAMIPF = GGH...

This equivalent file uses line wrapping:

>>> with open("Fasta/aster.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
Traceback (most recent call last):
   ...
ValueError: Expected FASTA record starting with '>' character. Perhaps this file is using FASTA line wrapping? Got: 'MTFGLVYTVYATAIDPKKGSLGTIAPIAIGFIVGANI'
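
As a rough illustration of the strict two-line rule, the sketch below reads lines in pairs and rejects anything that does not fit. It is not the Biopython implementation: the helper name iter_two_line_records, the example data, and the wording of the error messages are all assumptions made for this example.

```python
import io


def iter_two_line_records(handle):
    """Yield (title, sequence) tuples, insisting on two lines per record.

    Hypothetical helper sketching the strict behaviour; it is not the
    Biopython implementation and its error text differs.
    """
    while True:
        title_line = handle.readline()
        if not title_line:
            return  # End of file.
        if not title_line.startswith(">"):
            # A wrapped sequence line or stray blank line ends up here.
            raise ValueError(
                "Expected a title line starting with '>', got: %r"
                % title_line.rstrip()
            )
        sequence_line = handle.readline()
        if sequence_line.startswith(">"):
            raise ValueError("Expected a sequence line, got a title line")
        # The second line is the whole sequence; it may be empty.
        yield title_line[1:].rstrip(), sequence_line.strip()


# Made-up data: the second record has a zero-length sequence.
two_line = io.StringIO(">seq1 demo\nACGT\n>seq2\n\n")
two_line_records = list(iter_two_line_records(two_line))
print(two_line_records)

# Made-up data: the sequence of seq1 is wrapped over two lines.
wrapped = io.StringIO(">seq1\nACGT\nTTAA\n")
wrapped_error = None
try:
    list(iter_two_line_records(wrapped))
except ValueError as err:
    wrapped_error = str(err)
    print("Rejected:", wrapped_error)
```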

FastaIterator(handle, alphabet=SingleLetterAlphabet(), title2ids=None)

Iterate over Fasta records as SeqRecord objects.

Arguments:
  • handle - input file
  • alphabet - optional alphabet
  • title2ids - A function that, when given the title line of a FASTA record (without the leading '>'), returns the id, name and description (in that order) for the record as a tuple of strings. If this is not given, the entire title line is used as the description, and the first word as the id and name.

By default this will act like calling Bio.SeqIO.parse(handle, "fasta") with no custom handling of the title lines:

>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta

However, you can supply a title2ids function to alter this:

>>> def take_upper(title):
...     return title.split(None, 1)[0].upper(), "", title
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle, title2ids=take_upper):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
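
A title2ids function can also pull a specific field out of a structured title line. The sketch below assumes pipe-delimited titles shaped like 'gi|3298468|dbj|BAA31520.1| SAMIPF' and uses the fourth field as the id and name. The function name accession_title2ids and the field position are assumptions for illustration, not part of this module.

```python
def accession_title2ids(title):
    """Return (id, name, description) for a pipe-delimited title line.

    Hypothetical title2ids callback. Assumes titles shaped like
    'gi|3298468|dbj|BAA31520.1| SAMIPF'; otherwise falls back to
    using the first word as the id and name.
    """
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    # Ignore the empty field produced by a trailing '|'.
    fields = [field for field in first_word.split("|") if field]
    if len(fields) >= 4:
        accession = fields[3]
    else:
        accession = first_word
    return accession, accession, title


structured = accession_title2ids("gi|3298468|dbj|BAA31520.1| SAMIPF")
plain = accession_title2ids("alpha (again)")
print(structured[0])
print(plain[0])
```

Such a function would be passed in the same way as take_upper above, that is as FastaIterator(handle, title2ids=accession_title2ids).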

FastaTwoLineIterator(handle, alphabet=SingleLetterAlphabet())

Iterate over two-line Fasta records (as SeqRecord objects).

Arguments:
  • handle - input file
  • alphabet - optional alphabet

This uses a strict interpretation of the FASTA format as requiring exactly two lines per record (no line wrapping).

Only the default parsing of the title line into ID, name and description, as used by the relaxed FASTA parser, is available.
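
The default title handling, as documented for FastaIterator (the first word becomes the id and name, the entire title line becomes the description), might be sketched as follows. The helper name default_title_to_ids is hypothetical and this is not the Biopython implementation.

```python
def default_title_to_ids(title):
    """Split a FASTA title line into (id, name, description).

    Hypothetical helper mirroring the documented default;
    not the Biopython implementation.
    """
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    # The first word serves as both id and name; the description
    # keeps the entire title line.
    return first_word, first_word, title


parsed = default_title_to_ids("gi|3298468|dbj|BAA31520.1| SAMIPF")
print(parsed)
```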

as_fasta(record)

Turn a SeqRecord into a FASTA formatted string.

This is used internally by the SeqRecord's .format("fasta") method and by the SeqIO.write(..., ..., "fasta") function.
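
To give a feel for what a line-wrapped FASTA string looks like, here is a minimal sketch that works on plain strings rather than SeqRecord objects. The function name format_fasta, its parameters, and the wrapping width of 60 characters are assumptions made for illustration; this section does not state the width or the title rules that as_fasta itself applies.

```python
def format_fasta(title, sequence, width=60):
    """Return a FASTA formatted string with the sequence line wrapped.

    Hypothetical helper operating on plain strings; the default
    width of 60 is an assumption, not taken from this documentation.
    """
    lines = [">" + title]
    # Slice the sequence into chunks of at most `width` characters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Made-up 80-letter sequence, which wraps onto two sequence lines.
text = format_fasta("demo example record", "ACGT" * 20)
print(text, end="")
```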

as_fasta_2line(record)

Turn a SeqRecord into a two-line FASTA formatted string.

This is used internally by the SeqRecord's .format("fasta-2line") method and by the SeqIO.write(..., ..., "fasta-2line") function.