Bio.SeqIO.FastaIO module

Bio.SeqIO support for the “fasta” (aka FastA or Pearson) file format.

You are expected to use this module via the Bio.SeqIO functions.

Bio.SeqIO.FastaIO.SimpleFastaParser(handle)

Iterate over Fasta records as string tuples.

Arguments:

handle - input stream opened in text mode

For each record a tuple of two strings is returned, the FASTA title line (without the leading ‘>’ character), and the sequence (with any whitespace removed). The title line is not divided up into an identifier (the first word) and comment or description.

>>> with open("Fasta/dups.fasta") as handle:
...     for values in SimpleFastaParser(handle):
...         print(values)
...
('alpha', 'ACGTA')
('beta', 'CGTC')
('gamma', 'CCGCC')
('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
('delta', 'CGCGC')
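To show what "a tuple of two strings" means in practice, here is a minimal standard-library sketch of the same idea. It is not Biopython's implementation. The function name `simple_fasta_tuples` is hypothetical, and the sketch simply skips any text before the first '>' line, which is an assumption of this illustration.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def simple_fasta_tuples(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs; hypothetical sketch, not Biopython code."""
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        if line.startswith(">"):
            if title is not None:
                # Emit the previous record, joining its sequence lines
                yield title, "".join(chunks)
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            # Drop all whitespace (spaces, tabs, newlines) from sequence lines
            chunks.append("".join(line.split()))
        # Lines before the first '>' are ignored in this sketch
    if title is not None:
        yield title, "".join(chunks)


data = io.StringIO(">alpha first\nACG\nTA\n>beta\nCG TC\n")
for title, seq in simple_fasta_tuples(data):
    # The title is kept whole; split it yourself if you need an identifier
    identifier = title.split(None, 1)[0] if title else ""
    print(title, seq, identifier)
```

Because the title is returned undivided, any identifier/description split (as in the last loop) is left to the caller.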

Bio.SeqIO.FastaIO.FastaTwoLineParser(handle)

Iterate over no-wrapping Fasta records as string tuples.

Arguments:

handle - input stream opened in text mode

Functionally the same as SimpleFastaParser, but with a strict interpretation of the FASTA format as exactly two lines per record: a title line starting with the greater-than sign (identifier plus description), followed by the sequence on a single line with no wrapping.

Any line wrapping will raise an exception, as will excess blank lines (other than the special case of a zero-length sequence as the second line of a record).

Examples

This file uses two lines per FASTA record:

>>> with open("Fasta/aster_no_wrap.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
gi|3298468|dbj|BAA31520.1| SAMIPF = GGH...

This equivalent file uses line wrapping:

>>> with open("Fasta/aster.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
Traceback (most recent call last):
   ...
ValueError: Expected FASTA record starting with '>' character. Perhaps this file is using FASTA line wrapping? Got: 'MTFGLVYTVYATAIDPKKGSLGTIAPIAIGFIVGANI'
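The strict two-line rule can be sketched with the standard library as follows. This is an illustration, not Biopython's code: `two_line_fasta_tuples` is a hypothetical name, and in this sketch any line where a title is expected but which does not start with '>' (a wrapped sequence line or a stray blank line) raises ValueError.

```python
import io
from typing import Iterator, TextIO, Tuple


def two_line_fasta_tuples(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Strictly parse two-line FASTA; hypothetical sketch, not Biopython code."""
    lines = iter(handle)
    for title_line in lines:
        if not title_line.startswith(">"):
            # A wrapped sequence line or an extra blank line ends up here
            raise ValueError(
                "Expected FASTA record starting with '>' character. "
                "Perhaps this file is using FASTA line wrapping? "
                f"Got: {title_line.rstrip()!r}"
            )
        # The very next line must hold the whole sequence (possibly empty)
        seq_line = next(lines, None)
        if seq_line is None or seq_line.startswith(">"):
            raise ValueError(f"Record {title_line.rstrip()!r} has no sequence line")
        yield title_line[1:].rstrip(), seq_line.strip()


good = io.StringIO(">a\nACGT\n>b\n\n")
print(list(two_line_fasta_tuples(good)))  # [('a', 'ACGT'), ('b', '')]
```

Note how the blank second line of record "b" is accepted as a zero-length sequence, while a wrapped file fails on its first continuation line.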

class Bio.SeqIO.FastaIO.FastaIterator(source: IO[str] | PathLike | str | bytes, alphabet: None = None)

Bases: SequenceIterator

Parser for plain Fasta files without comments.

modes = 't'

__init__(source: IO[str] | PathLike | str | bytes, alphabet: None = None) → None

Iterate over Fasta records as SeqRecord objects.

Arguments:

source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.

This parser expects a plain Fasta format without comments or header lines.

By default this will act like calling Bio.SeqIO.parse(handle, “fasta”) with no custom handling of the title lines:

>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta

If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:

>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
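The record IDs printed above come from the default title handling: the first whitespace-delimited word of the title line becomes the ID (and name), while the full title line is kept as the description. The sketch below mimics that split with a plain dataclass; `SimpleRecord` and `parse_title` are hypothetical names, not Biopython classes or functions.

```python
from dataclasses import dataclass


@dataclass
class SimpleRecord:
    """Hypothetical stand-in for a SeqRecord, holding only the title fields."""
    id: str
    name: str
    description: str
    seq: str


def parse_title(title: str, seq: str) -> SimpleRecord:
    # The first word of the title becomes both id and name; empty titles give ""
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    # The description keeps the whole title line, including the identifier
    return SimpleRecord(id=first_word, name=first_word, description=title, seq=seq)


rec = parse_title("alpha (again - this is a duplicate entry)", "ACGTA")
print(rec.id)           # alpha
print(rec.description)  # alpha (again - this is a duplicate entry)
```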

__next__()

Return the next SeqRecord, raising StopIteration once the input is exhausted.

class Bio.SeqIO.FastaIO.FastaTwoLineIterator(source)

Bases: SequenceIterator

Parser for Fasta files with exactly two lines per record.

modes = 't'

__init__(source)

Iterate over two-line Fasta records (as SeqRecord objects).

Arguments:

source - input stream opened in text mode, or a path to a file

This uses a strict interpretation of the FASTA as requiring exactly two lines per record (no line wrapping).

Only the default parsing of the title line into ID, name, and description (as offered by the relaxed FASTA parser) is supported.

__next__()

Return the next SeqRecord, raising StopIteration once the input is exhausted.

class Bio.SeqIO.FastaIO.FastaBlastIterator(source: IO[str] | PathLike | str | bytes, alphabet: None = None)

Bases: SequenceIterator

Parser for Fasta files, allowing for comments as in BLAST.

modes = 't'

__init__(source: IO[str] | PathLike | str | bytes, alphabet: None = None) → None

Iterate over Fasta records as SeqRecord objects.

Arguments:

source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.

This parser expects the data to be in FASTA format. As in BLAST, lines starting with ‘#’, ‘!’, or ‘;’ are interpreted as comments and ignored.
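To illustrate the comment handling just described, here is a minimal standard-library sketch. It is not Biopython's implementation: `blast_style_fasta_tuples` is a hypothetical name, and this sketch drops comment lines wherever they occur in the file.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# Comment prefixes taken from the description above
BLAST_COMMENT_PREFIXES = ("#", "!", ";")


def blast_style_fasta_tuples(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence), skipping comment lines; hypothetical sketch."""
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        if line.startswith(BLAST_COMMENT_PREFIXES):
            continue  # comment lines are ignored in this sketch
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            chunks.append("".join(line.split()))
    if title is not None:
        yield title, "".join(chunks)


text = "# generated file\n>alpha\nAC\n; inline note\nGT\n!another\n>beta\nCC\n"
print(list(blast_style_fasta_tuples(io.StringIO(text))))
# [('alpha', 'ACGT'), ('beta', 'CC')]
```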

This iterator acts like calling Bio.SeqIO.parse(handle, “fasta-blast”) with no custom handling of the title lines:

>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaBlastIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta

If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:

>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaBlastIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA

__next__()

Return the next SeqRecord, raising StopIteration once the input is exhausted.

class Bio.SeqIO.FastaIO.FastaPearsonIterator(source: IO[str] | PathLike | str | bytes, alphabet: None = None)

Bases: SequenceIterator

Parser for Fasta files, allowing for comments as in the FASTA aligner.

modes = 't'

__init__(source: IO[str] | PathLike | str | bytes, alphabet: None = None) → None

Iterate over Fasta records as SeqRecord objects.

Arguments:

source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.

This parser expects a Fasta format allowing for a header (before the first sequence record) and comments (lines starting with ‘;’) as in William Pearson’s FASTA aligner software.
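The distinction between a header and comments can be sketched with the standard library as below. This is an illustration only: `pearson_style_fasta` is a hypothetical name, and returning the header lines separately is a choice made for this sketch, not something the parser described here is stated to do.

```python
import io
from typing import List, Optional, TextIO, Tuple


def pearson_style_fasta(handle: TextIO) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split Pearson-style FASTA into header lines and (title, seq) records.

    Hypothetical sketch: ';' lines are dropped as comments, and any other
    text before the first '>' is collected as a header rather than parsed.
    """
    header: List[str] = []
    records: List[Tuple[str, str]] = []
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        if line.startswith(";"):
            continue  # comment line
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].rstrip()
            chunks = []
        elif title is None:
            header.append(line.rstrip("\n"))  # still before the first record
        else:
            chunks.append("".join(line.split()))
    if title is not None:
        records.append((title, "".join(chunks)))
    return header, records


text = "Output of a search run\n\n>alpha\nAC\n;note\nGT\n"
header, records = pearson_style_fasta(io.StringIO(text))
print(header)   # ['Output of a search run', '']
print(records)  # [('alpha', 'ACGT')]
```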

This iterator acts like calling Bio.SeqIO.parse(handle, “fasta-pearson”) with no custom handling of the title lines:

>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaPearsonIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta

If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:

>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaPearsonIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA

__next__()

Return the next SeqRecord, raising StopIteration once the input is exhausted.

class Bio.SeqIO.FastaIO.FastaWriter(target, wrap=60, record2title=None)

Bases: SequenceWriter

FASTA file writer.

modes = 't'

__init__(target, wrap=60, record2title=None)

Create a Fasta writer.

Arguments:

target - Output stream opened in text mode, or a path to a file.
wrap - Optional line length used to wrap sequence lines. Defaults to wrapping the sequence at 60 characters. Use zero (or None) for no wrapping, giving a single long line for the sequence.
record2title - Optional function to return the text to be used for the title line of each record. By default a combination of the record.id and record.description is used. If the record.description starts with the record.id, then just the record.description is used.
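The two arguments above can be illustrated with a small standard-library sketch. The names `default_title` and `format_fasta` are hypothetical and this is not Biopython's code. One assumption to flag: the sketch interprets "starts with the record.id" as the first word of the description equalling the ID, so an ID of "alpha" does not swallow a description such as "alphabet soup".

```python
from typing import Optional


def default_title(record_id: str, description: str) -> str:
    """Hypothetical sketch of the default title rule described above."""
    # Assumption: "starts with the id" is checked on the first word only
    if description and description.split(None, 1)[0] == record_id:
        return description
    if description:
        return f"{record_id} {description}"
    return record_id


def format_fasta(title: str, seq: str, wrap: Optional[int] = 60) -> str:
    """Return one FASTA record as text, wrapping the sequence at `wrap` chars."""
    lines = [">" + title]
    if wrap:
        # Slice the sequence into fixed-width chunks
        lines.extend(seq[i : i + wrap] for i in range(0, len(seq), wrap))
    else:
        # wrap=0 or None: the whole sequence on a single line
        lines.append(seq)
    return "\n".join(lines) + "\n"


print(format_fasta(default_title("alpha", "alpha test seq"), "ACGTACGTAC", wrap=4), end="")
# >alpha test seq
# ACGT
# ACGT
# AC
```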

You can either use:

with open(filename, "w") as handle:
    writer = FastaWriter(handle)
    writer.write_file(my_records)

Or follow the sequential file writer system, for example:

with open(filename, "w") as handle:
    writer = FastaWriter(handle)
    # Any number of write_record() and/or write_records() calls, e.g.
    writer.write_record(first_record)
    writer.write_records(more_records)

classmethod to_string(record): Turn a SeqRecord into a FASTA formatted string, and return it.

write_record(record): Write a single Fasta record to the file.

class Bio.SeqIO.FastaIO.FastaTwoLineWriter(handle, record2title=None)

Bases: FastaWriter

Class to write 2-line per record Fasta format files.

This means we write the sequence information without line wrapping, and will always write a blank line for an empty sequence.
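The blank line for an empty sequence keeps every record at exactly two lines, so a strict two-line reader never loses its place. A minimal standard-library sketch of that output layout follows; `write_two_line_fasta` is a hypothetical name, not a Biopython function, and it takes plain (title, sequence) pairs rather than SeqRecord objects.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_two_line_fasta(handle: TextIO, records: Iterable[Tuple[str, str]]) -> int:
    """Write (title, seq) pairs as two-line FASTA; hypothetical sketch.

    Returns the number of records written.
    """
    count = 0
    for title, seq in records:
        # Always exactly two lines: the title, then the unwrapped sequence.
        # An empty sequence still gets its (blank) second line.
        handle.write(f">{title}\n{seq}\n")
        count += 1
    return count


out = io.StringIO()
write_two_line_fasta(out, [("alpha", "ACGTA"), ("empty", "")])
print(repr(out.getvalue()))  # '>alpha\nACGTA\n>empty\n\n'
```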

__init__(handle, record2title=None)

Create a 2-line per record Fasta writer.

Arguments:

handle - Handle to an output file, e.g. as returned by open(filename, “w”)
record2title - Optional function to return the text to be used for the title line of each record. By default a combination of the record.id and record.description is used. If the record.description starts with the record.id, then just the record.description is used.

You can either use:

with open(filename, "w") as handle:
    writer = FastaTwoLineWriter(handle)
    writer.write_file(my_records)

Or follow the sequential file writer system, for example:

with open(filename, "w") as handle:
    writer = FastaTwoLineWriter(handle)
    # Any number of write_record() and/or write_records() calls, e.g.
    writer.write_record(first_record)
    writer.write_records(more_records)

classmethod to_string(record): Return a string in FASTA format with the sequence as one line.

Bio.SeqIO.FastaIO.as_fasta(record): Turn a SeqRecord into a FASTA formatted string.

Bio.SeqIO.FastaIO.as_fasta_2line(record): Turn a SeqRecord into a two-line FASTA formatted string.