Bio.SeqIO.FastaIO module

Bio.SeqIO support for the “fasta” (aka FastA or Pearson) file format.

You are expected to use this module via the Bio.SeqIO functions.

Bio.SeqIO.FastaIO.SimpleFastaParser(handle)

Iterate over Fasta records as string tuples.

Arguments:

handle - input stream opened in text mode

For each record a tuple of two strings is returned, the FASTA title line (without the leading ‘>’ character), and the sequence (with any whitespace removed). The title line is not divided up into an identifier (the first word) and comment or description.

>>> with open("Fasta/dups.fasta") as handle:
...     for values in SimpleFastaParser(handle):
...         print(values)
...
('alpha', 'ACGTA')
('beta', 'CGTC')
('gamma', 'CCGCC')
('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
('delta', 'CGCGC')

Bio.SeqIO.FastaIO.FastaTwoLineParser(handle)

Iterate over no-wrapping Fasta records as string tuples.

Arguments:

handle - input stream opened in text mode

Functionally the same as SimpleFastaParser but with a strict interpretation of the FASTA format as exactly two lines per record, the greater-than-sign identifier with description, and the sequence with no line wrapping.

Any line wrapping will raise an exception, as will excess blank lines (other than the special case of a zero-length sequence as the second line of a record).

Examples

This file uses two lines per FASTA record:

>>> with open("Fasta/aster_no_wrap.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
gi|3298468|dbj|BAA31520.1| SAMIPF = GGH...

This equivalent file uses line wrapping:

>>> with open("Fasta/aster.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
Traceback (most recent call last):
   ...
ValueError: Expected FASTA record starting with '>' character. Perhaps this file is using FASTA line wrapping? Got: 'MTFGLVYTVYATAIDPKKGSLGTIAPIAIGFIVGANI'

class Bio.SeqIO.FastaIO.FastaIterator(source: Union[IO[str], os.PathLike, str, bytes], alphabet: None = None)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for Fasta files.

__init__(source: Union[IO[str], os.PathLike, str, bytes], alphabet: None = None) → None

Iterate over Fasta records as SeqRecord objects.

Arguments:

source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.

By default this will act like calling Bio.SeqIO.parse(handle, “fasta”) with no custom handling of the title lines:

>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta

If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:

>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA

parse(handle): Start parsing the file, and return a SeqRecord generator.

iterate(handle): Parse the file and generate SeqRecord objects.

__abstractmethods__ = frozenset({})

__annotations__ = {}

__parameters__ = ()

class Bio.SeqIO.FastaIO.FastaTwoLineIterator(source)

Bases: Bio.SeqIO.Interfaces.SequenceIterator

Parser for Fasta files with exactly two lines per record.

__init__(source)

Iterate over two-line Fasta records (as SeqRecord objects).

Arguments:

source - input stream opened in text mode, or a path to a file

This uses a strict interpretation of the FASTA as requiring exactly two lines per record (no line wrapping).

Only the default title to ID/name/description parsing offered by the relaxed FASTA parser is offered.

parse(handle): Start parsing the file, and return a SeqRecord generator.

iterate(handle): Parse the file and generate SeqRecord objects.

__abstractmethods__ = frozenset({})

__annotations__ = {}

__parameters__ = ()

class Bio.SeqIO.FastaIO.FastaWriter(target, wrap=60, record2title=None)

Bases: Bio.SeqIO.Interfaces.SequenceWriter

Class to write Fasta format files (OBSOLETE).

Please use the as_fasta function instead, or the top level Bio.SeqIO.write() function instead using format="fasta".

__init__(target, wrap=60, record2title=None)

Create a Fasta writer (OBSOLETE).

Arguments:

target - Output stream opened in text mode, or a path to a file.
wrap - Optional line length used to wrap sequence lines. Defaults to wrapping the sequence at 60 characters Use zero (or None) for no wrapping, giving a single long line for the sequence.
record2title - Optional function to return the text to be used for the title line of each record. By default a combination of the record.id and record.description is used. If the record.description starts with the record.id, then just the record.description is used.

You can either use:

handle = open(filename, "w")
writer = FastaWriter(handle)
writer.write_file(myRecords)
handle.close()

Or, follow the sequential file writer system, for example:

handle = open(filename, "w")
writer = FastaWriter(handle)
writer.write_header() # does nothing for Fasta files
...
Multiple writer.write_record() and/or writer.write_records() calls
...
writer.write_footer() # does nothing for Fasta files
handle.close()

write_record(record): Write a single Fasta record to the file.

class Bio.SeqIO.FastaIO.FastaTwoLineWriter(handle, record2title=None)

Bases: Bio.SeqIO.FastaIO.FastaWriter

Class to write 2-line per record Fasta format files (OBSOLETE).

This means we write the sequence information without line wrapping, and will always write a blank line for an empty sequence.

Please use the as_fasta_2line function instead, or the top level Bio.SeqIO.write() function instead using format="fasta".

__init__(handle, record2title=None)

Create a 2-line per record Fasta writer (OBSOLETE).

Arguments:

handle - Handle to an output file, e.g. as returned by open(filename, “w”)
record2title - Optional function to return the text to be used for the title line of each record. By default a combination of the record.id and record.description is used. If the record.description starts with the record.id, then just the record.description is used.

You can either use:

handle = open(filename, "w")
writer = FastaWriter(handle)
writer.write_file(myRecords)
handle.close()

Or, follow the sequential file writer system, for example:

handle = open(filename, "w")
writer = FastaWriter(handle)
writer.write_header() # does nothing for Fasta files
...
Multiple writer.write_record() and/or writer.write_records() calls
...
writer.write_footer() # does nothing for Fasta files
handle.close()

__annotations__ = {}

Bio.SeqIO.FastaIO.as_fasta(record)

Turn a SeqRecord into a FASTA formatted string.

This is used internally by the SeqRecord’s .format(“fasta”) method and by the SeqIO.write(…, …, “fasta”) function.

Bio.SeqIO.FastaIO.as_fasta_2line(record)

Turn a SeqRecord into a two-line FASTA formatted string.

This is used internally by the SeqRecord’s .format(“fasta-2line”) method and by the SeqIO.write(…, …, “fasta-2line”) function.