Bio.SeqIO.FastaIO module
Bio.SeqIO support for the “fasta” (aka FastA or Pearson) file format.
You are expected to use this module via the Bio.SeqIO functions.
- Bio.SeqIO.FastaIO.SimpleFastaParser(handle)
Iterate over Fasta records as string tuples.
- Arguments:
handle - input stream opened in text mode
For each record a tuple of two strings is returned, the FASTA title line (without the leading ‘>’ character), and the sequence (with any whitespace removed). The title line is not divided up into an identifier (the first word) and comment or description.
>>> with open("Fasta/dups.fasta") as handle:
...     for values in SimpleFastaParser(handle):
...         print(values)
...
('alpha', 'ACGTA')
('beta', 'CGTC')
('gamma', 'CCGCC')
('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
('delta', 'CGCGC')
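Because the title is returned as one string, callers who need the identifier have to split it themselves. The following is a minimal sketch of one way to do that, assuming the identifier is the first whitespace-delimited word; split_title is a hypothetical helper and not part of Bio.SeqIO.FastaIO.

```python
def split_title(title):
    """Split a FASTA title into (identifier, description).

    Hypothetical helper for illustration; not part of Bio.SeqIO.FastaIO.
    Assumes the identifier is the first whitespace-delimited word.
    """
    parts = title.split(None, 1)  # split once, on the first run of whitespace
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


# Titles as they might be returned by a tuple-based parser.
titles = [
    "alpha",
    "alpha (again - this is a duplicate entry to test the indexing code)",
]
split_titles = [split_title(title) for title in titles]
```

A title with no description yields an empty string as its second element, so the result can always be unpacked into two names.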
- Bio.SeqIO.FastaIO.FastaTwoLineParser(handle)
Iterate over no-wrapping Fasta records as string tuples.
- Arguments:
handle - input stream opened in text mode
Functionally the same as SimpleFastaParser but with a strict interpretation of the FASTA format as exactly two lines per record, the greater-than-sign identifier with description, and the sequence with no line wrapping.
Any line wrapping will raise an exception, as will excess blank lines (other than the special case of a zero-length sequence as the second line of a record).
Examples
This file uses two lines per FASTA record:
>>> with open("Fasta/aster_no_wrap.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
gi|3298468|dbj|BAA31520.1| SAMIPF = GGH...
This equivalent file uses line wrapping:
>>> with open("Fasta/aster.pro") as handle:
...     for title, seq in FastaTwoLineParser(handle):
...         print("%s = %s..." % (title, seq[:3]))
...
Traceback (most recent call last):
   ...
ValueError: Expected FASTA record starting with '>' character. Perhaps this file is using FASTA line wrapping? Got: 'MTFGLVYTVYATAIDPKKGSLGTIAPIAIGFIVGANI'
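The strict two-line rule can be illustrated with a small standalone generator. This is only a sketch of the idea, not Biopython's implementation: the function name two_line_records, its error messages, and its handling of edge cases are all assumptions made for illustration.

```python
import io


def two_line_records(handle):
    """Yield (title, sequence) tuples from strictly two-line FASTA text.

    Illustrative sketch only; the error messages and edge-case handling
    are assumptions and do not mirror Biopython's code.
    """
    while True:
        title_line = handle.readline()
        if not title_line:
            return  # end of input
        if not title_line.startswith(">"):
            # A wrapped sequence line (or a stray blank line) ends up here.
            raise ValueError(
                "Expected a line starting with '>', got: %r"
                % title_line.rstrip("\n")
            )
        seq_line = handle.readline()
        if seq_line.startswith(">"):
            raise ValueError(
                "Missing sequence line after %r" % title_line.rstrip("\n")
            )
        # A blank second line is accepted as a zero-length sequence.
        yield title_line[1:].rstrip(), seq_line.strip()


records = list(two_line_records(io.StringIO(">a desc\nACGT\n>b\n\n")))
```

Feeding this sketch a wrapped record such as ">a\nACGT\nTTTT\n" yields the first tuple and then raises ValueError when it meets the continuation line.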
- class Bio.SeqIO.FastaIO.FastaIterator(source: IO[str] | PathLike | str | bytes, alphabet: None = None)
Bases: SequenceIterator
Parser for plain Fasta files without comments.
- modes = 't'
- __init__(source: IO[str] | PathLike | str | bytes, alphabet: None = None) None
Iterate over Fasta records as SeqRecord objects.
- Arguments:
source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.
This parser expects a plain Fasta format without comments or header lines.
By default this will act like calling Bio.SeqIO.parse(handle, "fasta") with no custom handling of the title lines:
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta
If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:
>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
- __next__()
Return the next SeqRecord.
This method must be implemented by the subclass.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __parameters__ = ()
- class Bio.SeqIO.FastaIO.FastaTwoLineIterator(source)
Bases: SequenceIterator
Parser for Fasta files with exactly two lines per record.
- modes = 't'
- __init__(source)
Iterate over two-line Fasta records (as SeqRecord objects).
- Arguments:
source - input stream opened in text mode, or a path to a file
This uses a strict interpretation of the FASTA format, requiring exactly two lines per record (no line wrapping).
Only the default parsing of the title line into ID, name, and description, as provided by the relaxed FASTA parser, is available.
- __next__()
Return the next SeqRecord.
This method must be implemented by the subclass.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __parameters__ = ()
- class Bio.SeqIO.FastaIO.FastaBlastIterator(source: IO[str] | PathLike | str | bytes, alphabet: None = None)
Bases: SequenceIterator
Parser for Fasta files, allowing for comments as in BLAST.
- modes = 't'
- __init__(source: IO[str] | PathLike | str | bytes, alphabet: None = None) None
Iterate over Fasta records as SeqRecord objects.
- Arguments:
source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.
This parser expects the data to be in FASTA format. As in BLAST, lines starting with ‘#’, ‘!’, or ‘;’ are interpreted as comments and ignored.
This iterator acts like calling Bio.SeqIO.parse(handle, "fasta-blast") with no custom handling of the title lines:
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaBlastIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta
If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:
>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaBlastIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
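The BLAST-style comment rule (lines starting with '#', '!', or ';' are ignored) can be pictured as a line filter applied before record parsing. The sketch below is an assumption about how such filtering could look, not the parser's actual code; strip_comment_lines and the sample text are hypothetical.

```python
import io

# Comment prefixes described for the BLAST-style FASTA variant.
COMMENT_PREFIXES = ("#", "!", ";")


def strip_comment_lines(handle):
    """Yield only the lines that do not start with a comment character.

    Hypothetical helper for illustration; not Biopython's implementation.
    """
    for line in handle:
        if not line.startswith(COMMENT_PREFIXES):
            yield line


text = (
    "# written by a hypothetical tool\n"
    ">alpha\n"
    "ACGTA\n"
    "! a note\n"
    ">beta\n"
    "; another note\n"
    "CGTC\n"
)
kept = list(strip_comment_lines(io.StringIO(text)))
```

After filtering, only the title and sequence lines remain, which is the input a plain FASTA parser expects.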
- __next__()
Return the next SeqRecord.
This method must be implemented by the subclass.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __parameters__ = ()
- class Bio.SeqIO.FastaIO.FastaPearsonIterator(source: IO[str] | PathLike | str | bytes, alphabet: None = None)
Bases: SequenceIterator
Parser for Fasta files, allowing for comments as in the FASTA aligner.
- modes = 't'
- __init__(source: IO[str] | PathLike | str | bytes, alphabet: None = None) None
Iterate over Fasta records as SeqRecord objects.
- Arguments:
source - input stream opened in text mode, or a path to a file
alphabet - optional alphabet, not used. Leave as None.
This parser expects a Fasta format allowing for a header (before the first sequence record) and comments (lines starting with ‘;’) as in William Pearson’s FASTA aligner software.
This iterator acts like calling Bio.SeqIO.parse(handle, "fasta-pearson") with no custom handling of the title lines:
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaPearsonIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta
If you want to modify the records before writing, for example to change the ID of each record, you can use a generator function as follows:
>>> def modify_records(records):
...     for record in records:
...         record.id = record.id.upper()
...         yield record
...
>>> with open('Fasta/dups.fasta') as handle:
...     for record in modify_records(FastaPearsonIterator(handle)):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
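The Pearson-style rules differ from the BLAST-style ones in two ways: everything before the first record is treated as a header, and only ';' marks a comment. A minimal sketch of that behaviour follows, assuming the header simply ends at the first line starting with '>'; pearson_lines is a hypothetical helper, not the parser's actual code.

```python
import io


def pearson_lines(handle):
    """Yield FASTA lines, skipping a leading header and ';' comment lines.

    Hypothetical helper for illustration. Assumes the header is every line
    before the first line that starts with '>'.
    """
    in_records = False
    for line in handle:
        if not in_records:
            if not line.startswith(">"):
                continue  # still inside the header
            in_records = True
        if line.startswith(";"):
            continue  # comment line between or inside records
        yield line


text = (
    "Free-text header line\n"
    "; comment in the header\n"
    ">alpha\n"
    "ACGTA\n"
    "; comment between records\n"
    ">beta\n"
    "CGTC\n"
)
kept = list(pearson_lines(io.StringIO(text)))
```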
- __next__()
Return the next SeqRecord.
This method must be implemented by the subclass.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __parameters__ = ()
- class Bio.SeqIO.FastaIO.FastaWriter(target, wrap=60, record2title=None)
Bases: SequenceWriter
Class to write Fasta format files (OBSOLETE).
Please use the as_fasta function instead, or the top-level Bio.SeqIO.write() function with format="fasta".
- modes = 't'
- __init__(target, wrap=60, record2title=None)
Create a Fasta writer (OBSOLETE).
- Arguments:
target - Output stream opened in text mode, or a path to a file.
wrap - Optional line length used to wrap sequence lines. Defaults to wrapping the sequence at 60 characters. Use zero (or None) for no wrapping, giving a single long line for the sequence.
record2title - Optional function to return the text to be used for the title line of each record. By default a combination of the record.id and record.description is used. If the record.description starts with the record.id, then just the record.description is used.
You can either use:
handle = open(filename, "w")
writer = FastaWriter(handle)
writer.write_file(myRecords)
handle.close()
Or, follow the sequential file writer system, for example:
handle = open(filename, "w")
writer = FastaWriter(handle)
...
Multiple writer.write_record() and/or writer.write_records() calls
...
handle.close()
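The wrap and default title rules described in the arguments above can be sketched in plain Python. This is an illustration under stated assumptions, not the writer's actual code: fasta_title, wrap_sequence, and format_fasta are hypothetical names, and "starts with the record.id" is interpreted here as the first word of the description being equal to the ID.

```python
def fasta_title(record_id, description):
    """Build a title line body from an ID and a description.

    Hypothetical helper. Assumes "description starts with the ID" means
    that the first word of the description equals the ID.
    """
    if description.split(None, 1)[:1] == [record_id]:
        return description  # the description already carries the ID
    if description:
        return "%s %s" % (record_id, description)
    return record_id


def wrap_sequence(sequence, wrap=60):
    """Split a sequence into lines of at most `wrap` characters.

    A wrap of zero or None means no wrapping (one long line).
    """
    if not wrap:
        return [sequence]
    return [sequence[i:i + wrap] for i in range(0, len(sequence), wrap)]


def format_fasta(record_id, description, sequence, wrap=60):
    """Return one FASTA record as text, ending with a newline."""
    lines = [">" + fasta_title(record_id, description)]
    lines.extend(wrap_sequence(sequence, wrap))
    return "\n".join(lines) + "\n"


wrapped = format_fasta("alpha", "alpha test", "ACGTACGTAC", wrap=4)
unwrapped = format_fasta("beta", "second record", "ACGTACGTAC", wrap=0)
```

With wrap=4 the ten-letter sequence is written as three lines (ACGT, ACGT, AC); with wrap=0 it stays on a single line, which matches the two-line-per-record layout.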
- write_record(record)
Write a single Fasta record to the file.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __parameters__ = ()
- class Bio.SeqIO.FastaIO.FastaTwoLineWriter(handle, record2title=None)
Bases: FastaWriter
Class to write 2-line per record Fasta format files (OBSOLETE).
This means we write the sequence information without line wrapping, and will always write a blank line for an empty sequence.
Please use the as_fasta_2line function instead, or the top-level Bio.SeqIO.write() function with format="fasta-2line".
- __init__(handle, record2title=None)
Create a 2-line per record Fasta writer (OBSOLETE).
- Arguments:
handle - Handle to an output file, e.g. as returned by open(filename, "w")
record2title - Optional function to return the text to be used for the title line of each record. By default a combination of the record.id and record.description is used. If the record.description starts with the record.id, then just the record.description is used.
You can either use:
handle = open(filename, "w")
writer = FastaTwoLineWriter(handle)
writer.write_file(myRecords)
handle.close()
Or, follow the sequential file writer system, for example:
handle = open(filename, "w")
writer = FastaTwoLineWriter(handle)
...
Multiple writer.write_record() and/or writer.write_records() calls
...
handle.close()
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __parameters__ = ()
- Bio.SeqIO.FastaIO.as_fasta(record)
Turn a SeqRecord into a FASTA formatted string.
This is used internally by the SeqRecord's .format("fasta") method and by the SeqIO.write(…, …, "fasta") function.
- Bio.SeqIO.FastaIO.as_fasta_2line(record)
Turn a SeqRecord into a two-line FASTA formatted string.
This is used internally by the SeqRecord's .format("fasta-2line") method and by the SeqIO.write(…, …, "fasta-2line") function.