Bio.SearchIO.FastaIO module
Bio.SearchIO support for Bill Pearson’s FASTA tools.
This module adds support for parsing FASTA outputs. FASTA is a suite of programs that finds regions of local or global similarity between protein or nucleotide sequences, either by searching databases or identifying local duplications.
Bio.SearchIO.FastaIO was tested on the following FASTA flavors and versions:
flavors: fasta, ssearch, tfastx
versions: 35, 36
Other flavors or versions may expose bugs in the parser. If you encounter such problems, please file a report on Biopython’s bug tracker.
More information on FASTA is available through these links:
Supported Formats
Bio.SearchIO.FastaIO supports parsing and indexing FASTA output produced with the -m 10 flag. Output formats that mimic other programs (e.g. the BLAST tabular format produced with the -m 8 flag) may still be parseable, but with SearchIO’s other parsers (in this case, the ‘blast-tab’ parser).
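As a minimal sketch of the format choice described above, the helper below maps the FASTA -m flag value to a SearchIO format name and passes it to `SearchIO.parse` or `SearchIO.index`. The helper names (`searchio_format`, `parse_fasta_output`, `index_fasta_output`) are hypothetical and not part of Biopython; only the two format names mentioned in this section are covered.

```python
# Hypothetical lookup table: FASTA -m flag value -> SearchIO format name.
# Only the two cases named in this section are included.
M_FLAG_TO_FORMAT = {10: "fasta-m10", 8: "blast-tab"}


def searchio_format(m_flag):
    """Return the SearchIO format name for a FASTA -m flag value."""
    try:
        return M_FLAG_TO_FORMAT[m_flag]
    except KeyError:
        raise ValueError(f"no SearchIO format known for -m {m_flag}") from None


def parse_fasta_output(path, m_flag=10):
    """Return an iterator of QueryResult objects for the file at `path`."""
    # Imported here so the lookup above stays usable without Biopython.
    from Bio import SearchIO

    return SearchIO.parse(path, searchio_format(m_flag))


def index_fasta_output(path, m_flag=10):
    """Return a dictionary-like index of the file at `path`, keyed by query ID."""
    from Bio import SearchIO

    return SearchIO.index(path, searchio_format(m_flag))
```

With Biopython installed, `for qresult in parse_fasta_output("output.m10"): print(qresult.id)` would iterate over the queries in a file; the file name here is a placeholder.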
fasta-m10
Note that in FASTA -m 10 outputs, HSPs from different strands are considered to be from different hits. They are listed as two separate entries in the hit table. FastaIO recognizes this and will group HSPs with the same hit ID into a single Hit object, regardless of strand.
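The grouping behaviour can be pictured with a small, self-contained sketch. This is an illustration only, not FastaIO's actual implementation: it assumes each hit-table entry has already been reduced to a hypothetical `(hit_id, strand, hsp)` tuple.

```python
def group_hsps_by_hit(entries):
    """Group (hit_id, strand, hsp) tuples by hit ID, ignoring strand.

    Returns a dict mapping each hit ID to a list of (strand, hsp) pairs,
    with hit IDs kept in the order they were first seen.
    """
    grouped = {}
    for hit_id, strand, hsp in entries:
        # Entries from both strands land in the same list.
        grouped.setdefault(hit_id, []).append((strand, hsp))
    return grouped


# Two entries for "seqA" (one per strand) collapse into a single group.
entries = [("seqA", 1, "hsp1"), ("seqB", 1, "hsp2"), ("seqA", -1, "hsp3")]
groups = group_hsps_by_hit(entries)
```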
FASTA sometimes also outputs extra sequence adjacent to the HSP match. FastaIO discards this extra sequence and extracts only the regions containing the actual sequence match.
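One way to picture the trimming is the sketch below. It is a simplified illustration, not FastaIO's own code, and it rests on several assumptions: coordinates are 1-based and inclusive, the sequence is on the forward strand, the flanking residues contain no gap characters, and the three coordinate arguments correspond to the al_display_start, al_start and al_stop values of an -m 10 alignment block.

```python
def extract_aligned_region(displayed, display_start, al_start, al_stop):
    """Return only the aligned part of a displayed sequence line.

    displayed      -- sequence as shown, possibly padded with '-' and
                      carrying unaligned flanking residues
    display_start  -- coordinate of the first displayed residue
    al_start       -- coordinate of the first aligned residue
    al_stop        -- coordinate of the last aligned residue
    """
    stripped = displayed.strip("-")  # drop padding at either end
    begin = al_start - display_start  # skip the leading flank
    wanted = al_stop - al_start + 1  # number of aligned residues
    region = []
    seen = 0
    for char in stripped[begin:]:
        if seen == wanted:
            break  # everything after this is trailing flank
        region.append(char)
        if char != "-":
            seen += 1  # internal gaps do not consume a coordinate
    return "".join(region)
```

For example, `extract_aligned_region("---ACGTAC-GTTT--", 1, 3, 8)` keeps the residues at positions 3 to 8 together with the gap between them.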
The following object attributes are provided:
| Object | Attribute | Value |
|---|---|---|
| QueryResult | description | query sequence description |
| | id | query sequence ID |
| | program | FASTA flavor |
| | seq_len | full length of query sequence |
| | target | target search database |
| | version | FASTA version |
| Hit | seq_len | full length of the hit sequence |
| HSP | bitscore | *_bits line |
| | evalue | *_expect line |
| | ident_pct | *_ident line |
| | init1_score | *_init1 line |
| | initn_score | *_initn line |
| | opt_score | *_opt line, *_s-w opt line |
| | pos_pct | *_sim line |
| | sw_score | *_score line |
| | z_score | *_z-score line |
| HSPFragment (also via HSP) | aln_annotation | al_cons block, if present |
| | hit | hit sequence |
| | hit_end | hit sequence end coordinate |
| | hit_start | hit sequence start coordinate |
| | hit_strand | hit sequence strand |
| | query | query sequence |
| | query_end | query sequence end coordinate |
| | query_start | query sequence start coordinate |
| | query_strand | query sequence strand |
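The attributes listed above are reached by walking the object hierarchy: a QueryResult iterates over its Hit objects, and a Hit iterates over its HSP objects. The sketch below collects a few of them into plain tuples. The function name `hsp_rows` is hypothetical, and the code relies only on iteration and attribute access, so it works on any objects shaped like SearchIO's.

```python
def hsp_rows(query_results):
    """Collect (query ID, hit ID, e-value, bit score) for every HSP."""
    rows = []
    for qresult in query_results:  # QueryResult objects
        for hit in qresult:  # Hit objects within the query
            for hsp in hit:  # HSP objects within the hit
                rows.append((qresult.id, hit.id, hsp.evalue, hsp.bitscore))
    return rows
```

With Biopython installed, the input could come from `SearchIO.parse("output.m10", "fasta-m10")`, where the file name is a placeholder.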
- class Bio.SearchIO.FastaIO.FastaM10Parser(handle, _FastaM10Parser__parse_hit_table=False)
Bases: object
Parser for Bill Pearson’s FASTA suite’s -m 10 output.
- __init__(handle, _FastaM10Parser__parse_hit_table=False)
Initialize the class.
- __iter__()
Iterate over the FastaM10Parser object, yielding query results.
- class Bio.SearchIO.FastaIO.FastaM10Indexer(filename)
Bases: SearchIndexer
Indexer class for Bill Pearson’s FASTA suite’s -m 10 output.
- __init__(filename)
Initialize the class.
- __iter__()
Iterate over FastaM10Indexer; yields query results’ keys, start offsets, offset lengths.
- get_raw(offset)
Return the raw record from the file as a bytes string.
- __abstractmethods__ = frozenset({})