Bio.SearchIO.InfernalIO package

Submodules

Module contents

Bio.SearchIO support for Infernal output formats.

This module adds support for parsing Infernal outputs. Infernal is a suite of programs for searching DNA sequence databases for RNA structure and sequence similarities using covariance models (CMs).

Bio.SearchIO.InfernalIO was tested on the following Infernal versions and flavors:

  • Infernal (1.0.0+): cmscan and cmsearch

More information on HMMER are available through these links:

Supported formats

Bio.SearchIO.InfernalIO supports the following Infernal output formats:
  • Plain text - ‘infernal-text’ - parsing, indexing

  • Tabular - ‘infernal-tab’ - parsing, indexing

For all output formats, Infernal uses ‘mdl’ for ‘query’ and ‘seq’ for ‘hit’. InfernalIO is aware of this different naming scheme, and will use ‘query’ and ‘hit’ to fit SearchIO’s object model.

Infernal sometime reports ‘local ends’ (i.e., a large insertion or deletion in the optimal alignment), which are expresented by a number in brackets in the alignment (ex. AUUAC*[88]*GUAGU). In InfernalIO, these local alignment are split into fragments of the same HSP.

infernal-text

The Infernal plain text parser supports output files with alignment blocks (default) or without (with the ‘-noali’ flag). If the alignment blocks are present, it can parse files with variable alignment width (using the ‘-notextw’ or ‘-textw’ flag). Both CM or HMM searches (with the ‘–hmmonly’ flag) output are supported. The parser only supports non-verbose output formats.

The following SearchIO objects attributes are provided.

Object

Attribute

Value

QueryResult

accession

query accession

description

query sequence description

id

query sequence ID

program

Infernal flavor

seq_len

full length of query sequence

target

target search database

version

Infernal version

Hit

description

hit sequence description

id

hit sequence ID

HSP

evalue

hsp evalue

bias

hsp bias

bitscore

hsp score

gc

gc fraction

is_included

boolean, whether the hit of the hsp is in the inclusion threshold

query_start

query start position

query_end

query end position

query_endtype

query sequence end types (e.g., ‘[]’, ‘..’, ‘[.’, ‘.]’, etc.)

hit_start

hit start position

hit_end

hit end position

hit_endtype

hit sequence end types

acc_avg

expected accuracy per alignment residue (acc column)

model

type of model used (cm or hmm)

truncated

indicate if the hit is truncated (5’, 3’ or both) or not

HSPFragment

aln_annotation

alignment similarity string and other annotations (PP, CS, similarity and NC (except for –hmmonly))

aln_span

length of alignment fragment

hit

hit sequence

hit_start

local alignment sequence start coordinate (seq from)

hit_end

local alignment sequence end coordinate (seq to)

hit_strand

hit sequence strand

query

query sequence

query_start

local model alignment start coordinate (mdl from)

query_end

local model alignment end coordinate (mdl to)

infernal-tab

The Infernal plain text parser supports the standard cmsearch tabular output and cmscan tabular output files formats 1, 2 and 3 (inferred automatically from the header).

Rows marked with ‘*’ denotes attributes not available in the default format.

Object

Attribute

Value

QueryResult

accession

query accession

id

query sequence ID

clan*

Rfam clan

seq_len*

query sequence length

Hit

description

hit sequence description

id

hit sequence ID

accession

hit accession

seq_len*

hit sequence length

HSP

evalue

hsp evalue

bias

hsp bias

bitscore

hsp score

gc

gc fraction

is_included

boolean, whether the hit of the hsp is in the inclusion threshold

model

type of model used (cm or hmm)

truncated

indicate if the hit is truncated (5’, 3’ or both) or not

pipeline_pass

pipeline pass at which the hit was identified

olp*

overlap status of this hit (‘*’, ‘^’, ‘$’ or ‘=’)

anyidx*

index of the best scoring overlapping hit (or none if there are no overlap)

afrct1*

fraction of this hit that overlap with anyidx hit (or none if there are no overlap)

afrct2*

fraction of anyidx hit with this hit (or none if there are no overlap)

winidx*

index of the best scoring hit that overlaps with this hit that is marked as ‘^’ (or none if there are no overlap)

wfrct1*

fraction of this hit that overlap with winidx hit (or none if there are no overlap)

wfrct2*

fraction of winidx hit with this hit (or none if there are no overlap)

HSPFragment (also via HSP)

hit_start

local alignment sequence start coordinate (seq from)

hit_end

local alignment sequence end coordinate (seq to)

hit_strand

hit sequence strand

query_start

local model alignment start coordinate (mdl from)

query_end

local model alignment end coordinate (mdl to)