Package Bio :: Package SearchIO :: Package HmmerIO
[hide private]
[frames] | no frames]

Package HmmerIO

source code

Bio.SearchIO support for HMMER output formats.

This module adds support for parsing HMMER outputs. HMMER is a suite of programs implementing the profile hidden Markov models to find similarity across protein sequences.

Bio.SearchIO.HmmerIO was tested on the following HMMER versions and flavors:

More information on HMMER are available through these links:

Supported formats

Bio.SearchIO.HmmerIO supports the following HMMER output formats:

Note that for all output formats, HMMER uses its own convention of input and output coordinates. It does not use the term 'hit' or 'query', instead it uses 'hmm' or 'ali'. For example, 'hmmfrom' is the start coordinate of the HMM sequence while 'alifrom' is the start coordinate of the protein sequence.

HmmerIO is aware of this different naming scheme and will adjust them accordingly to fit SearchIO's object model. If HmmerIO sees that the output file to parse was written by hmmsearch or phmmer, all 'hmm' coordinates will be the hit coordinates and 'ali' coordinates will be the query coordinates. Conversely, if the HMMER flavor is hmmscan, 'hmm' will be query and 'ali' will be hit.

This is why the 'hmmer3-domtab' format has to be specified with the source HMMER flavor. The parsers need to know which is the hit and which is the query. 'hmmer3-text' has its source program information present in the file, while 'hmmer3-tab' does not output any coordinates. That's why both of these formats do not need direct flavor specification like 'hmmer3-domtab'.

Also note that when using the domain table format writers, it will use HMMER's naming convention ('hmm' and 'ali') so the files you write will be similar to files written by a real HMMER program.

hmmer2-text and hmmer3-text

The parser for HMMER 3.0 plain text output can parse output files with alignment blocks (default) or without (with the '--noali' flag). If the alignment blocks are present, you can also parse files with variable alignment width (using the '--notextw' or '--textw' flag).

The following SearchIO objects attributes are provided. Rows marked with '*' denotes attributes not available in the hmmer2-text format:

Object Attribute Value
QueryResult accession accession (if present)
description query sequence description
id query sequence ID
program HMMER flavor
seq_len* full length of query sequence
target target search database
version BLAST version
Hit bias* hit-level bias
bitscore hit-level score
description hit sequence description
domain_exp_num* expected number of domains in the hit (exp column)
domain_obs_num observed number of domains in the hit (N column)
evalue hit-level e-value
id hit sequence ID
is_included* boolean, whether the hit is in the inclusion threshold or not
HSP acc_avg* expected accuracy per alignment residue (acc column)
bias* hsp-level bias
bitscore hsp-level score
domain_index the domain index set by HMMER
env_end* end coordinate of the envelope
env_endtype* envelope end types (e.g. '[]', '..', '[.', etc.)
env_start* start coordinate of the envelope
evalue hsp-level independent e-value
evalue_cond* hsp-level conditional e-value
hit_endtype hit sequence end types
is_included* boolean, whether the hit of the hsp is in the inclusion threshold
query_endtype query sequence end types
HSPFragment (also via HSP) aln_annotation alignment similarity string and other annotations (e.g. PP, CS)
aln_span length of alignment fragment
hit hit sequence
hit_end hit sequence end coordinate, may be 'hmmto' or 'alito' depending on the HMMER flavor
hit_start hit sequence start coordinate, may be 'hmmfrom' or 'alifrom' depending on the HMMER flavor
hit_strand hit sequence strand
query query sequence
query_end query sequence end coordinate, may be 'hmmto' or 'alito' depending on the HMMER flavor
query_start query sequence start coordinate, may be 'hmmfrom' or 'alifrom' depending on the HMMER flavor
query_strand query sequence strand

hmmer3-tab

The following SearchIO objects attributes are provided:

Object Attribute Column / Value
QueryResult accession query accession (if present)
description query sequence description
id query name
Hit accession hit accession
bias hit-level bias
bitscore hit-level score
description hit sequence description
cluster_num clu column
domain_exp_num exp column
domain_included_num inc column
domain_obs_num dom column
domain_reported_num rep column
env_num env column
evalue hit-level evalue
id target name
overlap_num ov column
region_num reg column
HSP bias bias of the best domain
bitscore bitscore of the best domain
evalue evalue of the best domain

hmmer3-domtab

To parse domain table files, you must use the HMMER flavor that produced the file. So instead of using 'hmmer3-domtab', use either 'hmmsearch3-domtab', 'hmmscan3-domtab', or 'phmmer3-domtab'.

The following SearchIO objects attributes are provided:

Object Attribute Value
QueryResult accession accession
description query sequence description
id query sequence ID
seq_len full length of query sequence
Hit accession accession
bias hit-level bias
bitscore hit-level score
description hit sequence description
evalue hit-level e-value
id hit sequence ID
HSP acc_avg expected accuracy per alignment residue (acc column)
bias hsp-level bias
bitscore hsp-level score
domain_index the domain index set by HMMER
env_end end coordinate of the envelope
env_start start coordinate of the envelope
evalue hsp-level independent e-value
evalue_cond hsp-level conditional e-value
HSPFragment (also via HSP) hit_end hit sequence end coordinate, may be 'hmmto' or 'alito' depending on the HMMER flavor
hit_start hit sequence start coordinate, may be 'hmmfrom' or 'alifrom' depending on the HMMER flavor
hit_strand hit sequence strand
query_end query sequence end coordinate, may be 'hmmto' or 'alito' depending on the HMMER flavor
query_start query sequence start coordinate, may be 'hmmfrom' or 'alifrom' depending on the HMMER flavor
query_strand query sequence strand
Submodules [hide private]

Variables [hide private]
  __package__ = 'Bio.SearchIO.HmmerIO'