Bio.SearchIO.InterproscanIO package


Module contents

Bio.SearchIO support for InterProScan output formats.

This module adds support for parsing InterProScan XML output. InterProScan is available as a command-line program or through EMBL-EBI’s web interface. Bio.SearchIO.InterproscanIO was tested on the following version:

  • version 5.26-65.0 (interproscan-model-2.1.xsd)

More information about InterProScan is available through these links:

  • Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/
  • Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search
  • Documentation: https://github.com/ebi-pf-team/interproscan/wiki

Supported format

Bio.SearchIO.InterproscanIO supports the following format:

  • XML - ‘interproscan-xml’ - parsing

interproscan-xml

The interproscan-xml parser follows the InterProScan XML format described here: https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats

Object                      Attribute                     XML Element
--------------------------  ----------------------------  ------------------------------------
QueryResult                 target                        InterPro
                            program                       InterProScan
                            version                       protein-matches.interproscan-version
Hit                         accession                     signature.name
                            id                            signature.ac
                            description                   signature.desc
                            dbxrefs                       IPR:entry.ac
                                                          go-xref.id
                                                          pathway-xref.db:pathway-xref.id
                            attributes['Target']          signature-library-release.library
                            attributes['Target version']  signature-library-release.version
                            attributes['Hit type']        *-match / *-location
HSP                         bitscore                      *-location.score
                            evalue                        *-location.evalue
HSPFragment (also via HSP)  query_start                   *-location.start
                            query_end                     *-location.end
                            hit_start                     *-location.hmm-start
                            hit_end                       *-location.hmm-end
                            query                         sequence
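The mapping in the table above can be illustrated with a small, standalone sketch that uses only the standard library's xml.etree.ElementTree. The XML fragment is hypothetical and simplified (real InterProScan XML declares a namespace and carries more elements and attributes), its identifiers are placeholders, and the helper map_match is not part of Biopython. In practice you would let Bio.SearchIO.parse(path, "interproscan-xml") do this work and read the resulting QueryResult, Hit and HSP objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified *-match element with placeholder identifiers.
# Real InterProScan XML declares a namespace and has more content than this.
FRAGMENT = """
<hmmer3-match>
  <signature ac="SIG0001" name="example_sig" desc="Example signature">
    <entry ac="IPR000000">
      <go-xref id="GO:0000000"/>
      <pathway-xref db="ExampleDB" id="P00000"/>
    </entry>
    <signature-library-release library="EXAMPLE-LIB" version="1.0"/>
  </signature>
  <locations>
    <hmmer3-location start="10" end="50" hmm-start="1" hmm-end="40"
                     score="35.2" evalue="1.5e-10"/>
  </locations>
</hmmer3-match>
"""


def map_match(match):
    """Map one *-match element to Hit-like and HSP-like dicts (sketch only)."""
    sig = match.find("signature")
    lib = sig.find("signature-library-release")
    dbxrefs = []
    for entry in sig.findall("entry"):
        dbxrefs.append("IPR:" + entry.get("ac"))
        dbxrefs += [go.get("id") for go in entry.findall("go-xref")]
        dbxrefs += [pw.get("db") + ":" + pw.get("id") for pw in entry.findall("pathway-xref")]
    hit = {
        "id": sig.get("ac"),
        "accession": sig.get("name"),
        "description": sig.get("desc"),
        "dbxrefs": dbxrefs,
        "attributes": {
            "Target": lib.get("library"),
            "Target version": lib.get("version"),
            "Hit type": match.tag[: -len("-match")],  # "hmmer3-match" -> "hmmer3"
        },
    }
    hsps = []
    for loc in match.find("locations"):
        hsps.append({
            "bitscore": float(loc.get("score")),
            "evalue": float(loc.get("evalue")),
            "query_start": int(loc.get("start")) - 1,  # shifted to 0-based
            "query_end": int(loc.get("end")),
            "hit_start": int(loc.get("hmm-start")) - 1,  # shifted to 0-based
            "hit_end": int(loc.get("hmm-end")),
        })
    return hit, hsps


hit, hsps = map_match(ET.fromstring(FRAGMENT.strip()))
print(hit["id"], hit["attributes"]["Hit type"], hsps[0]["query_start"])  # SIG0001 hmmer3 9
```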

InterProScan XML files may contain a single match with multiple locations, or multiple matches to the same protein, each with a single location. In both cases, each match is stored as one Hit object and each of its locations as an HSP object.
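The Hit/HSP relationship described above can be sketched in plain Python: one container per match, one span per location. The input tuples and the helper build_hits are hypothetical, and the sketch assumes each signature accession appears in only one match (a repeated accession would overwrite the earlier entry). It illustrates the structure only and is not Biopython's implementation.

```python
# Hypothetical input: (signature accession, [(start, end), ...]) per match,
# using 1-based XML coordinates. The first match has two locations; the
# other two are single-location matches to the same protein.
matches = [
    ("SIG0001", [(10, 50), (120, 160)]),
    ("SIG0002", [(60, 90)]),
    ("SIG0003", [(200, 230)]),
]


def build_hits(match_list):
    """Return {hit id: [HSP spans]}: one entry per match, one span per location.

    Assumes accessions are unique across matches.
    """
    hits = {}
    for sig_ac, locations in match_list:
        # One list per match (the "Hit"); one 0-based (start, end) per location (the "HSP").
        hits[sig_ac] = [(start - 1, end) for start, end in locations]
    return hits


hits = build_hits(matches)
print(len(hits), [len(spans) for spans in hits.values()])  # 3 [2, 1, 1]
```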

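HSP.query_start == *-location.start - 1 and HSP.hit_start == *-location.hmm-start - 1 (start positions are 0-based in Biopython, while InterProScan XML reports them 1-based)

HSP.aln_span == HSP.query_end - HSP.query_start (computed from the 0-based query_start)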
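The two coordinate rules above amount to a little arithmetic. This sketch assumes the XML positions are 1-based and inclusive, which is why only the start is shifted by 1; to_biopython_span is a hypothetical helper, not part of Biopython.

```python
def to_biopython_span(xml_start, xml_end):
    """Convert 1-based, inclusive XML positions to Biopython-style values.

    Returns (query_start, query_end, aln_span).
    """
    query_start = xml_start - 1  # Biopython start positions are 0-based
    query_end = xml_end          # end positions are kept as-is
    aln_span = query_end - query_start
    return query_start, query_end, aln_span


print(to_biopython_span(10, 50))  # (9, 50, 41)
```

With this convention the span still counts every residue of the inclusive XML range: positions 10 to 50 cover 41 residues.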

The type of each match or location (e.g., hmmer3-match, hmmer3-location, coils-match, panther-location) is stored in hit.attributes['Hit type']. Every ‘phobius-match’ has a corresponding ‘phobius-location’, so hit.attributes['Hit type'] stores the element name without the ‘-match’ or ‘-location’ suffix (‘phobius’ in this example).
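Deriving the hit type from an element name amounts to stripping a suffix. This sketch assumes any XML namespace prefix has already been removed from the tag; hit_type is a hypothetical helper name.

```python
import re

# Matches a trailing "-match" or "-location" on an element name.
_SUFFIX = re.compile(r"-(?:match|location)$")


def hit_type(tag):
    """Strip a trailing '-match' or '-location' from an element tag name."""
    return _SUFFIX.sub("", tag)


for tag in ("hmmer3-match", "hmmer3-location", "coils-match", "panther-location", "phobius-match"):
    print(tag, "->", hit_type(tag))
```

Because the pattern is anchored with `$`, a hyphen elsewhere in the name is left untouched.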