Bio.SearchIO.InterproscanIO package
Submodules
Module contents
Bio.SearchIO support for InterProScan output formats.
This module adds support for parsing InterProScan XML output. InterProScan is available as a command-line program or through EMBL-EBI’s web page. Bio.SearchIO.InterproscanIO was tested on the following version:
versions: 5.26-65.0 (interproscan-model-2.1.xsd)
More information about InterProScan is available through these links:
- Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/
- Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search
- Documentation: https://github.com/ebi-pf-team/interproscan/wiki
Supported format
Bio.SearchIO.InterproscanIO supports the following format:
XML - ‘interproscan-xml’ - parsing
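As a minimal sketch of how the format name is used, the function below passes 'interproscan-xml' to `SearchIO.parse` and collects one row per hit. The function name `summarize_interproscan` and the file name in the usage note are hypothetical; the sketch assumes Biopython is installed and that the input is an InterProScan XML file of the version listed above.

```python
def summarize_interproscan(xml_path):
    """Return (query id, hit id, number of HSPs) tuples for one XML file.

    `xml_path` is a hypothetical path to an InterProScan XML result.
    """
    # Imported inside the function so the sketch can be loaded
    # even where Biopython is not installed.
    from Bio import SearchIO

    rows = []
    # Each QueryResult corresponds to one query protein.
    for qresult in SearchIO.parse(xml_path, "interproscan-xml"):
        # Iterating a QueryResult yields Hit objects;
        # len(hit) is the number of HSPs stored in that hit.
        for hit in qresult:
            rows.append((qresult.id, hit.id, len(hit)))
    return rows
```

A call such as `summarize_interproscan("results.xml")` would then return one tuple per hit, where `results.xml` stands in for a real output file.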
interproscan-xml
The interproscan-xml parser follows the InterProScan XML format described here: https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats
| Object | Attribute | XML Element |
|---|---|---|
| QueryResult | target | |
| | program | |
| | version | |
| Hit | accession | |
| | id | |
| | description | |
| | dbxrefs | |
| | attributes['Target'] | |
| | attributes['Target version'] | |
| | attributes['Hit type'] | |
| HSP | bitscore | |
| | evalue | |
| HSPFragment (also via HSP) | query_start | |
| | query_end | |
| | hit_start | |
| | hit_end | |
| | query | |
InterProScan XML files may contain a match with multiple locations, or multiple matches to the same protein that each have a single location. In both cases, the match is stored as a single Hit object and its locations as HSP objects.
HSP.*start == *start - 1
(Since every start position is 0-based in Biopython)
HSP.aln_span == query-end - query-start
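The two relations above can be illustrated with a short sketch. It reads them literally: only the start position is shifted, and the span is the difference of the two values as written in the XML. The helper names are hypothetical and are not part of Biopython; leaving the end position unchanged is an assumption based on the fact that only the start is mentioned above.

```python
def to_biopython_coords(xml_start, xml_end):
    """Shift a 1-based XML start to the 0-based start used by Biopython.

    Assumption: the end value is kept as written in the XML.
    """
    return xml_start - 1, xml_end


def aln_span(xml_query_start, xml_query_end):
    """Alignment span as stated above: query-end minus query-start."""
    return xml_query_end - xml_query_start


# Hypothetical location with start="10" and end="50" in the XML.
start, end = to_biopython_coords(10, 50)
span = aln_span(10, 50)
```

With these example values, `start` is 9, `end` is 50, and `span` is 40.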
The types of matches or locations (e.g. hmmer3-match, hmmer3-location, coils-match, panther-location) are stored in hit.attributes['Hit type']. Every match element has a corresponding location element: for instance, every ‘phobius-match’ has a ‘phobius-location’. Therefore, hit.attributes['Hit type'] stores the string without the ‘-match’ or ‘-location’ suffix (‘phobius’, in this example).
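The suffix removal described above can be sketched with a regular expression. This is an illustration only, not necessarily how the parser implements it, and the helper name `hit_type_from_tag` is hypothetical.

```python
import re


def hit_type_from_tag(tag):
    """Strip a trailing '-match' or '-location' from an element name."""
    # The pattern is anchored at the end so that only the suffix is removed.
    return re.sub(r"-(match|location)$", "", tag)


# Element names taken from the examples in the text above.
types = [hit_type_from_tag(t) for t in
         ("hmmer3-match", "hmmer3-location", "coils-match", "panther-location")]
```

Here `types` is `['hmmer3', 'hmmer3', 'coils', 'panther']`, so a match element and its location element map to the same hit type.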