Package Bio :: Module SeqFeature :: Class SeqFeature
[hide private]
[frames] | no frames]

Class SeqFeature

source code

object --+
         |
        SeqFeature

Represent a Sequence Feature on an object.

Attributes:
Instance Methods [hide private]
 
__init__(self, location=None, type='', location_operator='', strand=None, id='<unknown id>', qualifiers=None, sub_features=None, ref=None, ref_db=None)
Initialize a SeqFeature on a Sequence.
source code
 
_get_strand(self)
Get function for the strand property (PRIVATE).
source code
 
_set_strand(self, value)
Set function for the strand property (PRIVATE).
source code
 
_get_ref(self)
Get function for the reference property (PRIVATE).
source code
 
_set_ref(self, value)
Set function for the reference property (PRIVATE).
source code
 
_get_ref_db(self)
Get function for the database reference property (PRIVATE).
source code
 
_set_ref_db(self, value)
Set function for the database reference property (PRIVATE).
source code
 
_get_location_operator(self)
Get function for the location operator property (PRIVATE).
source code
 
_set_location_operator(self, value)
Set function for the location operator property (PRIVATE).
source code
 
__repr__(self)
Represent the feature as a string for debugging.
source code
 
__str__(self)
Return the full feature as a python string.
source code
 
_shift(self, offset)
Return a copy of the feature with its location shifted (PRIVATE).
source code
 
_flip(self, length)
Return a copy of the feature with its location flipped (PRIVATE).
source code
 
extract(self, parent_sequence)
Extract the feature's sequence from supplied parent sequence.
source code
 
translate(self, parent_sequence, table='Standard', start_offset=None, stop_symbol='*', to_stop=False, cds=False, gap=None)
Get a translation of the feature's sequence.
source code
 
__bool__(self)
Boolean value of an instance of this class (True).
source code
 
__nonzero__(self)
Boolean value of an instance of this class (True).
source code
 
__len__(self)
Return the length of the region where the feature is located.
source code
 
__iter__(self)
Iterate over the parent positions within the feature.
source code
 
__contains__(self, value)
Check if an integer position is within the feature.
source code

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties [hide private]
  strand
Feature's strand
  ref
Feature location reference (e.g. accession).
  ref_db
Feature location reference's database.
  location_operator
Location operator for compound locations (e.g. join).

Inherited from object: __class__

Method Details [hide private]

__init__(self, location=None, type='', location_operator='', strand=None, id='<unknown id>', qualifiers=None, sub_features=None, ref=None, ref_db=None)
(Constructor)

source code 

Initialize a SeqFeature on a Sequence.

location can either be a FeatureLocation (with strand argument also given if required), or None.

e.g. With no strand, on the forward strand, and on the reverse strand:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f1 = SeqFeature(FeatureLocation(5, 10), type="domain")
>>> f1.strand == f1.location.strand == None
True
>>> f2 = SeqFeature(FeatureLocation(7, 110, strand=1), type="CDS")
>>> f2.strand == f2.location.strand == +1
True
>>> f3 = SeqFeature(FeatureLocation(9, 108, strand=-1), type="CDS")
>>> f3.strand == f3.location.strand == -1
True

An invalid strand will trigger an exception:

>>> f4 = SeqFeature(FeatureLocation(50, 60), strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

Similarly if set via the FeatureLocation directly:

>>> loc4 = FeatureLocation(50, 60, strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

For exact start/end positions, an integer can be used (as shown above) as shorthand for the ExactPosition object. For non-exact locations, the FeatureLocation must be specified via the appropriate position objects.

Note that the strand, ref and ref_db arguments to the SeqFeature are now obsolete and will be deprecated in a future release (which will give warning messages) and later removed. Set them via the location object instead.

Note that location_operator and sub_features arguments can no longer be used, instead do this via the CompoundLocation object.

Overrides: object.__init__

__repr__(self)
(Representation operator)

source code 
Represent the feature as a string for debugging.
Overrides: object.__repr__

__str__(self)
(Informal representation operator)

source code 
Return the full feature as a python string.
Overrides: object.__str__

_shift(self, offset)

source code 

Return a copy of the feature with its location shifted (PRIVATE).

The annotation qaulifiers are copied.

_flip(self, length)

source code 

Return a copy of the feature with its location flipped (PRIVATE).

The argument length gives the length of the parent sequence. For example a location 0..20 (+1 strand) with parent length 30 becomes after flipping 10..30 (-1 strand). Strandless (None) or unknown strand (0) remain like that - just their end points are changed.

The annotation qaulifiers are copied.

extract(self, parent_sequence)

source code 

Extract the feature's sequence from supplied parent sequence.

The parent_sequence can be a Seq like object or a string, and will generally return an object of the same type. The exception to this is a MutableSeq as the parent sequence will return a Seq object.

This should cope with complex locations including complements, joins and fuzzy positions. Even mixed strand features should work! This also covers features on protein sequences (e.g. domains), although here reverse strand features are not permitted.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8, 15), type="domain")
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())

If the FeatureLocation is None, e.g. when parsing invalid locus locations in the GenBank parser, extract() will raise a ValueError.

>>> from Bio.Seq import Seq
>>> from Bio.SeqFeature import SeqFeature
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(None, type="domain")
>>> f.extract(seq)
Traceback (most recent call last):
   ...
ValueError: The feature's .location is None. Check the sequence file for a valid location.

Note - currently only compound features of type "join" are supported.

translate(self, parent_sequence, table='Standard', start_offset=None, stop_symbol='*', to_stop=False, cds=False, gap=None)

source code 

Get a translation of the feature's sequence.

This method is intended for CDS or other features that code proteins and is a shortcut that will both extract the feature and translate it, taking into account the codon_start and transl_table qualifiers, if they are present. If they are not present the value of the arguments "table" and "start_offset" are used.

The arguments stop_symbol, to_stop, cds and gap have the same meaning as Seq.translate, refer to that documentation for further information.

Arguments:
  • parent_sequence - This method will translate DNA or RNA sequences, and those with a nucleotide or generic alphabet. Trying to translate a protein sequence raises an exception.
  • table - Which codon table to use if there is no transl_table qualifier for this feature. This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the "Standard" table.
  • start_offset - offset at which the first complete codon of a coding feature can be found, relative to the first base of that feature. Has a valid value of 0, 1 or 2. NOTE: this uses python's 0-based numbering whereas the codon_start qualifier in files from NCBI use 1-based numbering. Will override a codon_start qualifier
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_dna
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("GGTTACACTTACCGATAATGTCTCTGATGA", generic_dna)
>>> f = SeqFeature(FeatureLocation(0, 30), type="CDS")
>>> f.qualifiers['transl_table'] = [11]
>>> f.translate(seq)
Seq('GYTYR*CL**', HasStopCodon(ExtendedIUPACProtein(), '*'))

Now use the start_offset argument to change the frame. Note this uses python 0-based numbering

>>> f.translate(seq, start_offset=1)
Seq('VTLTDNVSD', ExtendedIUPACProtein())

Alternatively use the codon_start qualifier to do the same thing. Note: this uses 1-based numbering, which is found in files from NCBI

>>> f.qualifiers['codon_start'] = [2]
>>> f.translate(seq)
Seq('VTLTDNVSD', ExtendedIUPACProtein())

__bool__(self)

source code 

Boolean value of an instance of this class (True).

This behaviour is for backwards compatibility, since until the __len__ method was added, a SeqFeature always evaluated as True.

Note that in comparison, Seq objects, strings, lists, etc, will all evaluate to False if they have length zero.

WARNING: The SeqFeature may in future evaluate to False when its length is zero (in order to better match normal python behaviour)!

__nonzero__(self)
(Boolean test operator)

source code 

Boolean value of an instance of this class (True).

This behaviour is for backwards compatibility, since until the __len__ method was added, a SeqFeature always evaluated as True.

Note that in comparison, Seq objects, strings, lists, etc, will all evaluate to False if they have length zero.

WARNING: The SeqFeature may in future evaluate to False when its length is zero (in order to better match normal python behaviour)!

__len__(self)
(Length operator)

source code 

Return the length of the region where the feature is located.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8, 15), type="domain")
>>> len(f)
7
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
>>> len(f.extract(seq))
7

This is a proxy for taking the length of the feature's location:

>>> len(f.location)
7

For simple features this is the same as the region spanned (end position minus start position using Pythonic counting). However, for a compound location (e.g. a CDS as the join of several exons) the gaps are not counted (e.g. introns). This ensures that len(f) matches len(f.extract(parent_seq)), and also makes sure things work properly with features wrapping the origin etc.

__iter__(self)

source code 

Iterate over the parent positions within the feature.

The iteration order is strand aware, and can be thought of as moving along the feature using the parent sequence coordinates:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5, 10), type="domain", strand=-1)
>>> len(f)
5
>>> for i in f: print(i)
9
8
7
6
5
>>> list(f)
[9, 8, 7, 6, 5]

This is a proxy for iterating over the location,

>>> list(f.location)
[9, 8, 7, 6, 5]

__contains__(self, value)
(In operator)

source code 

Check if an integer position is within the feature.

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5, 10), type="domain", strand=-1)
>>> len(f)
5
>>> [i for i in range(15) if i in f]
[5, 6, 7, 8, 9]

For example, to see which features include a SNP position, you could use this:

>>> from Bio import SeqIO
>>> record = SeqIO.read("GenBank/NC_000932.gb", "gb")
>>> for f in record.features:
...     if 1750 in f:
...         print("%s %s" % (f.type, f.location))
source [0:154478](+)
gene [1716:4347](-)
tRNA join{[4310:4347](-), [1716:1751](-)}

Note that for a feature defined as a join of several subfeatures (e.g. the union of several exons) the gaps are not checked (e.g. introns). In this example, the tRNA location is defined in the GenBank file as complement(join(1717..1751,4311..4347)), so that position 1760 falls in the gap:

>>> for f in record.features:
...     if 1760 in f:
...         print("%s %s" % (f.type, f.location))
source [0:154478](+)
gene [1716:4347](-)

Note that additional care may be required with fuzzy locations, for example just before a BeforePosition:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> from Bio.SeqFeature import BeforePosition
>>> f = SeqFeature(FeatureLocation(BeforePosition(3), 8), type="domain")
>>> len(f)
5
>>> [i for i in range(10) if i in f]
[3, 4, 5, 6, 7]

Note that is is a proxy for testing membership on the location.

>>> [i for i in range(10) if i in f.location]
[3, 4, 5, 6, 7]

Property Details [hide private]

strand

Feature's strand

This is a shortcut for feature.location.strand

Get Method:
_get_strand(self) - Get function for the strand property (PRIVATE).
Set Method:
_set_strand(self, value) - Set function for the strand property (PRIVATE).

ref

Feature location reference (e.g. accession).

This is a shortcut for feature.location.ref

Get Method:
_get_ref(self) - Get function for the reference property (PRIVATE).
Set Method:
_set_ref(self, value) - Set function for the reference property (PRIVATE).

ref_db

Feature location reference's database.

This is a shortcut for feature.location.ref_db

Get Method:
_get_ref_db(self) - Get function for the database reference property (PRIVATE).
Set Method:
_set_ref_db(self, value) - Set function for the database reference property (PRIVATE).

location_operator

Location operator for compound locations (e.g. join).
Get Method:
_get_location_operator(self) - Get function for the location operator property (PRIVATE).
Set Method:
_set_location_operator(self, value) - Set function for the location operator property (PRIVATE).