
Class FeatureLocation

object --+
         |
        FeatureLocation

Specify the location of a feature along a sequence.

The FeatureLocation is used for simple continuous features, which can be described as running from a start position to an end position (optionally with a strand and reference information). More complex locations made up of several non-continuous parts (e.g. a coding sequence made up of several exons) are described using a SeqFeature with a CompoundLocation.

Note that the start and end location numbering follows Python's scheme, so a GenBank entry of 123..150 (one-based counting) becomes a location of [122:150] (zero-based counting).

>>> from Bio.SeqFeature import FeatureLocation
>>> f = FeatureLocation(122, 150)
>>> print(f)
[122:150]
>>> print(f.start)
122
>>> print(f.end)
150
>>> print(f.strand)
None
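
The off-by-one conversion described above can be sketched in plain Python. The helper names genbank_to_python and python_to_genbank are hypothetical and not part of Biopython, and the sketch assumes exact (non-fuzzy) integer coordinates:

```python
def genbank_to_python(gb_start, gb_end):
    """Convert a one-based, inclusive GenBank range to a zero-based, half-open range.

    Hypothetical helper for illustration only (not a Biopython function).
    """
    # Only the start moves: the inclusive one-based end equals the
    # exclusive zero-based end.
    return gb_start - 1, gb_end


def python_to_genbank(start, end):
    """Inverse conversion, back to one-based inclusive coordinates."""
    return start + 1, end


start, end = genbank_to_python(123, 150)
print(start, end)    # 122 150
print(end - start)   # 28 positions, the same count as 123..150 inclusive
```

Because only the start changes, the length of the region is simply end - start in Python coordinates.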

Note the strand defaults to None. If you are working with nucleotide sequences, be explicit when a feature is on the forward strand:

>>> from Bio.SeqFeature import FeatureLocation
>>> f = FeatureLocation(122, 150, strand=+1)
>>> print(f)
[122:150](+)
>>> print(f.strand)
1

Note that for a parent sequence of length n, the FeatureLocation start and end must satisfy the inequality 0 <= start <= end <= n. This means even for features on the reverse strand of a nucleotide sequence, we expect the 'start' coordinate to be less than the 'end'.

>>> from Bio.SeqFeature import FeatureLocation
>>> r = FeatureLocation(122, 150, strand=-1)
>>> print(r)
[122:150](-)
>>> print(r.start)
122
>>> print(r.end)
150
>>> print(r.strand)
-1

That is, rather than thinking of the 'start' and 'end' biologically in a strand-aware manner, think of them as the 'left most' or 'minimum' boundary, and the 'right most' or 'maximum' boundary of the region being described. This is particularly important with compound locations describing non-continuous regions.
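
A minimal plain-Python sketch of this convention, using exact integer positions. The helpers check_bounds and biological_ends are hypothetical names introduced here for illustration; they are not Biopython functions:

```python
def check_bounds(start, end, parent_length):
    """Return True when 0 <= start <= end <= parent_length holds."""
    return 0 <= start <= end <= parent_length


def biological_ends(start, end, strand):
    """Return the (first, last) zero-based positions in reading order.

    start and end are the left-most and right-most boundaries (half-open),
    regardless of strand. On the reverse strand the region is read from
    end - 1 down to start.
    """
    if end <= start:
        raise ValueError("an empty region has no first or last position")
    if strand == -1:
        return end - 1, start
    return start, end - 1


print(check_bounds(122, 150, 200))     # True
print(check_bounds(150, 122, 200))     # False: start must not exceed end
print(biological_ends(122, 150, +1))   # (122, 149)
print(biological_ends(122, 150, -1))   # (149, 122)
```

The stored coordinates are identical for both strands; only the reading direction differs.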

The examples above use standard exact positions, but there are also specialised position objects for representing fuzzy positions. For example, a GenBank location like complement(<123..150) would use a BeforePosition object for the start.

Instance Methods

__init__(self, start, end, strand=None, ref=None, ref_db=None)
Initialize the class.

_get_strand(self)
Get function for the strand property (PRIVATE).

_set_strand(self, value)
Set function for the strand property (PRIVATE).

__str__(self)
Return a representation of the FeatureLocation object (with Python counting).

__repr__(self)
Represent the FeatureLocation object as a string for debugging.

__add__(self, other)
Combine location with another FeatureLocation object, or shift it.

__radd__(self, other)
Add the FeatureLocation to an object on its left (e.g. an integer offset, as in 200 + f1).

__nonzero__(self)
Return True regardless of the length of the feature.

__len__(self)
Return the length of the region described by the FeatureLocation object.

__contains__(self, value)
Check if an integer position is within the FeatureLocation object.

__iter__(self)
Iterate over the parent positions within the FeatureLocation object.

__eq__(self, other)
Implement equality by comparing all the location attributes.

__ne__(self, other)
Implement the not-equal operator.

_shift(self, offset)
Return a copy of the FeatureLocation shifted by an offset (PRIVATE).

_flip(self, length)
Return a copy of the location after the parent is reversed (PRIVATE).

extract(self, parent_sequence)
Extract the sequence from the supplied parent sequence using the FeatureLocation object.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties
  strand
Strand of the location (+1, -1, 0 or None).
  parts
Read only list of sections (always one, the FeatureLocation object).
  start
Start location - left most (minimum) value, regardless of strand.
  end
End location - right most (maximum) value, regardless of strand.
  nofuzzy_start
Start position (integer, approximated if fuzzy, read only) (OBSOLETE).
  nofuzzy_end
End position (integer, approximated if fuzzy, read only) (OBSOLETE).

Inherited from object: __class__

Method Details

__init__(self, start, end, strand=None, ref=None, ref_db=None)
(Constructor)

Initialize the class.

The start and end arguments specify the values where the feature begins and ends. These can either be any of the *Position objects that inherit from AbstractPosition, or just integers specifying the position. Integers are assumed to be exact and are converted to ExactPosition objects. This is meant to make it easy to deal with non-fuzzy ends.

i.e. Short form:

>>> from Bio.SeqFeature import FeatureLocation
>>> loc = FeatureLocation(5, 10, strand=-1)
>>> print(loc)
[5:10](-)

Explicit form:

>>> from Bio.SeqFeature import FeatureLocation, ExactPosition
>>> loc = FeatureLocation(ExactPosition(5), ExactPosition(10), strand=-1)
>>> print(loc)
[5:10](-)

Other fuzzy positions are used similarly:

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc2 = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
>>> print(loc2)
[<5:>10](-)

For nucleotide features you will also want to specify the strand: use 1 for the forward (plus) strand, -1 for the reverse (negative) strand, 0 for stranded but strand unknown (? in GFF3), or None when the strand does not apply (dot in GFF3), e.g. features on proteins.

>>> loc = FeatureLocation(5, 10, strand=+1)
>>> print(loc)
[5:10](+)
>>> print(loc.strand)
1
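
The strand convention above maps directly onto the GFF3 strand column. The following plain-Python sketch shows that mapping; the names GFF3_STRAND and strand_to_gff3 are hypothetical and not part of Biopython:

```python
# Strand values as used by FeatureLocation, mapped to GFF3 strand symbols.
GFF3_STRAND = {
    1: "+",     # forward (plus) strand
    -1: "-",    # reverse (negative) strand
    0: "?",     # stranded, but strand unknown
    None: ".",  # strand does not apply, e.g. features on proteins
}


def strand_to_gff3(strand):
    """Return the GFF3 symbol for a strand value of +1, -1, 0 or None."""
    try:
        return GFF3_STRAND[strand]
    except KeyError:
        raise ValueError("strand should be +1, -1, 0 or None, got %r" % (strand,))


for value in (1, -1, 0, None):
    print(repr(value), strand_to_gff3(value))
```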

Normally feature locations are given relative to the parent sequence you are working with, but an explicit accession can be given with the optional ref and ref_db strings:

>>> loc = FeatureLocation(105172, 108462, ref="AL391218.9", strand=1)
>>> print(loc)
AL391218.9[105172:108462](+)
>>> print(loc.ref)
AL391218.9
Overrides: object.__init__

__str__(self)
(Informal representation operator)

Return a representation of the FeatureLocation object (with Python counting).

For the simple case this uses the Python slicing syntax, [122:150] (zero-based counting), which GenBank would call 123..150 (one-based counting).

Overrides: object.__str__

__repr__(self)
(Representation operator)

Represent the FeatureLocation object as a string for debugging.
Overrides: object.__repr__

__add__(self, other)
(Addition operator)

Combine location with another FeatureLocation object, or shift it.

You can add two feature locations to make a join CompoundLocation:

>>> from Bio.SeqFeature import FeatureLocation
>>> f1 = FeatureLocation(5, 10)
>>> f2 = FeatureLocation(20, 30)
>>> combined = f1 + f2
>>> print(combined)
join{[5:10], [20:30]}

This is thus equivalent to:

>>> from Bio.SeqFeature import CompoundLocation
>>> join = CompoundLocation([f1, f2])
>>> print(join)
join{[5:10], [20:30]}

You can also use sum(...) in this way:

>>> join = sum([f1, f2])
>>> print(join)
join{[5:10], [20:30]}

Furthermore, you can combine a FeatureLocation with a CompoundLocation in this way.

Separately, adding an integer will give a new FeatureLocation with its start and end offset by that amount. For example:

>>> print(f1)
[5:10]
>>> print(f1 + 100)
[105:110]
>>> print(200 + f1)
[205:210]

This can be useful when editing annotation.

__nonzero__(self)
(Boolean test operator)

Return True regardless of the length of the feature.

This behaviour is for backwards compatibility, since until the __len__ method was added, a FeatureLocation always evaluated as True.

By comparison, Seq objects, strings, lists, etc. all evaluate to False if they have length zero.

WARNING: The FeatureLocation may in future evaluate to False when its length is zero (in order to better match normal Python behaviour)!
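
Given this warning, code that needs to detect a zero-length location is safer testing the length explicitly than relying on truthiness. A minimal sketch follows, in which AlwaysTrueRegion is a hypothetical stand-in that mimics the behaviour described above; it is not the Biopython class:

```python
class AlwaysTrueRegion:
    """Hypothetical stand-in: evaluates as True regardless of its length."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __len__(self):
        return self.end - self.start

    def __bool__(self):
        return True

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def is_empty(region):
    """Explicit length test; gives the same answer whatever bool(region) does."""
    return len(region) == 0


region = AlwaysTrueRegion(5, 5)
print(bool(region))      # True, even though the region has length zero
print(is_empty(region))  # True
```

Written this way, the check keeps working if the truth value of an empty location ever changes.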

__len__(self)
(Length operator)

Return the length of the region described by the FeatureLocation object.

Note that extra care may be needed for fuzzy locations, e.g.

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5

__contains__(self, value)
(In operator)

Check if an integer position is within the FeatureLocation object.

Note that extra care may be needed for fuzzy locations, e.g.

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
>>> [i for i in range(15) if i in loc]
[5, 6, 7, 8, 9]
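
The result above is what a half-open test on the numeric boundaries would give, with the fuzzy markers ignored. As a plain-Python sketch (contains_position is a hypothetical helper, not a Biopython function):

```python
def contains_position(start, end, value):
    """Half-open membership test: True when start <= value < end."""
    return start <= value < end


inside = [i for i in range(15) if contains_position(5, 10, i)]
print(inside)  # [5, 6, 7, 8, 9]
```

Position 4 is reported as outside even though a location written as [<5:>10] may biologically extend beyond its stated boundaries, which is why fuzzy locations need extra care.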

__iter__(self)

Iterate over the parent positions within the FeatureLocation object.

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
>>> for i in loc: print(i)
5
6
7
8
9
>>> list(loc)
[5, 6, 7, 8, 9]
>>> [i for i in range(15) if i in loc]
[5, 6, 7, 8, 9]

Note this is strand aware:

>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
>>> list(loc)
[9, 8, 7, 6, 5]
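
The iteration order shown above can be reproduced with range() in plain Python. This is a sketch of the observed behaviour, not the library's implementation, and iter_positions is a hypothetical name:

```python
def iter_positions(start, end, strand=None):
    """Yield the parent positions of a half-open region in reading order."""
    if strand == -1:
        # Reverse strand: walk from the right-most position down to start.
        return iter(range(end - 1, start - 1, -1))
    return iter(range(start, end))


print(list(iter_positions(5, 10)))             # [5, 6, 7, 8, 9]
print(list(iter_positions(5, 10, strand=-1)))  # [9, 8, 7, 6, 5]
```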

extract(self, parent_sequence)

Extract the sequence from supplied parent sequence using the FeatureLocation object.

The parent_sequence can be a Seq-like object or a string, and extract will generally return an object of the same type. The exception is a MutableSeq parent sequence, for which a Seq object is returned.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> feature_loc = FeatureLocation(8, 15)
>>> feature_loc.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
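
For a plain string parent, the extraction above amounts to a slice. The sketch below is plain Python and does not call Biopython; extract_region is a hypothetical helper, and the reverse-complement step for strand -1 is an assumption about how a reverse-strand nucleotide feature would be handled, since this page only shows the unstranded protein case:

```python
# Translation table for complementing unambiguous DNA letters.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(parent, start, end, strand=None):
    """Slice parent[start:end]; reverse complement it when strand is -1.

    The strand handling is an assumption made for this sketch and only
    makes sense for nucleotide strings.
    """
    region = parent[start:end]
    if strand == -1:
        region = region.translate(_COMPLEMENT)[::-1]
    return region


protein = "MKQHKAMIVALIVICITAVVAAL"
print(extract_region(protein, 8, 15))                 # VALIVIC
print(extract_region("AAACCCGGGT", 2, 6, strand=-1))  # GGGT
```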

Property Details

strand

Strand of the location (+1, -1, 0 or None).
Get Method:
_get_strand(self) - Get function for the strand property (PRIVATE).
Set Method:
_set_strand(self, value) - Set function for the strand property (PRIVATE).

parts

Read only list of sections (always one, the FeatureLocation object).

This is a convenience property allowing you to write code handling both simple FeatureLocation objects (with one part) and more complex CompoundLocation objects (with multiple parts) interchangeably.
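
The pattern this enables can be sketched with two hypothetical stand-in classes (SimpleRegion and CompoundRegion are illustrative only, not the Biopython classes). The single assumption is that every location exposes a parts list:

```python
class SimpleRegion:
    """Hypothetical stand-in for a simple location with exactly one part."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    @property
    def parts(self):
        # Always one section: the object itself.
        return [self]


class CompoundRegion:
    """Hypothetical stand-in for a location made of several simple parts."""

    def __init__(self, parts):
        self.parts = list(parts)


def part_bounds(location):
    """Return (start, end) for each part, whatever the kind of location."""
    return [(part.start, part.end) for part in location.parts]


exon = SimpleRegion(5, 10)
gene = CompoundRegion([SimpleRegion(5, 10), SimpleRegion(20, 30)])
print(part_bounds(exon))  # [(5, 10)]
print(part_bounds(gene))  # [(5, 10), (20, 30)]
```

The calling code never needs to test which kind of location it was given.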


start

Start location - left most (minimum) value, regardless of strand.

Read only, returns an integer-like position object, possibly a fuzzy position.


end

End location - right most (maximum) value, regardless of strand.

Read only, returns an integer-like position object, possibly a fuzzy position.


nofuzzy_start

Start position (integer, approximated if fuzzy, read only) (OBSOLETE).

This is now an alias for int(feature.start), which should be used in preference -- unless you are trying to support old versions of Biopython.


nofuzzy_end

End position (integer, approximated if fuzzy, read only) (OBSOLETE).

This is now an alias for int(feature.end), which should be used in preference -- unless you are trying to support old versions of Biopython.
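
To illustrate why int(...) is enough, the sketch below models a fuzzy position as an int subclass with its own string form. BeforeLike is a hypothetical stand-in written for this example, on the assumption that an "integer-like position object" behaves this way; it is not the Biopython BeforePosition class:

```python
class BeforeLike(int):
    """Hypothetical fuzzy 'before' position, modelled as an int subclass."""

    def __str__(self):
        return "<%d" % int(self)


start = BeforeLike(5)
print(str(start))   # <5, the fuzzy marker appears in the string form
print(int(start))   # 5, a plain integer approximation

# The plain integer can be used anywhere an index is needed.
print("ABCDEFGHIJ"[int(start):])  # FGHIJ
```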
