Package Bio :: Module SeqFeature :: Class FeatureLocation

Class FeatureLocation

object --+
         |
        FeatureLocation

Specify the location of a feature along a sequence.

The FeatureLocation is used for simple continuous features, which can be described as running from a start position to an end position (optionally with a strand and reference information). More complex locations made up from several non-continuous parts (e.g. a coding sequence made up of several exons) are currently described using a SeqFeature with sub-features.

Note that the start and end location numbering follows Python's scheme, so a GenBank entry of 123..150 (one-based counting) becomes a location of [122:150] (zero-based counting).

>>> from Bio.SeqFeature import FeatureLocation
>>> f = FeatureLocation(122, 150)
>>> print(f)
[122:150]
>>> print(f.start)
122
>>> print(f.end)
150
>>> print(f.strand)
None
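
The conversion between the two numbering schemes can be sketched in plain Python. The helper name genbank_to_python below is hypothetical (it is not part of Biopython) and only handles simple exact ranges such as 123..150:

```python
def genbank_to_python(gb_start, gb_end):
    """Convert a one-based inclusive range to a zero-based half-open (start, end).

    Hypothetical helper for illustration only; not part of Biopython.
    """
    if gb_start < 1 or gb_end < gb_start:
        raise ValueError("expected 1 <= gb_start <= gb_end")
    # Only the start shifts down by one; the inclusive one-based end is the
    # same number as the exclusive zero-based end.
    return gb_start - 1, gb_end


start, end = genbank_to_python(123, 150)
print("[%i:%i]" % (start, end))  # [122:150]
print(end - start)               # 28, the number of positions covered
```

The resulting pair can be passed straight to FeatureLocation(start, end).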

Note the strand defaults to None. If you are working with nucleotide sequences you'd want to be explicit if it is the forward strand:

>>> from Bio.SeqFeature import FeatureLocation
>>> f = FeatureLocation(122, 150, strand=+1)
>>> print(f)
[122:150](+)
>>> print(f.strand)
1

Note that for a parent sequence of length n, the FeatureLocation start and end must satisfy the inequality 0 <= start <= end <= n. This means even for features on the reverse strand of a nucleotide sequence, we expect the 'start' coordinate to be less than the 'end'.

>>> from Bio.SeqFeature import FeatureLocation
>>> r = FeatureLocation(122, 150, strand=-1)
>>> print(r)
[122:150](-)
>>> print(r.start)
122
>>> print(r.end)
150
>>> print(r.strand)
-1

That is, rather than thinking of the 'start' and 'end' biologically in a strand-aware manner, think of them as the 'left most' or 'minimum' boundary, and the 'right most' or 'maximum' boundary of the region being described. This is particularly important with compound locations describing non-continuous regions.

In the examples above we have used standard exact positions, but there are also specialised position objects used to represent fuzzy positions. For example, a GenBank location like complement(<123..150) would use a BeforePosition object for the start.
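
As a minimal sketch of that case, assuming the usual coordinate conversion (one-based 123 becomes zero-based 122), the GenBank location complement(<123..150) could be built by hand like this:

```python
from Bio.SeqFeature import FeatureLocation, BeforePosition

# complement(<123..150): fuzzy start (before position 122), exact end,
# reverse strand. A plain integer end is treated as an exact position.
fuzzy = FeatureLocation(BeforePosition(122), 150, strand=-1)
print(fuzzy)                                    # [<122:150](-)
print(isinstance(fuzzy.start, BeforePosition))  # True
```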

Instance Methods

__init__(self, start, end, strand=None, ref=None, ref_db=None)
Specify the start, end, strand etc of a sequence feature.

_get_strand(self)

_set_strand(self, value)

__str__(self)
Returns a representation of the location (with python counting).

__repr__(self)
A string representation of the location for debugging.

__add__(self, other)
Combine location with another feature location, or shift it.

__radd__(self, other)

__nonzero__(self)
Returns True regardless of the length of the feature.

__len__(self)
Returns the length of the region described by the FeatureLocation.

__contains__(self, value)
Check if an integer position is within the FeatureLocation.

__iter__(self)
Iterate over the parent positions within the FeatureLocation.

_shift(self, offset)
Returns a copy of the location shifted by the offset (PRIVATE).

_flip(self, length)
Returns a copy of the location after the parent is reversed (PRIVATE).

extract(self, parent_sequence)
Extract feature sequence from the supplied parent sequence.
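
A short sketch of extract, using a made-up twelve-base parent sequence and assuming the parent is a Bio.Seq.Seq object. For a location on the reverse strand, the result is the reverse complement of the sliced region:

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation

parent = Seq("AAACCCGGGTTT")  # illustrative sequence, not real data

forward = FeatureLocation(2, 6, strand=+1)
reverse = FeatureLocation(2, 6, strand=-1)

# Both locations cover parent[2:6], which is "ACCC".
print(forward.extract(parent))  # ACCC
print(reverse.extract(parent))  # GGGT (reverse complement of ACCC)
```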

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties
  strand
Strand of the location (+1, -1, 0 or None).
  parts
Read only list of parts (always one, the FeatureLocation).
  start
Start location (integer like, possibly a fuzzy position, read only).
  end
End location (integer like, possibly a fuzzy position, read only).
  nofuzzy_start
Start position (integer, approximated if fuzzy, read only) (OBSOLETE).
  nofuzzy_end
End position (integer, approximated if fuzzy, read only) (OBSOLETE).

Inherited from object: __class__

Method Details

__init__(self, start, end, strand=None, ref=None, ref_db=None)
(Constructor)

Specify the start, end, strand etc of a sequence feature.

The start and end arguments specify the values where the feature begins and ends. These can either be any of the *Position objects that inherit from AbstractPosition, or can just be integers specifying the position. In the case of integers, the values are assumed to be exact and are converted into ExactPosition objects. This is meant to make it easy to deal with non-fuzzy ends.

For example, the short form:

>>> from Bio.SeqFeature import FeatureLocation
>>> loc = FeatureLocation(5, 10, strand=-1)
>>> print(loc)
[5:10](-)

Explicit form:

>>> from Bio.SeqFeature import FeatureLocation, ExactPosition
>>> loc = FeatureLocation(ExactPosition(5), ExactPosition(10), strand=-1)
>>> print(loc)
[5:10](-)

Other fuzzy positions are used similarly:

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc2 = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
>>> print(loc2)
[<5:>10](-)

For nucleotide features you will also want to specify the strand. Use 1 for the forward (plus) strand, -1 for the reverse (negative) strand, 0 for stranded but strand unknown (? in GFF3), or None for when the strand does not apply (dot in GFF3), e.g. features on proteins.

>>> loc = FeatureLocation(5, 10, strand=+1)
>>> print(loc)
[5:10](+)
>>> print(loc.strand)
1
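
The strand values described above map directly onto the GFF3 strand column. A minimal sketch of that mapping in plain Python, where GFF3_TO_STRAND and strand_from_gff3 are hypothetical names rather than Biopython API:

```python
# Hypothetical lookup table based on the strand conventions described above.
GFF3_TO_STRAND = {
    "+": 1,     # forward (plus) strand
    "-": -1,    # reverse (negative) strand
    "?": 0,     # stranded, but strand unknown
    ".": None,  # strand does not apply, e.g. features on proteins
}


def strand_from_gff3(symbol):
    """Return the FeatureLocation strand value for a GFF3 strand symbol."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("unexpected GFF3 strand symbol: %r" % (symbol,))


print(strand_from_gff3("-"))  # -1
print(strand_from_gff3("."))  # None
```

The returned value can be passed as the strand argument of FeatureLocation.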

Normally feature locations are given relative to the parent sequence you are working with, but an explicit accession can be given with the optional ref and ref_db strings:

>>> loc = FeatureLocation(105172, 108462, ref="AL391218.9", strand=1)
>>> print(loc)
AL391218.9[105172:108462](+)
>>> print(loc.ref)
AL391218.9
Overrides: object.__init__

__str__(self)
(Informal representation operator)

Returns a representation of the location (with python counting).

For the simple case this uses the Python slicing syntax, [122:150] (zero-based counting), which GenBank would call 123..150 (one-based counting).

Overrides: object.__str__

__repr__(self)
(Representation operator)

A string representation of the location for debugging.
Overrides: object.__repr__

__add__(self, other)
(Addition operator)

Combine location with another feature location, or shift it.

You can add two feature locations to make a join CompoundLocation:

>>> from Bio.SeqFeature import FeatureLocation
>>> f1 = FeatureLocation(5, 10)
>>> f2 = FeatureLocation(20, 30)
>>> combined = f1 + f2
>>> print(combined)
join{[5:10], [20:30]}

This is thus equivalent to:

>>> from Bio.SeqFeature import CompoundLocation
>>> join = CompoundLocation([f1, f2])
>>> print(join)
join{[5:10], [20:30]}

You can also use sum(...) in this way:

>>> join = sum([f1, f2])
>>> print(join)
join{[5:10], [20:30]}

Furthermore, you can combine a FeatureLocation with a CompoundLocation in this way.
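
As a small sketch of that last case, adding a third FeatureLocation to an existing CompoundLocation appends it as a further part (the coordinates are arbitrary illustrative values):

```python
from Bio.SeqFeature import FeatureLocation, CompoundLocation

f1 = FeatureLocation(5, 10)
f2 = FeatureLocation(20, 30)
f3 = FeatureLocation(40, 50)

compound = CompoundLocation([f1, f2])
extended = compound + f3  # CompoundLocation + FeatureLocation

print(extended)             # join{[5:10], [20:30], [40:50]}
print(len(extended.parts))  # 3
```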

Separately, adding an integer will give a new FeatureLocation with its start and end offset by that amount. For example:

>>> print(f1)
[5:10]
>>> print(f1 + 100)
[105:110]
>>> print(200 + f1)
[205:210]

This can be useful when editing annotation.
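
One way this might be used, sketched with made-up sequences: if extra bases are prepended to the parent sequence, shifting the location by the length of the prefix keeps it pointing at the same bases.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation

parent = Seq("AAACCCGGGTTT")  # illustrative sequence
loc = FeatureLocation(3, 6)   # covers "CCC"

prefix = Seq("NNNNN")         # five bases added in front of the parent
shifted = loc + len(prefix)   # new location; loc itself is unchanged

print(shifted)                            # [8:11]
print(shifted.extract(prefix + parent))   # CCC
```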

__nonzero__(self)
(Boolean test operator)

Returns True regardless of the length of the feature.

This behaviour is for backwards compatibility, since until the __len__ method was added, a FeatureLocation always evaluated as True.

Note that in comparison, Seq objects, strings, lists, etc, will all evaluate to False if they have length zero.

WARNING: The FeatureLocation may in future evaluate to False when its length is zero (in order to better match normal python behaviour)!

__len__(self)
(Length operator)

Returns the length of the region described by the FeatureLocation.

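Note that extra care may be needed for fuzzy locations, since the length is calculated from the integer values of the start and end positions and ignores any fuzziness, e.g.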

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5

__contains__(self, value)
(In operator)

Check if an integer position is within the FeatureLocation.

Note that extra care may be needed for fuzzy locations, since membership is tested against the integer values of the start and end positions, e.g.

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
>>> [i for i in range(15) if i in loc]
[5, 6, 7, 8, 9]

__iter__(self)

Iterate over the parent positions within the FeatureLocation.

>>> from Bio.SeqFeature import FeatureLocation
>>> from Bio.SeqFeature import BeforePosition, AfterPosition
>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
>>> len(loc)
5
>>> for i in loc: print(i)
5
6
7
8
9
>>> list(loc)
[5, 6, 7, 8, 9]
>>> [i for i in range(15) if i in loc]
[5, 6, 7, 8, 9]

Note this is strand aware:

>>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
>>> list(loc)
[9, 8, 7, 6, 5]

Property Details

strand

Strand of the location (+1, -1, 0 or None).
Get Method:
_get_strand(self)
Set Method:
_set_strand(self, value)

parts

Read only list of parts (always one, the FeatureLocation).

This is a convenience property allowing you to write code handling both simple FeatureLocation objects (with one part) and more complex CompoundLocation objects (with multiple parts) interchangeably.

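
A minimal sketch of code that treats both kinds of location uniformly through the parts property. The function name part_lengths is hypothetical and chosen only for this illustration:

```python
from Bio.SeqFeature import FeatureLocation, CompoundLocation


def part_lengths(location):
    """Return the length of each part, for simple or compound locations."""
    return [len(part) for part in location.parts]


simple = FeatureLocation(5, 10)
compound = CompoundLocation([FeatureLocation(5, 10), FeatureLocation(20, 30)])

print(part_lengths(simple))    # [5]
print(part_lengths(compound))  # [5, 10]
```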

start

Start location (integer like, possibly a fuzzy position, read only).

end

End location (integer like, possibly a fuzzy position, read only).

nofuzzy_start

Start position (integer, approximated if fuzzy, read only) (OBSOLETE).

This is now an alias for int(feature.start), which should be used in preference -- unless you are trying to support old versions of Biopython.


nofuzzy_end

End position (integer, approximated if fuzzy, read only) (OBSOLETE).

This is now an alias for int(feature.end), which should be used in preference -- unless you are trying to support old versions of Biopython.

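
A small sketch of the recommended replacement, converting possibly fuzzy positions with int(...) rather than relying on nofuzzy_start and nofuzzy_end:

```python
from Bio.SeqFeature import FeatureLocation, BeforePosition, AfterPosition

loc = FeatureLocation(BeforePosition(5), AfterPosition(10))

start = int(loc.start)  # plain integer, fuzziness dropped
end = int(loc.end)

print(start, end)               # 5 10
print(end - start == len(loc))  # True
```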