Source Code for Module Bio.SeqFeature

# Copyright 2000-2003 Jeff Chang.
# Copyright 2001-2008 Brad Chapman.
# Copyright 2005-2016 by Peter Cock.
# Copyright 2006-2009 Michiel de Hoon.
# All rights reserved.
# This code is part of the Biopython distribution and governed by its
# license.  Please see the LICENSE file that should have been included
# as part of this package.
"""Represent a Sequence Feature holding info about a part of a sequence.

This is heavily modeled after the Biocorba SeqFeature objects, and
may be pretty biased towards GenBank stuff since I'm writing it
for the GenBank parser output...

What's here:

Base class to hold a Feature
----------------------------

Classes:
 - SeqFeature

Hold information about a Reference
----------------------------------

This is an attempt to create a General class to hold Reference type
information.

Classes:
 - Reference

Specify locations of a feature on a Sequence
--------------------------------------------

This aims to handle, in Ewan Birney's words, 'the dreaded fuzziness issue'.
This has the advantages of allowing us to handle fuzzy stuff in case anyone
needs it, and also be compatible with BioPerl etc and BioSQL.

Classes:
 - FeatureLocation - Specify the start and end location of a feature.
 - CompoundLocation - Collection of FeatureLocation objects (for joins etc).
 - ExactPosition - Specify the position as being exact.
 - WithinPosition - Specify a position occurring within some range.
 - BetweenPosition - Specify a position occurring between a range (OBSOLETE?).
 - BeforePosition - Specify the position as being found before some base.
 - AfterPosition - Specify the position as being found after some base.
 - OneOfPosition - Specify a position where the location can be multiple positions.
 - UnknownPosition - Represents missing information like '?' in UniProt.

"""

from __future__ import print_function

from collections import OrderedDict

from Bio._py3k import _is_int_or_long

from Bio.Seq import MutableSeq, reverse_complement


class SeqFeature(object):
    """Represent a Sequence Feature on an object.

    Attributes:
     - location - the location of the feature on the sequence (FeatureLocation)
     - type - the specified type of the feature (i.e. CDS, exon, repeat...)
     - location_operator - a string specifying how this SeqFeature may
       be related to others. For example, for a GenBank feature whose
       location is a join(...), the location_operator would be "join". This
       is a proxy for feature.location.operator and only applies to compound
       locations.
     - strand - A value specifying on which strand (of a DNA sequence, for
       instance) the feature deals with. 1 indicates the plus strand, -1
       indicates the minus strand, 0 indicates stranded but unknown (? in GFF3),
       while the default of None indicates that strand doesn't apply (dot in GFF3,
       e.g. features on proteins). Note this is a shortcut for accessing the
       strand property of the feature's location.
     - id - A string identifier for the feature.
     - ref - A reference to another sequence. This could be an accession
       number for some different sequence. Note this is a shortcut for the
       reference property of the feature's location.
     - ref_db - A different database for the reference accession number.
       Note this is a shortcut for the reference property of the location.
     - qualifiers - A dictionary of qualifiers on the feature. These are
       analogous to the qualifiers from a GenBank feature table. The keys of
       the dictionary are qualifier names, the values are the qualifier
       values. As of Biopython 1.69 this is an ordered dictionary.

    """

    def __init__(self, location=None, type='', location_operator='',
                 strand=None, id="<unknown id>",
                 qualifiers=None, sub_features=None,
                 ref=None, ref_db=None):
        """Initialize a SeqFeature on a Sequence.

        location can be a FeatureLocation or CompoundLocation (with strand
        argument also given if required), or None.

        e.g. With no strand, on the forward strand, and on the reverse strand:

        >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
        >>> f1 = SeqFeature(FeatureLocation(5, 10), type="domain")
        >>> f1.strand == f1.location.strand == None
        True
        >>> f2 = SeqFeature(FeatureLocation(7, 110, strand=1), type="CDS")
        >>> f2.strand == f2.location.strand == +1
        True
        >>> f3 = SeqFeature(FeatureLocation(9, 108, strand=-1), type="CDS")
        >>> f3.strand == f3.location.strand == -1
        True

        An invalid strand will trigger an exception:

        >>> f4 = SeqFeature(FeatureLocation(50, 60), strand=2)
        Traceback (most recent call last):
           ...
        ValueError: Strand should be +1, -1, 0 or None, not 2

        Similarly if set via the FeatureLocation directly:

        >>> loc4 = FeatureLocation(50, 60, strand=2)
        Traceback (most recent call last):
           ...
        ValueError: Strand should be +1, -1, 0 or None, not 2

        For exact start/end positions, an integer can be used (as shown above)
        as shorthand for the ExactPosition object. For non-exact locations, the
        FeatureLocation must be specified via the appropriate position objects.

        Note that the strand, ref and ref_db arguments to the SeqFeature are
        now obsolete and will be deprecated in a future release (which will
        give warning messages) and later removed. Set them via the location
        object instead.

        Note that the sub_features argument can no longer be used, and that
        location_operator is obsolete; describe joins via a CompoundLocation
        object instead.
        """
        if location is not None and not isinstance(location, FeatureLocation) \
                and not isinstance(location, CompoundLocation):
            raise TypeError(
                "FeatureLocation, CompoundLocation (or None) required for the location")
        self.location = location
        self.type = type
        if location_operator:
            # TODO - Deprecation warning
            self.location_operator = location_operator
        if strand is not None:
            # TODO - Deprecation warning
            self.strand = strand
        self.id = id
        if qualifiers is None:
            qualifiers = OrderedDict()
        self.qualifiers = qualifiers
        if sub_features is not None:
            raise TypeError("Rather than sub_features, use a CompoundLocation")
        if ref is not None:
            # TODO - Deprecation warning
            self.ref = ref
        if ref_db is not None:
            # TODO - Deprecation warning
            self.ref_db = ref_db

    def _get_strand(self):
        """Get function for the strand property (PRIVATE)."""
        return self.location.strand

    def _set_strand(self, value):
        """Set function for the strand property (PRIVATE)."""
        try:
            self.location.strand = value
        except AttributeError:
            if self.location is None:
                if value is not None:
                    raise ValueError("Can't set strand without a location.")
            else:
                raise

    strand = property(fget=_get_strand, fset=_set_strand,
                      doc="""Feature's strand

                      This is a shortcut for feature.location.strand
                      """)

    def _get_ref(self):
        """Get function for the reference property (PRIVATE)."""
        try:
            return self.location.ref
        except AttributeError:
            return None

    def _set_ref(self, value):
        """Set function for the reference property (PRIVATE)."""
        try:
            self.location.ref = value
        except AttributeError:
            if self.location is None:
                if value is not None:
                    raise ValueError("Can't set ref without a location.")
            else:
                raise
    ref = property(fget=_get_ref, fset=_set_ref,
                   doc="""Feature location reference (e.g. accession).

                   This is a shortcut for feature.location.ref
                   """)

    def _get_ref_db(self):
        """Get function for the database reference property (PRIVATE)."""
        try:
            return self.location.ref_db
        except AttributeError:
            return None

    def _set_ref_db(self, value):
        """Set function for the database reference property (PRIVATE)."""
        self.location.ref_db = value
    ref_db = property(fget=_get_ref_db, fset=_set_ref_db,
                      doc="""Feature location reference's database.

                      This is a shortcut for feature.location.ref_db
                      """)

    def _get_location_operator(self):
        """Get function for the location operator property (PRIVATE)."""
        try:
            return self.location.operator
        except AttributeError:
            return None

    def _set_location_operator(self, value):
        """Set function for the location operator property."""
        if value:
            if isinstance(self.location, CompoundLocation):
                self.location.operator = value
            elif self.location is None:
                raise ValueError(
                    "Location is None so can't set its operator (to %r)" % value)
            else:
                raise ValueError(
                    "Only CompoundLocation gets an operator (%r)" % value)
    location_operator = property(fget=_get_location_operator, fset=_set_location_operator,
                                 doc="Location operator for compound locations (e.g. join).")

    def __repr__(self):
        """Represent the feature as a string for debugging."""
        answer = "%s(%s" % (self.__class__.__name__, repr(self.location))
        if self.type:
            answer += ", type=%s" % repr(self.type)
        if self.location_operator:
            answer += ", location_operator=%s" % repr(self.location_operator)
        if self.id and self.id != "<unknown id>":
            answer += ", id=%s" % repr(self.id)
        if self.ref:
            answer += ", ref=%s" % repr(self.ref)
        if self.ref_db:
            answer += ", ref_db=%s" % repr(self.ref_db)
        answer += ")"
        return answer

    def __str__(self):
        """Return the full feature as a python string."""
        out = "type: %s\n" % self.type
        out += "location: %s\n" % self.location
        if self.id and self.id != "<unknown id>":
            out += "id: %s\n" % self.id
        out += "qualifiers:\n"
        for qual_key in sorted(self.qualifiers):
            out += "    Key: %s, Value: %s\n" % (qual_key,
                                               self.qualifiers[qual_key])
        return out

    def _shift(self, offset):
        """Return a copy of the feature with its location shifted (PRIVATE).

        The annotation qualifiers are copied.
        """
        return SeqFeature(location=self.location._shift(offset),
                          type=self.type,
                          location_operator=self.location_operator,
                          id=self.id,
                          qualifiers=OrderedDict(self.qualifiers.items()))

    def _flip(self, length):
        """Return a copy of the feature with its location flipped (PRIVATE).

        The argument length gives the length of the parent sequence. For
        example a location 0..20 (+1 strand) with parent length 30 becomes
        after flipping 10..30 (-1 strand). Strandless (None) or unknown
        strand (0) remain like that - just their end points are changed.

        The annotation qualifiers are copied.
        """
        return SeqFeature(location=self.location._flip(length),
                          type=self.type,
                          location_operator=self.location_operator,
                          id=self.id,
                          qualifiers=OrderedDict(self.qualifiers.items()))

    def extract(self, parent_sequence):
        """Extract the feature's sequence from supplied parent sequence.

        The parent_sequence can be a Seq like object or a string, and will
        generally return an object of the same type. The exception to this is
        a MutableSeq as the parent sequence will return a Seq object.

        This should cope with complex locations including complements, joins
        and fuzzy positions. Even mixed strand features should work! This
        also covers features on protein sequences (e.g. domains), although
        here reverse strand features are not permitted.

        >>> from Bio.Seq import Seq
        >>> from Bio.Alphabet import generic_protein
        >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
        >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
        >>> f = SeqFeature(FeatureLocation(8, 15), type="domain")
        >>> f.extract(seq)
        Seq('VALIVIC', ProteinAlphabet())

        If the FeatureLocation is None, e.g. when parsing invalid locus
        locations in the GenBank parser, extract() will raise a ValueError.

        >>> from Bio.Seq import Seq
        >>> from Bio.SeqFeature import SeqFeature
        >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
        >>> f = SeqFeature(None, type="domain")
        >>> f.extract(seq)
        Traceback (most recent call last):
           ...
        ValueError: The feature's .location is None. Check the sequence file for a valid location.

        Note - currently only compound features of type "join" are supported.
        """
        if self.location is None:
            raise ValueError("The feature's .location is None. Check the "
                             "sequence file for a valid location.")
        return self.location.extract(parent_sequence)

    # Python 3:
    def __bool__(self):
        """Boolean value of an instance of this class (True).

        This behaviour is for backwards compatibility, since until the
        __len__ method was added, a SeqFeature always evaluated as True.

        Note that in comparison, Seq objects, strings, lists, etc, will all
        evaluate to False if they have length zero.

        WARNING: The SeqFeature may in future evaluate to False when its
        length is zero (in order to better match normal python behaviour)!
        """
        return True

    # Python 2:
    __nonzero__ = __bool__

    def __len__(self):
        """Return the length of the region where the feature is located.

        >>> from Bio.Seq import Seq
        >>> from Bio.Alphabet import generic_protein
        >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
        >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
        >>> f = SeqFeature(FeatureLocation(8, 15), type="domain")
        >>> len(f)
        7
        >>> f.extract(seq)
        Seq('VALIVIC', ProteinAlphabet())
        >>> len(f.extract(seq))
        7

        This is a proxy for taking the length of the feature's location:

        >>> len(f.location)
        7

        For simple features this is the same as the region spanned (end
        position minus start position using Pythonic counting). However, for
        a compound location (e.g. a CDS as the join of several exons) the
        gaps are not counted (e.g. introns). This ensures that len(f) matches
        len(f.extract(parent_seq)), and also makes sure things work properly
        with features wrapping the origin etc.
        """
        return len(self.location)

    def __iter__(self):
        """Iterate over the parent positions within the feature.

        The iteration order is strand aware, and can be thought of as moving
        along the feature using the parent sequence coordinates:

        >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
        >>> f = SeqFeature(FeatureLocation(5, 10), type="domain", strand=-1)
        >>> len(f)
        5
        >>> for i in f: print(i)
        9
        8
        7
        6
        5
        >>> list(f)
        [9, 8, 7, 6, 5]

        This is a proxy for iterating over the location,

        >>> list(f.location)
        [9, 8, 7, 6, 5]
        """
        return iter(self.location)

    def __contains__(self, value):
        """Check if an integer position is within the feature.

        >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
        >>> f = SeqFeature(FeatureLocation(5, 10), type="domain", strand=-1)
        >>> len(f)
        5
        >>> [i for i in range(15) if i in f]
        [5, 6, 7, 8, 9]

        For example, to see which features include a SNP position, you could
        use this:

        >>> from Bio import SeqIO
        >>> record = SeqIO.read("GenBank/NC_000932.gb", "gb")
        >>> for f in record.features:
        ...     if 1750 in f:
        ...         print("%s %s" % (f.type, f.location))
        source [0:154478](+)
        gene [1716:4347](-)
        tRNA join{[4310:4347](-), [1716:1751](-)}

        Note that for a feature defined as a join of several subfeatures (e.g.
        the union of several exons) the gaps are not checked (e.g. introns).
        In this example, the tRNA location is defined in the GenBank file as
        complement(join(1717..1751,4311..4347)), so that position 1760 falls
        in the gap:

        >>> for f in record.features:
        ...     if 1760 in f:
        ...         print("%s %s" % (f.type, f.location))
        source [0:154478](+)
        gene [1716:4347](-)

        Note that additional care may be required with fuzzy locations, for
        example just before a BeforePosition:

        >>> from Bio.SeqFeature import SeqFeature, FeatureLocation
        >>> from Bio.SeqFeature import BeforePosition
        >>> f = SeqFeature(FeatureLocation(BeforePosition(3), 8), type="domain")
        >>> len(f)
        5
        >>> [i for i in range(10) if i in f]
        [3, 4, 5, 6, 7]

        Note that this is a proxy for testing membership on the location.

        >>> [i for i in range(10) if i in f.location]
        [3, 4, 5, 6, 7]
        """
        return value in self.location

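# Illustrative sketch only (not part of Bio.SeqFeature): the SeqFeature
# __len__ and __contains__ docstrings above say that the gaps in a joined
# location are neither counted nor matched. The standalone snippet below
# reproduces that arithmetic for the tRNA example using plain tuples. The
# names TRNA_PARTS, parts_length and parts_contain are hypothetical helpers
# invented for this example; Biopython itself delegates to the location
# object instead.

```python
# Hypothetical stand-in for a compound location: a list of half-open
# (start, end) intervals in Python (zero-based) coordinates, taken from
# join{[4310:4347](-), [1716:1751](-)} in the docstring example.
TRNA_PARTS = [(4310, 4347), (1716, 1751)]


def parts_length(parts):
    """Sum the part lengths, so gaps such as introns are not counted."""
    return sum(end - start for start, end in parts)


def parts_contain(parts, position):
    """Return True if position falls inside any part (not in a gap)."""
    return any(start <= position < end for start, end in parts)


print(parts_length(TRNA_PARTS))         # 37 + 35 = 72
print(parts_contain(TRNA_PARTS, 1750))  # True, last base of the second part
print(parts_contain(TRNA_PARTS, 1760))  # False, in the gap between the parts
```

# The half-open comparison (start <= position < end) mirrors Python slicing,
# which is why position 1751 would already count as outside the second part.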

# --- References


# TODO -- Will this hold PubMed and Medline information decently?
class Reference(object):
    """Represent a Generic Reference object.

    Attributes:
     - location - A list of Location objects specifying regions of
       the sequence that the reference corresponds to. If no locations are
       specified, the entire sequence is assumed.
     - authors - A big old string, or a list split by author, of authors
       for the reference.
     - consrtm - The consortium name, if any (as in the CONSRTM line of a
       GenBank reference).
     - title - The title of the reference.
     - journal - Journal the reference was published in.
     - medline_id - A medline reference for the article.
     - pubmed_id - A pubmed reference for the article.
     - comment - A place to stick any comments about the reference.

    """

    def __init__(self):
        """Initialize the class."""
        self.location = []
        self.authors = ''
        self.consrtm = ''
        self.title = ''
        self.journal = ''
        self.medline_id = ''
        self.pubmed_id = ''
        self.comment = ''

    def __str__(self):
        """Return the full Reference object as a python string."""
        out = ""
        for single_location in self.location:
            out += "location: %s\n" % single_location
        out += "authors: %s\n" % self.authors
        if self.consrtm:
            out += "consrtm: %s\n" % self.consrtm
        out += "title: %s\n" % self.title
        out += "journal: %s\n" % self.journal
        out += "medline id: %s\n" % self.medline_id
        out += "pubmed id: %s\n" % self.pubmed_id
        out += "comment: %s\n" % self.comment
        return out

    def __repr__(self):
        """Represent the Reference object as a string for debugging."""
        # TODO - Update this if __init__ later accepts values
        return "%s(title=%s, ...)" % (self.__class__.__name__,
                                      repr(self.title))

    def __eq__(self, other):
        """Check if two Reference objects should be considered equal.

        Note prior to Biopython 1.70 the location was not compared, as
        until then __eq__ for the FeatureLocation class was not defined.
        """
        return self.authors == other.authors and \
            self.consrtm == other.consrtm and \
            self.title == other.title and \
            self.journal == other.journal and \
            self.medline_id == other.medline_id and \
            self.pubmed_id == other.pubmed_id and \
            self.comment == other.comment and \
            self.location == other.location

    def __ne__(self, other):
        """Implement the not-equal operand."""
        # This is needed for py2, but not for py3.
        return not self == other


# --- Handling feature locations

class FeatureLocation(object):
    """Specify the location of a feature along a sequence.

    The FeatureLocation is used for simple continuous features, which can
    be described as running from a start position to an end position
    (optionally with a strand and reference information). More complex
    locations made up from several non-continuous parts (e.g. a coding
    sequence made up of several exons) are described using a SeqFeature
    with a CompoundLocation.

    Note that the start and end location numbering follow Python's scheme,
    thus a GenBank entry of 123..150 (one based counting) becomes a location
    of [122:150] (zero based counting).

    >>> from Bio.SeqFeature import FeatureLocation
    >>> f = FeatureLocation(122, 150)
    >>> print(f)
    [122:150]
    >>> print(f.start)
    122
    >>> print(f.end)
    150
    >>> print(f.strand)
    None

    Note the strand defaults to None. If you are working with nucleotide
    sequences you'd want to be explicit if it is the forward strand:

    >>> from Bio.SeqFeature import FeatureLocation
    >>> f = FeatureLocation(122, 150, strand=+1)
    >>> print(f)
    [122:150](+)
    >>> print(f.strand)
    1

    Note that for a parent sequence of length n, the FeatureLocation
    start and end must satisfy the inequality 0 <= start <= end <= n.
    This means even for features on the reverse strand of a nucleotide
    sequence, we expect the 'start' coordinate to be less than the
    'end'.

    >>> from Bio.SeqFeature import FeatureLocation
    >>> r = FeatureLocation(122, 150, strand=-1)
    >>> print(r)
    [122:150](-)
    >>> print(r.start)
    122
    >>> print(r.end)
    150
    >>> print(r.strand)
    -1

    i.e. Rather than thinking of the 'start' and 'end' biologically in a
    strand aware manner, think of them as the 'left most' or 'minimum'
    boundary, and the 'right most' or 'maximum' boundary of the region
    being described. This is particularly important with compound
    locations describing non-continuous regions.

    In the example above we have used standard exact positions, but there
    are also specialised position objects used to represent fuzzy positions
    as well, for example a GenBank location like complement(<123..150)
    would use a BeforePosition object for the start.
    """

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        """Initialize the class.

        start and end arguments specify the values where the feature begins
        and ends. These can either be any of the ``*Position`` objects that
        inherit from AbstractPosition, or can just be integers specifying the
        position. In the case of integers, the values are assumed to be
        exact and are converted into ExactPosition objects. This is meant
        to make it easy to deal with non-fuzzy ends.

        i.e. Short form:

        >>> from Bio.SeqFeature import FeatureLocation
        >>> loc = FeatureLocation(5, 10, strand=-1)
        >>> print(loc)
        [5:10](-)

        Explicit form:

        >>> from Bio.SeqFeature import FeatureLocation, ExactPosition
        >>> loc = FeatureLocation(ExactPosition(5), ExactPosition(10), strand=-1)
        >>> print(loc)
        [5:10](-)

        Other fuzzy positions are used similarly,

        >>> from Bio.SeqFeature import FeatureLocation
        >>> from Bio.SeqFeature import BeforePosition, AfterPosition
        >>> loc2 = FeatureLocation(BeforePosition(5), AfterPosition(10), strand=-1)
        >>> print(loc2)
        [<5:>10](-)

        For nucleotide features you will also want to specify the strand,
        use 1 for the forward (plus) strand, -1 for the reverse (negative)
        strand, 0 for stranded but strand unknown (? in GFF3), or None for
        when the strand does not apply (dot in GFF3), e.g. features on
        proteins.

        >>> loc = FeatureLocation(5, 10, strand=+1)
        >>> print(loc)
        [5:10](+)
        >>> print(loc.strand)
        1

        Normally feature locations are given relative to the parent
        sequence you are working with, but an explicit accession can
        be given with the optional ref and ref_db strings:

        >>> loc = FeatureLocation(105172, 108462, ref="AL391218.9", strand=1)
        >>> print(loc)
        AL391218.9[105172:108462](+)
        >>> print(loc.ref)
        AL391218.9

        """
        # TODO - Check 0 <= start <= end (<= length of reference)
        if isinstance(start, AbstractPosition):
            self._start = start
        elif _is_int_or_long(start):
            self._start = ExactPosition(start)
        else:
            raise TypeError("start=%r %s" % (start, type(start)))
        if isinstance(end, AbstractPosition):
            self._end = end
        elif _is_int_or_long(end):
            self._end = ExactPosition(end)
        else:
            raise TypeError("end=%r %s" % (end, type(end)))
        self.strand = strand
        self.ref = ref
        self.ref_db = ref_db

    def _get_strand(self):
        """Get function for the strand property (PRIVATE)."""
        return self._strand

    def _set_strand(self, value):
        """Set function for the strand property (PRIVATE)."""
        if value not in [+1, -1, 0, None]:
            raise ValueError("Strand should be +1, -1, 0 or None, not %r"
                             % value)
        self._strand = value

    strand = property(fget=_get_strand, fset=_set_strand,
                      doc="Strand of the location (+1, -1, 0 or None).")

    def __str__(self):
        """Return a representation of the FeatureLocation object (with python counting).

        For the simple case this uses the python slicing syntax, [122:150]
        (zero based counting) which GenBank would call 123..150 (one based
        counting).
        """
        answer = "[%s:%s]" % (self._start, self._end)
        if self.ref and self.ref_db:
            answer = "%s:%s%s" % (self.ref_db, self.ref, answer)
        elif self.ref:
            answer = self.ref + answer
        # Is ref_db without ref meaningful?
        if self.strand is None:
            return answer
        elif self.strand == +1:
            return answer + "(+)"
        elif self.strand == -1:
            return answer + "(-)"
        else:
            # strand = 0, stranded but strand unknown, ? in GFF3
            return answer + "(?)"

    def __repr__(self):
        """Represent the FeatureLocation object as a string for debugging."""
        optional = ""
        if self.strand is not None:
            optional += ", strand=%r" % self.strand
        if self.ref is not None:
            optional += ", ref=%r" % self.ref
        if self.ref_db is not None:
            optional += ", ref_db=%r" % self.ref_db
        return "%s(%r, %r%s)" \
               % (self.__class__.__name__, self.start, self.end, optional)

    def __add__(self, other):
        """Combine location with another FeatureLocation object, or shift it.

        You can add two feature locations to make a join CompoundLocation:

        >>> from Bio.SeqFeature import FeatureLocation
        >>> f1 = FeatureLocation(5, 10)
        >>> f2 = FeatureLocation(20, 30)
        >>> combined = f1 + f2
        >>> print(combined)
        join{[5:10], [20:30]}

        This is thus equivalent to:

        >>> from Bio.SeqFeature import CompoundLocation
        >>> join = CompoundLocation([f1, f2])
        >>> print(join)
        join{[5:10], [20:30]}

        You can also use sum(...) in this way:

        >>> join = sum([f1, f2])
        >>> print(join)
        join{[5:10], [20:30]}

        Furthermore, you can combine a FeatureLocation with a CompoundLocation
        in this way.

        Separately, adding an integer will give a new FeatureLocation with
        its start and end offset by that amount. For example:

        >>> print(f1)
        [5:10]
        >>> print(f1 + 100)
        [105:110]
        >>> print(200 + f1)
        [205:210]

        This can be useful when editing annotation.
        """
        if isinstance(other, FeatureLocation):
            return CompoundLocation([self, other])
        elif _is_int_or_long(other):
            return self._shift(other)
        else:
            # This will allow CompoundLocation's __radd__ to be called:
            return NotImplemented

    def __radd__(self, other):
        """Shift the location when an integer is added on the left (e.g. 200 + loc)."""
        if _is_int_or_long(other):
            return self._shift(other)
        else:
            return NotImplemented

    # Python 3:
    def __bool__(self):
        """Return True regardless of the length of the feature.

        This behaviour is for backwards compatibility, since until the
        __len__ method was added, a FeatureLocation always evaluated as True.

        Note that in comparison, Seq objects, strings, lists, etc, will all
        evaluate to False if they have length zero.

        WARNING: The FeatureLocation may in future evaluate to False when its
        length is zero (in order to better match normal python behaviour)!
        """
        return True

    # Python 2 (without __bool__, Python 3 would fall back on __len__):
    __nonzero__ = __bool__

    def __len__(self):
        """Return the length of the region described by the FeatureLocation object.

        Note that extra care may be needed for fuzzy locations, e.g.

        >>> from Bio.SeqFeature import FeatureLocation
        >>> from Bio.SeqFeature import BeforePosition, AfterPosition
        >>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
        >>> len(loc)
        5
        """
        return int(self._end) - int(self._start)

    def __contains__(self, value):
        """Check if an integer position is within the FeatureLocation object.

        Note that extra care may be needed for fuzzy locations, e.g.

        >>> from Bio.SeqFeature import FeatureLocation
        >>> from Bio.SeqFeature import BeforePosition, AfterPosition
        >>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
        >>> len(loc)
        5
        >>> [i for i in range(15) if i in loc]
        [5, 6, 7, 8, 9]
        """
        if not _is_int_or_long(value):
            raise ValueError("Currently we only support checking for integer "
                             "positions being within a FeatureLocation.")
        if value < self._start or value >= self._end:
            return False
        else:
            return True

    def __iter__(self):
        """Iterate over the parent positions within the FeatureLocation object.

        >>> from Bio.SeqFeature import FeatureLocation
        >>> from Bio.SeqFeature import BeforePosition, AfterPosition
        >>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10))
        >>> len(loc)
        5
        >>> for i in loc: print(i)
        5
        6
        7
        8
        9
        >>> list(loc)
        [5, 6, 7, 8, 9]
        >>> [i for i in range(15) if i in loc]
        [5, 6, 7, 8, 9]

        Note this is strand aware:

        >>> loc = FeatureLocation(BeforePosition(5), AfterPosition(10), strand = -1)
        >>> list(loc)
        [9, 8, 7, 6, 5]
        """
        if self.strand == -1:
            for i in range(self._end - 1, self._start - 1, -1):
                yield i
        else:
            for i in range(self._start, self._end):
                yield i

    def __eq__(self, other):
        """Implement equality by comparing all the location attributes."""
        if not isinstance(other, FeatureLocation):
            return False
        return self._start == other.start and \
            self._end == other.end and \
            self._strand == other.strand and \
            self.ref == other.ref and \
            self.ref_db == other.ref_db

    def __ne__(self, other):
        """Implement the not-equal operand."""
        # This is needed for py2, but not for py3.
        return not self == other

    def _shift(self, offset):
        """Return a copy of the FeatureLocation shifted by an offset (PRIVATE)."""
        # TODO - What if offset is a fuzzy position?
        if self.ref or self.ref_db:
            # TODO - Return self?
            raise ValueError("Feature references another sequence.")
        return FeatureLocation(start=self._start._shift(offset),
                               end=self._end._shift(offset),
                               strand=self.strand)

    def _flip(self, length):
        """Return a copy of the location after the parent is reversed (PRIVATE)."""
        if self.ref or self.ref_db:
            # TODO - Return self?
            raise ValueError("Feature references another sequence.")
        # Note this will flip the start and end too!
        if self.strand == +1:
            flip_strand = -1
        elif self.strand == -1:
            flip_strand = +1
        else:
            # 0 or None
            flip_strand = self.strand
        return FeatureLocation(start=self._end._flip(length),
                               end=self._start._flip(length),
                               strand=flip_strand)

    @property
    def parts(self):
        """Read only list of sections (always one, the FeatureLocation object).

        This is a convenience property allowing you to write code handling
        both simple FeatureLocation objects (with one part) and more complex
        CompoundLocation objects (with multiple parts) interchangeably.
        """
        return [self]

    @property
    def start(self):
        """Start location - left most (minimum) value, regardless of strand.

        Read only, returns an integer like position object, possibly a fuzzy
        position.
        """
        return self._start

    @property
    def end(self):
        """End location - right most (maximum) value, regardless of strand.

        Read only, returns an integer like position object, possibly a fuzzy
        position.
        """
        return self._end
929 930 @property
931 - def nofuzzy_start(self):
932 """Start position (integer, approximated if fuzzy, read only) (OBSOLETE). 933 934 This is now an alias for int(feature.start), which should be 935 used in preference -- unless you are trying to support old 936 versions of Biopython. 937 """ 938 try: 939 return int(self._start) 940 except TypeError: 941 if isinstance(self._start, UnknownPosition): 942 return None 943 raise
944 945 @property
946 - def nofuzzy_end(self):
947 """End position (integer, approximated if fuzzy, read only) (OBSOLETE). 948 949 This is now an alias for int(feature.end), which should be 950 used in preference -- unless you are trying to support old 951 versions of Biopython. 952 """ 953 try: 954 return int(self._end) 955 except TypeError: 956 if isinstance(self._end, UnknownPosition): 957 return None 958 raise
959
960 - def extract(self, parent_sequence):
961 """Extract the sequence from supplied parent sequence using the FeatureLocation object. 962 963 The parent_sequence can be a Seq like object or a string, and will 964 generally return an object of the same type. The exception to this is 965 a MutableSeq as the parent sequence will return a Seq object. 966 967 >>> from Bio.Seq import Seq 968 >>> from Bio.Alphabet import generic_protein 969 >>> from Bio.SeqFeature import FeatureLocation 970 >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein) 971 >>> feature_loc = FeatureLocation(8, 15) 972 >>> feature_loc.extract(seq) 973 Seq('VALIVIC', ProteinAlphabet()) 974 975 """ 976 if self.ref or self.ref_db: 977 # TODO - Take a dictionary as an optional argument? 978 raise ValueError("Feature references another sequence.") 979 if isinstance(parent_sequence, MutableSeq): 980 # This avoids complications with reverse complements 981 # (the MutableSeq reverse complement acts in situ) 982 parent_sequence = parent_sequence.toseq() 983 f_seq = parent_sequence[self.nofuzzy_start:self.nofuzzy_end] 984 if self.strand == -1: 985 try: 986 f_seq = f_seq.reverse_complement() 987 except AttributeError: 988 assert isinstance(f_seq, str) 989 f_seq = reverse_complement(f_seq) 990 return f_seq
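# A rough sketch of the slicing logic in extract(), using plain strings
# instead of Seq objects. The helper extract_region and the complement table
# (unambiguous DNA letters only) are assumptions, not Biopython API.

```python
# Complement table covering only A, C, G, T (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(parent, start, end, strand=None):
    """Slice parent[start:end]; reverse complement it when strand is -1."""
    region = parent[start:end]
    if strand == -1:
        # Complement each letter, then reverse the result.
        region = region.translate(_COMPLEMENT)[::-1]
    return region


protein_part = extract_region("MKQHKAMIVALIVICITAVVAAL", 8, 15)
minus_part = extract_region("AAACCCGGGTTT", 2, 6, strand=-1)
print(protein_part)
print(minus_part)
```

# The first call mirrors the docstring example (positions 8 to 14 inclusive).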
991
992 993 -class CompoundLocation(object):
994 """For handling joins etc where a feature location has several parts.""" 995
996 - def __init__(self, parts, operator="join"):
997 """Initialize the class. 998 999 >>> from Bio.SeqFeature import FeatureLocation, CompoundLocation 1000 >>> f1 = FeatureLocation(10, 40, strand=+1) 1001 >>> f2 = FeatureLocation(50, 59, strand=+1) 1002 >>> f = CompoundLocation([f1, f2]) 1003 >>> len(f) == len(f1) + len(f2) == 39 == len(list(f)) 1004 True 1005 >>> print(f.operator) 1006 join 1007 >>> 5 in f 1008 False 1009 >>> 15 in f 1010 True 1011 >>> f.strand 1012 1 1013 1014 Notice that the strand of the compound location is computed 1015 automatically - in the case of mixed strands on the sub-locations 1016 the overall strand is set to None. 1017 1018 >>> f = CompoundLocation([FeatureLocation(3, 6, strand=+1), 1019 ... FeatureLocation(10, 13, strand=-1)]) 1020 >>> print(f.strand) 1021 None 1022 >>> len(f) 1023 6 1024 >>> list(f) 1025 [3, 4, 5, 12, 11, 10] 1026 1027 The example above doing list(f) iterates over the coordinates within the 1028 feature. This allows you to use max and min on the location, to find the 1029 range covered: 1030 1031 >>> min(f) 1032 3 1033 >>> max(f) 1034 12 1035 1036 More generally, you can use the compound location's start and end which 1037 give the full range covered, 0 <= start <= end <= full sequence length. 1038 1039 >>> f.start == min(f) 1040 True 1041 >>> f.end == max(f) + 1 1042 True 1043 1044 This is consistent with the behaviour of the simple FeatureLocation for 1045 a single region, where again the 'start' and 'end' do not necessarily 1046 give the biological start and end, but rather the 'minimal' and 'maximal' 1047 coordinate boundaries. 
1048 1049 Note that adding locations provides a more intuitive method of 1050 construction: 1051 1052 >>> f = FeatureLocation(3, 6, strand=+1) + FeatureLocation(10, 13, strand=-1) 1053 >>> len(f) 1054 6 1055 >>> list(f) 1056 [3, 4, 5, 12, 11, 10] 1057 """ 1058 self.operator = operator 1059 self.parts = list(parts) 1060 for loc in self.parts: 1061 if not isinstance(loc, FeatureLocation): 1062 raise ValueError("CompoundLocation should be given a list of " 1063 "FeatureLocation objects, not %s" % loc.__class__) 1064 if len(parts) < 2: 1065 raise ValueError( 1066 "CompoundLocation should have at least 2 parts, not %r" % parts)
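# A minimal sketch of the mixed-strand example in the docstring above, with
# each part modelled as a hypothetical (start, end, strand) tuple rather than
# a FeatureLocation. It shows why start == min(f) and end == max(f) + 1.

```python
def iter_compound(parts):
    """Yield coordinates part by part, each part in its own strand order."""
    for start, end, strand in parts:
        if strand == -1:
            positions = range(end - 1, start - 1, -1)
        else:
            positions = range(start, end)
        for pos in positions:
            yield pos


parts = [(3, 6, +1), (10, 13, -1)]
covered = list(iter_compound(parts))
# Overall span: smallest start and largest end over all parts.
span_start = min(start for start, _end, _strand in parts)
span_end = max(end for _start, end, _strand in parts)
print(covered, span_start, span_end)
```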
1067
1068 - def __str__(self):
1069 """Return a representation of the CompoundLocation object (with python counting).""" 1070 return "%s{%s}" % (self.operator, ", ".join(str(loc) for loc in self.parts))
1071
1072 - def __repr__(self):
1073 """Represent the CompoundLocation object as string for debugging.""" 1074 return "%s(%r, %r)" % (self.__class__.__name__, 1075 self.parts, self.operator)
1076
1077 - def _get_strand(self):
1078 """Get function for the strand property (PRIVATE).""" 1079 # Historically a join on the reverse strand has been represented 1080 # in Biopython with both the parent SeqFeature and its children 1081 # (the exons for a CDS) all given a strand of -1. Likewise, for 1082 # a join feature on the forward strand they all have strand +1. 1083 # However, we must also consider evil mixed strand examples like 1084 # this, join(complement(69611..69724),139856..140087,140625..140650) 1085 if len(set(loc.strand for loc in self.parts)) == 1: 1086 return self.parts[0].strand 1087 else: 1088 return None # i.e. mixed strands
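# A small sketch of the rule used above: one distinct strand value means the
# compound location has that strand, anything else is treated as mixed.
# The function overall_strand is hypothetical and works on a list of strands.

```python
def overall_strand(strands):
    """Return the shared strand, or None when the strands differ."""
    unique = set(strands)
    if len(unique) == 1:
        return unique.pop()
    return None  # mixed strands (or no parts at all in this sketch)


same = overall_strand([-1, -1, -1])
mixed = overall_strand([-1, +1, +1])
print(same, mixed)
```

# Note that parts which all have strand None also yield None, which is
# indistinguishable from the mixed case.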
1089
1090 - def _set_strand(self, value):
1091 """Set function for the strand property (PRIVATE).""" 1092 # Should this be allowed/encouraged? 1093 for loc in self.parts: 1094 loc.strand = value
1095 strand = property(fget=_get_strand, fset=_set_strand, 1096 doc="""Overall strand of the compound location. 1097 1098 If all the parts have the same strand, that is returned. Otherwise 1099 for mixed strands, this returns None. 1100 1101 >>> from Bio.SeqFeature import FeatureLocation, CompoundLocation 1102 >>> f1 = FeatureLocation(15, 17, strand=1) 1103 >>> f2 = FeatureLocation(20, 30, strand=-1) 1104 >>> f = f1 + f2 1105 >>> f1.strand 1106 1 1107 >>> f2.strand 1108 -1 1109 >>> f.strand 1110 >>> f.strand is None 1111 True 1112 1113 If you set the strand of a CompoundLocation, this is applied to 1114 all the parts - use with caution: 1115 1116 >>> f.strand = 1 1117 >>> f1.strand 1118 1 1119 >>> f2.strand 1120 1 1121 >>> f.strand 1122 1 1123 1124 """) 1125
1126 - def __add__(self, other):
1127 """Combine locations, or shift the location by an integer offset. 1128 1129 >>> from Bio.SeqFeature import FeatureLocation, CompoundLocation 1130 >>> f1 = FeatureLocation(15, 17) + FeatureLocation(20, 30) 1131 >>> print(f1) 1132 join{[15:17], [20:30]} 1133 1134 You can add another FeatureLocation: 1135 1136 >>> print(f1 + FeatureLocation(40, 50)) 1137 join{[15:17], [20:30], [40:50]} 1138 >>> print(FeatureLocation(5, 10) + f1) 1139 join{[5:10], [15:17], [20:30]} 1140 1141 You can also add another CompoundLocation: 1142 1143 >>> f2 = FeatureLocation(40, 50) + FeatureLocation(60, 70) 1144 >>> print(f2) 1145 join{[40:50], [60:70]} 1146 >>> print(f1 + f2) 1147 join{[15:17], [20:30], [40:50], [60:70]} 1148 1149 Also, as with the FeatureLocation, adding an integer shifts the 1150 location's co-ordinates by that offset: 1151 1152 >>> print(f1 + 100) 1153 join{[115:117], [120:130]} 1154 >>> print(200 + f1) 1155 join{[215:217], [220:230]} 1156 >>> print(f1 + (-5)) 1157 join{[10:12], [15:25]} 1158 """ 1159 if isinstance(other, FeatureLocation): 1160 return CompoundLocation(self.parts + [other], self.operator) 1161 elif isinstance(other, CompoundLocation): 1162 if self.operator != other.operator: 1163 # Handle join+order -> order as a special case? 1164 raise ValueError("Mixed operators %s and %s" 1165 % (self.operator, other.operator)) 1166 return CompoundLocation(self.parts + other.parts, self.operator) 1167 elif _is_int_or_long(other): 1168 return self._shift(other) 1169 else: 1170 raise NotImplementedError
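# A sketch of the dispatch in __add__, assuming a simplified model where a
# single location is a (start, end) tuple and a compound location is a list
# of such tuples. The helper add_to_parts is illustrative only.

```python
def add_to_parts(parts, other):
    """Append a location, concatenate a compound, or shift by an integer."""
    if isinstance(other, tuple):
        # A single location: append it as a new part.
        return parts + [other]
    if isinstance(other, list):
        # Another compound location: concatenate the parts in order.
        return parts + other
    if isinstance(other, int):
        # An integer offset: shift every part by the same amount.
        return [(start + other, end + other) for start, end in parts]
    raise NotImplementedError


f1 = [(15, 17), (20, 30)]
extended = add_to_parts(f1, (40, 50))
shifted = add_to_parts(f1, 100)
print(extended)
print(shifted)
```

# Unlike the real method, this sketch has no operator to compare, so the
# "Mixed operators" check is not modelled.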
1171
1172 - def __radd__(self, other):
1173 """Add a feature to the left.""" 1174 if isinstance(other, FeatureLocation): 1175 return CompoundLocation([other] + self.parts, self.operator) 1176 elif _is_int_or_long(other): 1177 return self._shift(other) 1178 else: 1179 raise NotImplementedError
1180
1181 - def __contains__(self, value):
1182 """Check if an integer position is within the CompoundLocation object.""" 1183 for loc in self.parts: 1184 if value in loc: 1185 return True 1186 return False
1187
1188 - def __nonzero__(self):
1189 """Return True regardless of the length of the feature. 1190 1191 This behaviour is for backwards compatibility, since until the 1192 __len__ method was added, a FeatureLocation always evaluated as True. 1193 1194 Note that in comparison, Seq objects, strings, lists, etc, will all 1195 evaluate to False if they have length zero. 1196 1197 WARNING: The FeatureLocation may in future evaluate to False when its 1198 length is zero (in order to better match normal python behaviour)! 1199 """ 1200 return True
1201
1202 - def __len__(self):
1203 """Return the length of the CompoundLocation object.""" 1204 return sum(len(loc) for loc in self.parts)
1205
1206 - def __iter__(self):
1207 """Iterate over the parent positions within the CompoundLocation object.""" 1208 for loc in self.parts: 1209 for pos in loc: 1210 yield pos
1211
1212 - def __eq__(self, other):
1213 """Check if all parts of CompoundLocation are equal to all parts of other CompoundLocation.""" 1214 if not isinstance(other, CompoundLocation): 1215 return False 1216 if len(self.parts) != len(other.parts): 1217 return False 1218 if self.operator != other.operator: 1219 return False 1220 for self_part, other_part in zip(self.parts, other.parts): 1221 if self_part != other_part: 1222 return False 1223 return True
1224
1225 - def __ne__(self, other):
1226          """Implement the not-equal operator."""
1227          # This is needed for py2, but not for py3.
1228          return not self == other
1229
1230 - def _shift(self, offset):
1231 """Return a copy of the CompoundLocation shifted by an offset (PRIVATE).""" 1232 return CompoundLocation([loc._shift(offset) for loc in self.parts], 1233 self.operator)
1234
1235 - def _flip(self, length):
1236          """Return a copy of the locations after the parent is reversed (PRIVATE).
1237  
1238          Note that the order of the parts is NOT reversed too. Consider a CDS
1239          on the forward strand with exons small, medium and large (in length).
1240          Once we change the frame of reference to the reverse complement strand,
1241          the start codon is still part of the small exon, and the stop codon
1242          still part of the large exon - so the part order remains the same!
1243  
1244          Here is an artificial example, where the features map to the two upper
1245          case regions and the lower case runs of n are not used:
1246  
1247          >>> from Bio.Seq import Seq
1248          >>> from Bio.SeqFeature import FeatureLocation
1249          >>> dna = Seq("nnnnnAGCATCCTGCTGTACnnnnnnnnGAGAMTGCCATGCCCCTGGAGTGAnnnnn")
1250          >>> small = FeatureLocation(5, 20, strand=1)
1251          >>> large = FeatureLocation(28, 52, strand=1)
1252          >>> location = small + large
1253          >>> print(small)
1254          [5:20](+)
1255          >>> print(large)
1256          [28:52](+)
1257          >>> print(location)
1258          join{[5:20](+), [28:52](+)}
1259          >>> for part in location.parts:
1260          ...     print(len(part))
1261          ...
1262          15
1263          24
1264  
1265          As you can see, this is a silly example where each "exon" is a word:
1266  
1267          >>> print(small.extract(dna).translate())
1268          SILLY
1269          >>> print(large.extract(dna).translate())
1270          EXAMPLE*
1271          >>> print(location.extract(dna).translate())
1272          SILLYEXAMPLE*
1273          >>> for part in location.parts:
1274          ...     print(part.extract(dna).translate())
1275          ...
1276          SILLY
1277          EXAMPLE*
1278  
1279          Now, let's look at this from the reverse strand frame of reference:
1280  
1281          >>> flipped_dna = dna.reverse_complement()
1282          >>> flipped_location = location._flip(len(dna))
1283          >>> print(flipped_location.extract(flipped_dna).translate())
1284          SILLYEXAMPLE*
1285          >>> for part in flipped_location.parts:
1286          ...     print(part.extract(flipped_dna).translate())
1287          ...
1288          SILLY
1289          EXAMPLE*
1290  
1291          The key point here is the first part of the CompoundLocation is still the
1292          small exon, while the second part is still the large exon:
1293  
1294          >>> for part in flipped_location.parts:
1295          ...     print(len(part))
1296          ...
1297          15
1298          24
1299          >>> print(flipped_location)
1300          join{[37:52](-), [5:29](-)}
1301  
1302          Notice the parts are not reversed. However, there was a bug here in older
1303          versions of Biopython which would have given join{[5:29](-), [37:52](-)}
1304          and the translation would have wrongly been "EXAMPLE*SILLY" instead.
1305  
1306          """
1307          return CompoundLocation([loc._flip(length) for loc in self.parts],
1308                                  self.operator)
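# A sketch of the coordinate arithmetic behind the flipped example in the
# docstring, using hypothetical (start, end, strand) tuples. Each part is
# flipped on its own; the list order is deliberately left unchanged.

```python
def flip_parts(parts, length):
    """Map each part onto the reverse complement of a parent of given length."""
    flipped = []
    for start, end, strand in parts:
        if strand in (+1, -1):
            new_strand = -strand
        else:
            new_strand = strand  # 0 or None stay as they are
        # The old end becomes the new start, and vice versa.
        flipped.append((length - end, length - start, new_strand))
    return flipped


parts = [(5, 20, +1), (28, 52, +1)]
flipped = flip_parts(parts, 57)
print(flipped)
```

# Flipping twice with the same length returns the original coordinates.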
1309 1310 @property
1311 - def start(self):
1312 """Start location - left most (minimum) value, regardless of strand. 1313 1314 Read only, returns an integer like position object, possibly a fuzzy 1315 position. 1316 1317 For the special case of a CompoundLocation wrapping the origin of a 1318 circular genome, this will return zero. 1319 """ 1320 return min(loc.start for loc in self.parts)
1321 1322 @property
1323 - def end(self):
1324 """End location - right most (maximum) value, regardless of strand. 1325 1326 Read only, returns an integer like position object, possibly a fuzzy 1327 position. 1328 1329 For the special case of a CompoundLocation wrapping the origin of 1330 a circular genome this will match the genome length (minus one 1331 given how Python counts from zero). 1332 """ 1333 return max(loc.end for loc in self.parts)
1334 1335 @property
1336 - def nofuzzy_start(self):
1337 """Start position (integer, approximated if fuzzy, read only) (OBSOLETE). 1338 1339 This is an alias for int(feature.start), which should be used in 1340 preference -- unless you are trying to support old versions of 1341 Biopython. 1342 """ 1343 try: 1344 return int(self.start) 1345 except TypeError: 1346 if isinstance(self.start, UnknownPosition): 1347 return None 1348 raise
1349 1350 @property
1351 - def nofuzzy_end(self):
1352 """End position (integer, approximated if fuzzy, read only) (OBSOLETE). 1353 1354 This is an alias for int(feature.end), which should be used in 1355 preference -- unless you are trying to support old versions of 1356 Biopython. 1357 """ 1358 try: 1359 return int(self.end) 1360 except TypeError: 1361 if isinstance(self.end, UnknownPosition): 1362 return None 1363 raise
1364 1365 @property
1366 - def ref(self):
1367 """Not present in CompoundLocation, dummy method for API compatibility.""" 1368 return None
1369 1370 @property
1371 - def ref_db(self):
1372 """Not present in CompoundLocation, dummy method for API compatibility.""" 1373 return None
1374
1375 - def extract(self, parent_sequence):
1376 """Extract the sequence from supplied parent sequence using the CompoundLocation object. 1377 1378 The parent_sequence can be a Seq like object or a string, and will 1379 generally return an object of the same type. The exception to this is 1380 a MutableSeq as the parent sequence will return a Seq object. 1381 1382 >>> from Bio.Seq import Seq 1383 >>> from Bio.Alphabet import generic_protein 1384 >>> from Bio.SeqFeature import FeatureLocation, CompoundLocation 1385 >>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein) 1386 >>> fl1 = FeatureLocation(2, 8) 1387 >>> fl2 = FeatureLocation(10, 15) 1388 >>> fl3 = CompoundLocation([fl1,fl2]) 1389 >>> fl3.extract(seq) 1390 Seq('QHKAMILIVIC', ProteinAlphabet()) 1391 1392 """ 1393 # This copes with mixed strand features & all on reverse: 1394 parts = [loc.extract(parent_sequence) for loc in self.parts] 1395 # We use addition rather than a join to avoid alphabet issues: 1396 f_seq = parts[0] 1397 for part in parts[1:]: 1398 f_seq += part 1399 return f_seq
1400
1401 1402 -class AbstractPosition(object):
1403 """Abstract base class representing a position.""" 1404
1405 - def __repr__(self):
1406 """Represent the AbstractPosition object as a string for debugging.""" 1407 return "%s(...)" % (self.__class__.__name__)
1408
1409 1410 -class ExactPosition(int, AbstractPosition):
1411 """Specify the specific position of a boundary. 1412 1413 Arguments: 1414 - position - The position of the boundary. 1415 - extension - An optional argument which must be zero since we don't 1416 have an extension. The argument is provided so that the same number 1417 of arguments can be passed to all position types. 1418 1419 In this case, there is no fuzziness associated with the position. 1420 1421 >>> p = ExactPosition(5) 1422 >>> p 1423 ExactPosition(5) 1424 >>> print(p) 1425 5 1426 1427 >>> isinstance(p, AbstractPosition) 1428 True 1429 >>> isinstance(p, int) 1430 True 1431 1432 Integer comparisons and operations should work as expected: 1433 1434 >>> p == 5 1435 True 1436 >>> p < 6 1437 True 1438 >>> p <= 5 1439 True 1440 >>> p + 10 1441 15 1442 1443 """ 1444
1445 - def __new__(cls, position, extension=0):
1446 """Create an ExactPosition object.""" 1447 if extension != 0: 1448 raise AttributeError("Non-zero extension %s for exact position." 1449 % extension) 1450 return int.__new__(cls, position)
1451
1452 - def __repr__(self):
1453 """Represent the ExactPosition object as a string for debugging.""" 1454 return "%s(%i)" % (self.__class__.__name__, int(self))
1455 1456 @property
1457 - def position(self):
1458 """Legacy attribute to get position as integer (OBSOLETE).""" 1459 return int(self)
1460 1461 @property
1462 - def extension(self):
1463 """Not present in this object, return zero (OBSOLETE).""" 1464 return 0
1465
1466 - def _shift(self, offset):
1467 """Return a copy of the position object with its location shifted (PRIVATE).""" 1468 # By default preserve any subclass 1469 return self.__class__(int(self) + offset)
1470
1471 - def _flip(self, length):
1472          """Return a copy of the location after the parent is reversed (PRIVATE)."""
1473          # By default preserve any subclass
1474          return self.__class__(length - int(self))
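# A minimal sketch of the pattern ExactPosition relies on: int is immutable,
# so the value is set in __new__ rather than __init__. SketchPosition is a
# hypothetical stand-in that does not inherit from AbstractPosition.

```python
class SketchPosition(int):
    """Integer subclass with a descriptive repr and shift/flip helpers."""

    def __new__(cls, position, extension=0):
        if extension != 0:
            raise AttributeError("Non-zero extension %s for exact position."
                                 % extension)
        return int.__new__(cls, position)

    def __repr__(self):
        return "%s(%i)" % (self.__class__.__name__, int(self))

    def _shift(self, offset):
        # Going through self.__class__ preserves any subclass.
        return self.__class__(int(self) + offset)

    def _flip(self, length):
        return self.__class__(length - int(self))


p = SketchPosition(5)
print(repr(p), repr(p._shift(3)), repr(p._flip(57)))
```

# Plain arithmetic such as p + 10 falls back to int and returns an ordinary
# integer, which is why _shift and _flip rebuild the object explicitly.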
1475
1476 1477 -class UncertainPosition(ExactPosition):
1478 """Specify a specific position which is uncertain. 1479 1480 This is used in UniProt, e.g. ?222 for uncertain position 222, or in the 1481 XML format explicitly marked as uncertain. Does not apply to GenBank/EMBL. 1482 """ 1483 1484 pass
1485
1486 1487 -class UnknownPosition(AbstractPosition):
1488 """Specify a specific position which is unknown (has no position). 1489 1490 This is used in UniProt, e.g. ? or in the XML as unknown. 1491 """ 1492
1493 - def __repr__(self):
1494 """Represent the UnknownPosition object as a string for debugging.""" 1495 return "%s()" % self.__class__.__name__
1496
1497 - def __hash__(self):
1498 """Return the hash value of the UnknownPosition object.""" 1499 return hash(None)
1500 1501 @property
1502 - def position(self):
1503 """Legacy attribute to get location (None) (OBSOLETE).""" 1504 return None
1505 1506 @property
1507 - def extension(self):
1508 """Legacy attribute to get extension (zero) as integer (OBSOLETE).""" 1509 return 0
1510
1511 - def _shift(self, offset):
1512 """Return a copy of the position object with its location shifted (PRIVATE).""" 1513 return self
1514
1515 - def _flip(self, length):
1516 """Return a copy of the location after the parent is reversed (PRIVATE).""" 1517 return self
1518
1519 1520 -class WithinPosition(int, AbstractPosition):
1521      """Specify the position of a boundary within some coordinates.
1522  
1523      Arguments:
1524       - position - The default integer position
1525       - left - The start (left) position of the boundary
1526       - right - The end (right) position of the boundary
1527  
1528      This allows dealing with a position like ((1.4)..100). This
1529      indicates that the start of the sequence is somewhere between 1
1530      and 4. Since this is a start coordinate, it should act like
1531      it is at position 1 (or in Python counting, 0).
1532  
1533      >>> p = WithinPosition(10, 10, 13)
1534      >>> p
1535      WithinPosition(10, left=10, right=13)
1536      >>> print(p)
1537      (10.13)
1538      >>> int(p)
1539      10
1540  
1541      Basic integer comparisons and operations should work as though
1542      this were a plain integer:
1543  
1544      >>> p == 10
1545      True
1546      >>> p in [9, 10, 11]
1547      True
1548      >>> p < 11
1549      True
1550      >>> p + 10
1551      20
1552  
1553      >>> isinstance(p, WithinPosition)
1554      True
1555      >>> isinstance(p, AbstractPosition)
1556      True
1557      >>> isinstance(p, int)
1558      True
1559  
1560      Note this also applies for comparison to other position objects,
1561      where again the integer behaviour is used:
1562  
1563      >>> p == 10
1564      True
1565      >>> p == ExactPosition(10)
1566      True
1567      >>> p == BeforePosition(10)
1568      True
1569      >>> p == AfterPosition(10)
1570      True
1571  
1572      If this were an end point, you would want the position to be 13:
1573  
1574      >>> p2 = WithinPosition(13, 10, 13)
1575      >>> p2
1576      WithinPosition(13, left=10, right=13)
1577      >>> print(p2)
1578      (10.13)
1579      >>> int(p2)
1580      13
1581      >>> p2 == 13
1582      True
1583      >>> p2 == ExactPosition(13)
1584      True
1585  
1586      The old legacy properties of position and extension give the
1587      starting/lower/left position as an integer, and the distance
1588      to the ending/higher/right position as an integer. Note that
1589      the position object will act like either the left or the right
1590      end-point depending on how it was created:
1591  
1592      >>> p.position == p2.position == 10
1593      True
1594      >>> p.extension == p2.extension == 3
1595      True
1596      >>> int(p) == int(p2)
1597      False
1598      >>> p == 10
1599      True
1600      >>> p2 == 13
1601      True
1602  
1603      """
1604  
1605 - def __new__(cls, position, left, right):
1606 """Create a WithinPosition object.""" 1607 assert position == left or position == right, \ 1608 "WithinPosition: %r should match left %r or right %r" \ 1609 % (position, left, right) 1610 obj = int.__new__(cls, position) 1611 obj._left = left 1612 obj._right = right 1613 return obj
1614
1615 - def __repr__(self):
1616 """Represent the WithinPosition object as a string for debugging.""" 1617 return "%s(%i, left=%i, right=%i)" \ 1618 % (self.__class__.__name__, int(self), 1619 self._left, self._right)
1620
1621 - def __str__(self):
1622 """Return a representation of the WithinPosition object (with python counting).""" 1623 return "(%s.%s)" % (self._left, self._right)
1624 1625 @property
1626 - def position(self):
1627 """Legacy attribute to get (left) position as integer (OBSOLETE).""" 1628 return self._left
1629 1630 @property
1631 - def extension(self):
1632 """Legacy attribute to get extension (from left to right) as an integer (OBSOLETE).""" 1633 return self._right - self._left
1634
1635 - def _shift(self, offset):
1636 """Return a copy of the position object with its location shifted (PRIVATE).""" 1637 return self.__class__(int(self) + offset, 1638 self._left + offset, 1639 self._right + offset)
1640
1641 - def _flip(self, length):
1642 """Return a copy of the location after the parent is reversed (PRIVATE).""" 1643 return self.__class__(length - int(self), 1644 length - self._right, 1645 length - self._left)
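# A sketch of the arithmetic in the _flip method above, with the position
# modelled as a hypothetical (position, left, right) tuple. The point to
# notice is that left and right swap roles when the parent is reversed.

```python
def flip_within(position, left, right, length):
    """Return (position, left, right) on the reversed parent sequence."""
    # The old right bound becomes the new left bound, and vice versa.
    return (length - position, length - right, length - left)


start_like = flip_within(10, 10, 13, 100)
print(start_like)
```

# A position that acted like its left bound (a start) acts like its right
# bound (an end) after flipping, so the left-or-right invariant still holds.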
1646
1647 1648 -class BetweenPosition(int, AbstractPosition):
1649 """Specify the position of a boundary between two coordinates (OBSOLETE?). 1650 1651 Arguments: 1652 - position - The default integer position 1653 - left - The start (left) position of the boundary 1654 - right - The end (right) position of the boundary 1655 1656 This allows dealing with a position like 123^456. This 1657 indicates that the start of the sequence is somewhere between 1658 123 and 456. It is up to the parser to set the position argument 1659 to either boundary point (depending on if this is being used as 1660 a start or end of the feature). For example as a feature end: 1661 1662 >>> p = BetweenPosition(456, 123, 456) 1663 >>> p 1664 BetweenPosition(456, left=123, right=456) 1665 >>> print(p) 1666 (123^456) 1667 >>> int(p) 1668 456 1669 1670 Integer equality and comparison use the given position, 1671 1672 >>> p == 456 1673 True 1674 >>> p in [455, 456, 457] 1675 True 1676 >>> p > 300 1677 True 1678 1679 The old legacy properties of position and extension give the 1680 starting/lower/left position as an integer, and the distance 1681 to the ending/higher/right position as an integer. Note that 1682 the position object will act like either the left or the right 1683 end-point depending on how it was created: 1684 1685 >>> p2 = BetweenPosition(123, left=123, right=456) 1686 >>> p.position == p2.position == 123 1687 True 1688 >>> p.extension 1689 333 1690 >>> p2.extension 1691 333 1692 >>> p.extension == p2.extension == 333 1693 True 1694 >>> int(p) == int(p2) 1695 False 1696 >>> p == 456 1697 True 1698 >>> p2 == 123 1699 True 1700 1701 Note this potentially surprising behaviour: 1702 1703 >>> BetweenPosition(123, left=123, right=456) == ExactPosition(123) 1704 True 1705 >>> BetweenPosition(123, left=123, right=456) == BeforePosition(123) 1706 True 1707 >>> BetweenPosition(123, left=123, right=456) == AfterPosition(123) 1708 True 1709 1710 i.e. For equality (and sorting) the position objects behave like 1711 integers. 1712 """ 1713
1714 - def __new__(cls, position, left, right):
1715          """Create a new instance of the BetweenPosition object."""
1716          assert position == left or position == right
1717          obj = int.__new__(cls, position)
1718          obj._left = left
1719          obj._right = right
1720          return obj
1721
1722 - def __repr__(self):
1723 """Represent the BetweenPosition object as a string for debugging.""" 1724 return "%s(%i, left=%i, right=%i)" \ 1725 % (self.__class__.__name__, int(self), 1726 self._left, self._right)
1727
1728 - def __str__(self):
1729 """Return a representation of the BetweenPosition object (with python counting).""" 1730 return "(%s^%s)" % (self._left, self._right)
1731 1732 @property
1733 - def position(self):
1734 """Legacy attribute to get (left) position as integer (OBSOLETE).""" 1735 return self._left
1736 1737 @property
1738 - def extension(self):
1739 """Legacy attribute to get extension (from left to right) as an integer (OBSOLETE).""" 1740 return self._right - self._left
1741
1742 - def _shift(self, offset):
1743 """Return a copy of the position object with its location shifted (PRIVATE).""" 1744 return self.__class__(int(self) + offset, 1745 self._left + offset, 1746 self._right + offset)
1747
1748 - def _flip(self, length):
1749 """Return a copy of the location after the parent is reversed (PRIVATE).""" 1750 return self.__class__(length - int(self), 1751 length - self._right, 1752 length - self._left)
1753
1754 1755 -class BeforePosition(int, AbstractPosition):
1756 """Specify a position where the actual location occurs before it. 1757 1758 Arguments: 1759 - position - The upper boundary of where the location can occur. 1760 - extension - An optional argument which must be zero since we don't 1761 have an extension. The argument is provided so that the same number 1762 of arguments can be passed to all position types. 1763 1764 This is used to specify positions like (<10..100) where the location 1765 occurs somewhere before position 10. 1766 1767 >>> p = BeforePosition(5) 1768 >>> p 1769 BeforePosition(5) 1770 >>> print(p) 1771 <5 1772 >>> int(p) 1773 5 1774 >>> p + 10 1775 15 1776 1777 Note this potentially surprising behaviour: 1778 1779 >>> p == ExactPosition(5) 1780 True 1781 >>> p == AfterPosition(5) 1782 True 1783 1784 Just remember that for equality and sorting the position objects act 1785 like integers. 1786 """ 1787 1788 # Subclasses int so can't use __init__
1789 - def __new__(cls, position, extension=0):
1790          """Create a new instance of the BeforePosition object."""
1791          if extension != 0:
1792              raise AttributeError("Non-zero extension %s for exact position."
1793                                   % extension)
1794          return int.__new__(cls, position)
1795 1796 @property
1797 - def position(self):
1798 """Legacy attribute to get position as integer (OBSOLETE).""" 1799 return int(self)
1800 1801 @property
1802 - def extension(self):
1803 """Legacy attribute to get extension (zero) as integer (OBSOLETE).""" 1804 return 0
1805
1806 - def __repr__(self):
1807 """Represent the location as a string for debugging.""" 1808 return "%s(%i)" % (self.__class__.__name__, int(self))
1809
1810 - def __str__(self):
1811 """Return a representation of the BeforePosition object (with python counting).""" 1812 return "<%s" % self.position
1813
1814 - def _shift(self, offset):
1815 """Return a copy of the position object with its location shifted (PRIVATE).""" 1816 return self.__class__(int(self) + offset)
1817
1818 - def _flip(self, length):
1819 """Return a copy of the location after the parent is reversed (PRIVATE).""" 1820 return AfterPosition(length - int(self))
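# A sketch of why BeforePosition._flip returns an AfterPosition (and the
# reverse): "before x" on one strand is "after length - x" on the other.
# The kind markers "<" and ">" and the helper name are illustrative only.

```python
def flip_open_ended(kind, position, length):
    """Flip a '<' (before) or '>' (after) position onto the reversed parent."""
    flipped_kind = {"<": ">", ">": "<"}[kind]
    return flipped_kind, length - position


flipped = flip_open_ended("<", 5, 57)
print(flipped)
```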
1821
1822 1823 -class AfterPosition(int, AbstractPosition):
1824 """Specify a position where the actual location is found after it. 1825 1826 Arguments: 1827 - position - The lower boundary of where the location can occur. 1828 - extension - An optional argument which must be zero since we don't 1829 have an extension. The argument is provided so that the same number 1830 of arguments can be passed to all position types. 1831 1832 This is used to specify positions like (>10..100) where the location 1833 occurs somewhere after position 10. 1834 1835 >>> p = AfterPosition(7) 1836 >>> p 1837 AfterPosition(7) 1838 >>> print(p) 1839 >7 1840 >>> int(p) 1841 7 1842 >>> p + 10 1843 17 1844 1845 >>> isinstance(p, AfterPosition) 1846 True 1847 >>> isinstance(p, AbstractPosition) 1848 True 1849 >>> isinstance(p, int) 1850 True 1851 1852 Note this potentially surprising behaviour: 1853 1854 >>> p == ExactPosition(7) 1855 True 1856 >>> p == BeforePosition(7) 1857 True 1858 1859 Just remember that for equality and sorting the position objects act 1860 like integers. 1861 """ 1862 1863 # Subclasses int so can't use __init__
1864 - def __new__(cls, position, extension=0):
1865 """Create a new instance of the AfterPosition object.""" 1866 if extension != 0: 1867 raise AttributeError("Non-zero extension %s for exact position." 1868 % extension) 1869 return int.__new__(cls, position)
1870 1871 @property
1872 - def position(self):
1873 """Legacy attribute to get position as integer (OBSOLETE).""" 1874 return int(self)
1875 1876 @property
1877 - def extension(self):
1878 """Legacy attribute to get extension (zero) as integer (OBSOLETE).""" 1879 return 0
1880
1881 - def __repr__(self):
1882 """Represent the location as a string for debugging.""" 1883 return "%s(%i)" % (self.__class__.__name__, int(self))
1884
1885 - def __str__(self):
1886 """Return a representation of the AfterPosition object (with python counting).""" 1887 return ">%s" % self.position
1888
1889 - def _shift(self, offset):
1890 """Return a copy of the position object with its location shifted (PRIVATE).""" 1891 return self.__class__(int(self) + offset)
1892
1893 - def _flip(self, length):
1894 """Return a copy of the location after the parent is reversed (PRIVATE).""" 1895 return BeforePosition(length - int(self))
1896
1897 1898 -class OneOfPosition(int, AbstractPosition):
1899      """Specify a position where the location can be multiple positions.
1900  
1901      This models the GenBank 'one-of(1888,1901)' function, and tries
1902      to make this fit within the Biopython Position models. If this was
1903      a start position it should act like 1888, but as an end position 1901.
1904  
1905      >>> p = OneOfPosition(1888, [ExactPosition(1888), ExactPosition(1901)])
1906      >>> p
1907      OneOfPosition(1888, choices=[ExactPosition(1888), ExactPosition(1901)])
1908      >>> int(p)
1909      1888
1910  
1911      Integer comparisons and operators act like using int(p),
1912  
1913      >>> p == 1888
1914      True
1915      >>> p <= 1888
1916      True
1917      >>> p > 1888
1918      False
1919      >>> p + 100
1920      1988
1921  
1922      >>> isinstance(p, OneOfPosition)
1923      True
1924      >>> isinstance(p, AbstractPosition)
1925      True
1926      >>> isinstance(p, int)
1927      True
1928  
1929      The old legacy properties of position and extension give the
1930      starting/lowest/left-most position as an integer, and the
1931      distance to the ending/highest/right-most position as an integer.
1932      Note that the position object will act like one of the list of
1933      possible locations depending on how it was created:
1934  
1935      >>> p2 = OneOfPosition(1901, [ExactPosition(1888), ExactPosition(1901)])
1936      >>> p.position == p2.position == 1888
1937      True
1938      >>> p.extension == p2.extension == 13
1939      True
1940      >>> int(p) == int(p2)
1941      False
1942      >>> p == 1888
1943      True
1944      >>> p2 == 1901
1945      True
1946  
1947      """
1948  
    def __new__(cls, position, choices):
        """Initialize with a set of possible positions.

        choices is a list of AbstractPosition derived objects,
        specifying possible locations.

        position is an integer specifying the default behaviour.
        """
        assert position in choices, \
            "OneOfPosition: %r should match one of %r" % (position, choices)
        obj = int.__new__(cls, position)
        obj.position_choices = choices
        return obj

    @property
    def position(self):
        """Legacy attribute to get (left) position as integer (OBSOLETE)."""
        return min(int(pos) for pos in self.position_choices)

    @property
    def extension(self):
        """Legacy attribute to get extension as integer (OBSOLETE)."""
        positions = [int(pos) for pos in self.position_choices]
        return max(positions) - min(positions)

    def __repr__(self):
        """Represent the OneOfPosition object as a string for debugging."""
        return "%s(%i, choices=%r)" % (self.__class__.__name__,
                                       int(self), self.position_choices)

    def __str__(self):
        """Return a representation of the OneOfPosition object (with python counting)."""
        out = "one-of("
        for position in self.position_choices:
            out += "%s," % position
        # replace the last comma with the closing parenthesis
        out = out[:-1] + ")"
        return out

    def _shift(self, offset):
        """Return a copy of the position object with its location shifted (PRIVATE)."""
        return self.__class__(int(self) + offset,
                              [p._shift(offset) for p in self.position_choices])

    def _flip(self, length):
        """Return a copy of the location after the parent is reversed (PRIVATE)."""
        return self.__class__(length - int(self),
                              [p._flip(length) for p in self.position_choices[::-1]])


class PositionGap(object):
    """Simple class to hold information about a gap between positions."""

    def __init__(self, gap_size):
        """Initialize with a position object containing the gap information."""
        self.gap_size = gap_size

    def __repr__(self):
        """Represent the position gap as a string for debugging."""
        return "%s(%s)" % (self.__class__.__name__, repr(self.gap_size))

    def __str__(self):
        """Return a representation of the PositionGap object (with python counting)."""
        out = "gap(%s)" % self.gap_size
        return out


if __name__ == "__main__":
    from Bio._utils import run_doctest
    run_doctest()
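
The OneOfPosition class in this listing subclasses int and sets its value in `__new__`, and its `_shift` and `_flip` methods rebuild the object with transformed choices. The following is a minimal standalone sketch of that pattern, not Biopython code: the `ChoicePosition` class and its `shift`/`flip` methods are hypothetical names invented for illustration, plain integers stand in for the position objects, and a `ValueError` replaces the `assert` used in the listing.

```python
class ChoicePosition(int):
    """Hypothetical stand-in for OneOfPosition (illustration only)."""

    def __new__(cls, position, choices):
        # int is immutable, so the integer value has to be fixed in
        # __new__ rather than __init__.
        if position not in choices:
            raise ValueError("%r should match one of %r" % (position, choices))
        obj = int.__new__(cls, position)
        obj.choices = list(choices)
        return obj

    def shift(self, offset):
        # Move the default value and every choice by the same offset.
        return self.__class__(int(self) + offset,
                              [c + offset for c in self.choices])

    def flip(self, length):
        # Mirror every coordinate around a parent of the given length;
        # reversing the list keeps the choices in ascending order.
        return self.__class__(length - int(self),
                              [length - c for c in reversed(self.choices)])


p = ChoicePosition(1888, [1888, 1901])
shifted = p.shift(10)
flipped = p.flip(2000)
print(int(shifted), shifted.choices)
print(int(flipped), flipped.choices)
```

Because the value is a real int, comparisons and arithmetic such as `p == 1888` or `p + 100` work without any extra methods, which is the behaviour the OneOfPosition doctests rely on.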