Package Bio :: Module Seq :: Class UnknownSeq
[hide private]
[frames] | no frames]

Class UnknownSeq

source code

object --+    
         |    
       Seq --+
             |
            UnknownSeq

A read-only sequence object of known length but unknown contents.

If you have an unknown sequence, you can represent this with a normal Seq object, for example:

>>> my_seq = Seq("N"*5)
>>> my_seq
Seq('NNNNN', Alphabet())
>>> len(my_seq)
5
>>> print(my_seq)
NNNNN

However, this is rather wasteful of memory (especially for large sequences), which is where this class is most usefull:

>>> unk_five = UnknownSeq(5)
>>> unk_five
UnknownSeq(5, alphabet = Alphabet(), character = '?')
>>> len(unk_five)
5
>>> print(unk_five)
?????

You can add unknown sequence together, provided their alphabets and characters are compatible, and get another memory saving UnknownSeq:

>>> unk_four = UnknownSeq(4)
>>> unk_four
UnknownSeq(4, alphabet = Alphabet(), character = '?')
>>> unk_four + unk_five
UnknownSeq(9, alphabet = Alphabet(), character = '?')

If the alphabet or characters don't match up, the addition gives an ordinary Seq object:

>>> unk_nnnn = UnknownSeq(4, character = "N")
>>> unk_nnnn
UnknownSeq(4, alphabet = Alphabet(), character = 'N')
>>> unk_nnnn + unk_four
Seq('NNNN????', Alphabet())

Combining with a real Seq gives a new Seq object:

>>> known_seq = Seq("ACGT")
>>> unk_four + known_seq
Seq('????ACGT', Alphabet())
>>> known_seq + unk_four
Seq('ACGT????', Alphabet())
Instance Methods [hide private]
 
__init__(self, length, alphabet=Alphabet(), character=None)
Create a new UnknownSeq object.
source code
 
__len__(self)
Returns the stated length of the unknown sequence.
source code
 
__str__(self)
Returns the unknown sequence as full string of the given length.
source code
 
__repr__(self)
Returns a (truncated) representation of the sequence for debugging.
source code
 
__add__(self, other)
Add another sequence or string to this sequence.
source code
 
__radd__(self, other)
Adding a sequence on the left.
source code
 
__getitem__(self, index)
Get a subsequence from the UnknownSeq object.
source code
 
count(self, sub, start=0, end=9223372036854775807)
Non-overlapping count method, like that of a python string.
source code
 
complement(self)
The complement of an unknown nucleotide equals itself.
source code
 
reverse_complement(self)
The reverse complement of an unknown nucleotide equals itself.
source code
 
transcribe(self)
Returns unknown RNA sequence from an unknown DNA sequence.
source code
 
back_transcribe(self)
Returns unknown DNA sequence from an unknown RNA sequence.
source code
 
upper(self)
Returns an upper case copy of the sequence.
source code
 
lower(self)
Returns a lower case copy of the sequence.
source code
 
translate(self, **kwargs)
Translate an unknown nucleotide sequence into an unknown protein.
source code
 
ungap(self, gap=None)
Return a copy of the sequence without the gap character(s).
source code

Inherited from Seq: __cmp__, __contains__, __hash__, endswith, find, lstrip, rfind, rsplit, rstrip, split, startswith, strip, tomutable, tostring

Inherited from Seq (private): _get_seq_str_and_check_alphabet

Inherited from object: __delattr__, __format__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties [hide private]

Inherited from object: __class__

Method Details [hide private]

__init__(self, length, alphabet=Alphabet(), character=None)
(Constructor)

source code 

Create a new UnknownSeq object.

If character is omitted, it is determined from the alphabet, "N" for nucleotides, "X" for proteins, and "?" otherwise.

Overrides: object.__init__

__len__(self)
(Length operator)

source code 

Returns the stated length of the unknown sequence.

Overrides: Seq.__len__

__str__(self)
(Informal representation operator)

source code 

Returns the unknown sequence as full string of the given length.

Overrides: object.__str__

__repr__(self)
(Representation operator)

source code 

Returns a (truncated) representation of the sequence for debugging.

Overrides: object.__repr__
(inherited documentation)

__add__(self, other)
(Addition operator)

source code 

Add another sequence or string to this sequence.

Adding two UnknownSeq objects returns another UnknownSeq object provided the character is the same and the alphabets are compatible.

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein)
UnknownSeq(15, alphabet = ProteinAlphabet(), character = 'X')

If the characters differ, an UnknownSeq object cannot be used, so a Seq object is returned:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein,
...                                              character="x")
Seq('XXXXXXXXXXxxxxx', ProteinAlphabet())

If adding a string to an UnknownSeq, a new Seq is returned with the same alphabet:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(5, generic_protein) + "LV"
Seq('XXXXXLV', ProteinAlphabet())
Overrides: Seq.__add__

__radd__(self, other)
(Right-side addition operator)

source code 

Adding a sequence on the left.

If adding a string to a Seq, the alphabet is preserved:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> "LV" + Seq("MELKI", generic_protein)
Seq('LVMELKI', ProteinAlphabet())

Adding two Seq (like) objects is handled via the __add__ method.

Overrides: Seq.__radd__
(inherited documentation)

__getitem__(self, index)
(Indexing operator)

source code 

Get a subsequence from the UnknownSeq object.

>>> unk = UnknownSeq(8, character="N")
>>> print(unk[:])
NNNNNNNN
>>> print(unk[5:3])
<BLANKLINE>
>>> print(unk[1:-1])
NNNNNN
>>> print(unk[1:-1:2])
NNN
Overrides: Seq.__getitem__

count(self, sub, start=0, end=9223372036854775807)

source code 

Non-overlapping count method, like that of a python string.

This behaves like the python string (and Seq object) method of the same name, which does a non-overlapping count!

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:

  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end
>>> "NNNN".count("N")
4
>>> Seq("NNNN").count("N")
4
>>> UnknownSeq(4, character="N").count("N")
4
>>> UnknownSeq(4, character="N").count("A")
0
>>> UnknownSeq(4, character="N").count("AA")
0

HOWEVER, please note because that python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

>>> UnknownSeq(4, character="N").count("NN")
2
>>> UnknownSeq(4, character="N").count("NNN")
1
Overrides: Seq.count

complement(self)

source code 

The complement of an unknown nucleotide equals itself.

>>> my_nuc = UnknownSeq(8)
>>> my_nuc
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
????????
>>> my_nuc.complement()
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.complement())
????????
Overrides: Seq.complement

reverse_complement(self)

source code 

The reverse complement of an unknown nucleotide equals itself.

>>> my_nuc = UnknownSeq(10)
>>> my_nuc
UnknownSeq(10, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
??????????
>>> my_nuc.reverse_complement()
UnknownSeq(10, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.reverse_complement())
??????????
Overrides: Seq.reverse_complement

transcribe(self)

source code 

Returns unknown RNA sequence from an unknown DNA sequence.

>>> my_dna = UnknownSeq(10, character="N")
>>> my_dna
UnknownSeq(10, alphabet = Alphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNN
>>> my_rna = my_dna.transcribe()
>>> my_rna
UnknownSeq(10, alphabet = RNAAlphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNN
Overrides: Seq.transcribe

back_transcribe(self)

source code 

Returns unknown DNA sequence from an unknown RNA sequence.

>>> my_rna = UnknownSeq(20, character="N")
>>> my_rna
UnknownSeq(20, alphabet = Alphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNNNNNNNNNNNN
>>> my_dna = my_rna.back_transcribe()
>>> my_dna
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNNNNNNNNNNNN
Overrides: Seq.back_transcribe

upper(self)

source code 

Returns an upper case copy of the sequence.

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, generic_dna, character="n")
>>> my_seq
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'n')
>>> print(my_seq)
nnnnnnnnnnnnnnnnnnnn
>>> my_seq.upper()
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_seq.upper())
NNNNNNNNNNNNNNNNNNNN

This will adjust the alphabet if required. See also the lower method.

Overrides: Seq.upper

lower(self)

source code 

Returns a lower case copy of the sequence.

This will adjust the alphabet if required:

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, IUPAC.extended_protein)
>>> my_seq
UnknownSeq(20, alphabet = ExtendedIUPACProtein(), character = 'X')
>>> print(my_seq)
XXXXXXXXXXXXXXXXXXXX
>>> my_seq.lower()
UnknownSeq(20, alphabet = ProteinAlphabet(), character = 'x')
>>> print(my_seq.lower())
xxxxxxxxxxxxxxxxxxxx

See also the upper method.

Overrides: Seq.lower

translate(self, **kwargs)

source code 

Translate an unknown nucleotide sequence into an unknown protein.

e.g.

>>> my_seq = UnknownSeq(9, character="N")
>>> print(my_seq)
NNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
UnknownSeq(3, alphabet = ProteinAlphabet(), character = 'X')
>>> print(my_protein)
XXX

In comparison, using a normal Seq object:

>>> my_seq = Seq("NNNNNNNNN")
>>> print(my_seq)
NNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
Seq('XXX', ExtendedIUPACProtein())
>>> print(my_protein)
XXX
Overrides: Seq.translate

ungap(self, gap=None)

source code 

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways - either as an explicit argument, or via the sequence's alphabet. For example:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import Gapped, generic_dna
>>> my_dna = UnknownSeq(20, Gapped(generic_dna, "-"))
>>> my_dna
UnknownSeq(20, alphabet = Gapped(DNAAlphabet(), '-'), character = 'N')
>>> my_dna.ungap()
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> my_dna.ungap("-")
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')

If the UnknownSeq is using the gap character, then an empty Seq is returned:

>>> my_gap = UnknownSeq(20, Gapped(generic_dna, "-"), character="-")
>>> my_gap
UnknownSeq(20, alphabet = Gapped(DNAAlphabet(), '-'), character = '-')
>>> my_gap.ungap()
Seq('', DNAAlphabet())
>>> my_gap.ungap("-")
Seq('', DNAAlphabet())

Notice that the returned sequence's alphabet is adjusted to remove any explicit gap character declaration.

Overrides: Seq.ungap