Package Bio :: Module Seq :: Class UnknownSeq
[hide private]
[frames] | no frames]

Class UnknownSeq

source code

object --+    
         |    
       Seq --+
             |
            UnknownSeq

Read-only sequence object of known length but unknown contents.

If you have an unknown sequence, you can represent this with a normal Seq object, for example:

>>> my_seq = Seq("N"*5)
>>> my_seq
Seq('NNNNN', Alphabet())
>>> len(my_seq)
5
>>> print(my_seq)
NNNNN

However, this is rather wasteful of memory (especially for large sequences), which is where this class is most useful:

>>> unk_five = UnknownSeq(5)
>>> unk_five
UnknownSeq(5, alphabet = Alphabet(), character = '?')
>>> len(unk_five)
5
>>> print(unk_five)
?????

You can add unknown sequence together, provided their alphabets and characters are compatible, and get another memory saving UnknownSeq:

>>> unk_four = UnknownSeq(4)
>>> unk_four
UnknownSeq(4, alphabet = Alphabet(), character = '?')
>>> unk_four + unk_five
UnknownSeq(9, alphabet = Alphabet(), character = '?')

If the alphabet or characters don't match up, the addition gives an ordinary Seq object:

>>> unk_nnnn = UnknownSeq(4, character = "N")
>>> unk_nnnn
UnknownSeq(4, alphabet = Alphabet(), character = 'N')
>>> unk_nnnn + unk_four
Seq('NNNN????', Alphabet())

Combining with a real Seq gives a new Seq object:

>>> known_seq = Seq("ACGT")
>>> unk_four + known_seq
Seq('????ACGT', Alphabet())
>>> known_seq + unk_four
Seq('ACGT????', Alphabet())
Instance Methods [hide private]
 
__init__(self, length, alphabet=Alphabet(), character=None)
Create a new UnknownSeq object.
source code
 
__len__(self)
Return the stated length of the unknown sequence.
source code
 
__str__(self)
Return the unknown sequence as full string of the given length.
source code
 
__repr__(self)
Return (truncated) representation of the sequence for debugging.
source code
 
__add__(self, other)
Add another sequence or string to this sequence.
source code
 
__radd__(self, other)
Add a sequence on the left.
source code
 
__getitem__(self, index)
Get a subsequence from the UnknownSeq object.
source code
 
count(self, sub, start=0, end=9223372036854775807)
Return a non-overlapping count, like that of a python string.
source code
 
count_overlap(self, sub, start=0, end=9223372036854775807)
Return an overlapping count.
source code
 
complement(self)
Return the complement of an unknown nucleotide equals itself.
source code
 
reverse_complement(self)
Return the reverse complement of an unknown sequence.
source code
 
transcribe(self)
Return an unknown RNA sequence from an unknown DNA sequence.
source code
 
back_transcribe(self)
Return an unknown DNA sequence from an unknown RNA sequence.
source code
 
upper(self)
Return an upper case copy of the sequence.
source code
 
lower(self)
Return a lower case copy of the sequence.
source code
 
translate(self, **kwargs)
Translate an unknown nucleotide sequence into an unknown protein.
source code
 
ungap(self, gap=None)
Return a copy of the sequence without the gap character(s).
source code

Inherited from Seq: __contains__, __eq__, __hash__, __le__, __lt__, __ne__, endswith, find, lstrip, rfind, rsplit, rstrip, split, startswith, strip, tomutable, tostring

Inherited from Seq (private): _get_seq_str_and_check_alphabet

Inherited from object: __delattr__, __format__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties [hide private]

Inherited from object: __class__

Method Details [hide private]

__init__(self, length, alphabet=Alphabet(), character=None)
(Constructor)

source code 

Create a new UnknownSeq object.

If character is omitted, it is determined from the alphabet, "N" for nucleotides, "X" for proteins, and "?" otherwise.

Overrides: object.__init__

__len__(self)
(Length operator)

source code 
Return the stated length of the unknown sequence.
Overrides: Seq.__len__

__str__(self)
(Informal representation operator)

source code 
Return the unknown sequence as full string of the given length.
Overrides: object.__str__

__repr__(self)
(Representation operator)

source code 
Return (truncated) representation of the sequence for debugging.
Overrides: object.__repr__

__add__(self, other)
(Addition operator)

source code 

Add another sequence or string to this sequence.

Adding two UnknownSeq objects returns another UnknownSeq object provided the character is the same and the alphabets are compatible.

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein)
UnknownSeq(15, alphabet = ProteinAlphabet(), character = 'X')

If the characters differ, an UnknownSeq object cannot be used, so a Seq object is returned:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein,
...                                              character="x")
Seq('XXXXXXXXXXxxxxx', ProteinAlphabet())

If adding a string to an UnknownSeq, a new Seq is returned with the same alphabet:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(5, generic_protein) + "LV"
Seq('XXXXXLV', ProteinAlphabet())
Overrides: Seq.__add__

__radd__(self, other)
(Right-side addition operator)

source code 
Add a sequence on the left.
Overrides: Seq.__radd__

__getitem__(self, index)
(Indexing operator)

source code 

Get a subsequence from the UnknownSeq object.

>>> unk = UnknownSeq(8, character="N")
>>> print(unk[:])
NNNNNNNN
>>> print(unk[5:3])
<BLANKLINE>
>>> print(unk[1:-1])
NNNNNN
>>> print(unk[1:-1:2])
NNN
Overrides: Seq.__getitem__

count(self, sub, start=0, end=9223372036854775807)

source code 

Return a non-overlapping count, like that of a python string.

This behaves like the python string (and Seq object) method of the same name, which does a non-overlapping count!

For an overlapping search use the newer count_overlap() method.

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end
>>> "NNNN".count("N")
4
>>> Seq("NNNN").count("N")
4
>>> UnknownSeq(4, character="N").count("N")
4
>>> UnknownSeq(4, character="N").count("A")
0
>>> UnknownSeq(4, character="N").count("AA")
0

HOWEVER, please note because that python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

>>> UnknownSeq(4, character="N").count("NN")
2
>>> UnknownSeq(4, character="N").count("NNN")
1
Overrides: Seq.count

count_overlap(self, sub, start=0, end=9223372036854775807)

source code 

Return an overlapping count.

For a non-overlapping search use the count() method.

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

e.g.

>>> from Bio.Seq import UnknownSeq
>>> UnknownSeq(4, character="N").count_overlap("NN")
3
>>> UnknownSeq(4, character="N").count_overlap("NNN")
2

Where substrings do not overlap, should behave the same as the count() method:

>>> UnknownSeq(4, character="N").count_overlap("N")
4
>>> UnknownSeq(4, character="N").count_overlap("N") == UnknownSeq(4, character="N").count("N")
True
>>> UnknownSeq(4, character="N").count_overlap("A")
0
>>> UnknownSeq(4, character="N").count_overlap("A") == UnknownSeq(4, character="N").count("A")
True
>>> UnknownSeq(4, character="N").count_overlap("AA")
0
>>> UnknownSeq(4, character="N").count_overlap("AA") == UnknownSeq(4, character="N").count("AA")
True
Overrides: Seq.count_overlap

complement(self)

source code 

Return the complement of an unknown nucleotide equals itself.

>>> my_nuc = UnknownSeq(8)
>>> my_nuc
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
????????
>>> my_nuc.complement()
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.complement())
????????
Overrides: Seq.complement

reverse_complement(self)

source code 

Return the reverse complement of an unknown sequence.

The reverse complement of an unknown nucleotide equals itself:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_dna
>>> example = UnknownSeq(6, generic_dna)
>>> print(example)
NNNNNN
>>> print(example.reverse_complement())
NNNNNN
Overrides: Seq.reverse_complement

transcribe(self)

source code 

Return an unknown RNA sequence from an unknown DNA sequence.

>>> my_dna = UnknownSeq(10, character="N")
>>> my_dna
UnknownSeq(10, alphabet = Alphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNN
>>> my_rna = my_dna.transcribe()
>>> my_rna
UnknownSeq(10, alphabet = RNAAlphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNN
Overrides: Seq.transcribe

back_transcribe(self)

source code 

Return an unknown DNA sequence from an unknown RNA sequence.

>>> my_rna = UnknownSeq(20, character="N")
>>> my_rna
UnknownSeq(20, alphabet = Alphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNNNNNNNNNNNN
>>> my_dna = my_rna.back_transcribe()
>>> my_dna
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNNNNNNNNNNNN
Overrides: Seq.back_transcribe

upper(self)

source code 

Return an upper case copy of the sequence.

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, generic_dna, character="n")
>>> my_seq
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'n')
>>> print(my_seq)
nnnnnnnnnnnnnnnnnnnn
>>> my_seq.upper()
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_seq.upper())
NNNNNNNNNNNNNNNNNNNN

This will adjust the alphabet if required. See also the lower method.

Overrides: Seq.upper

lower(self)

source code 

Return a lower case copy of the sequence.

This will adjust the alphabet if required:

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, IUPAC.extended_protein)
>>> my_seq
UnknownSeq(20, alphabet = ExtendedIUPACProtein(), character = 'X')
>>> print(my_seq)
XXXXXXXXXXXXXXXXXXXX
>>> my_seq.lower()
UnknownSeq(20, alphabet = ProteinAlphabet(), character = 'x')
>>> print(my_seq.lower())
xxxxxxxxxxxxxxxxxxxx

See also the upper method.

Overrides: Seq.lower

translate(self, **kwargs)

source code 

Translate an unknown nucleotide sequence into an unknown protein.

e.g.

>>> my_seq = UnknownSeq(9, character="N")
>>> print(my_seq)
NNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
UnknownSeq(3, alphabet = ProteinAlphabet(), character = 'X')
>>> print(my_protein)
XXX

In comparison, using a normal Seq object:

>>> my_seq = Seq("NNNNNNNNN")
>>> print(my_seq)
NNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
Seq('XXX', ExtendedIUPACProtein())
>>> print(my_protein)
XXX
Overrides: Seq.translate

ungap(self, gap=None)

source code 

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways - either as an explicit argument, or via the sequence's alphabet. For example:

>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import Gapped, generic_dna
>>> my_dna = UnknownSeq(20, Gapped(generic_dna, "-"))
>>> my_dna
UnknownSeq(20, alphabet = Gapped(DNAAlphabet(), '-'), character = 'N')
>>> my_dna.ungap()
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> my_dna.ungap("-")
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')

If the UnknownSeq is using the gap character, then an empty Seq is returned:

>>> my_gap = UnknownSeq(20, Gapped(generic_dna, "-"), character="-")
>>> my_gap
UnknownSeq(20, alphabet = Gapped(DNAAlphabet(), '-'), character = '-')
>>> my_gap.ungap()
Seq('', DNAAlphabet())
>>> my_gap.ungap("-")
Seq('', DNAAlphabet())

Notice that the returned sequence's alphabet is adjusted to remove any explicit gap character declaration.

Overrides: Seq.ungap