[frames] | no frames]

# Class UnknownSeq

source code

```object --+
|
Seq --+
|
UnknownSeq
```

Read-only sequence object of known length but unknown contents.

If you have an unknown sequence, you can represent this with a normal Seq object, for example:

```>>> my_seq = Seq("N"*5)
>>> my_seq
Seq('NNNNN')
>>> len(my_seq)
5
>>> print(my_seq)
NNNNN```

However, this is rather wasteful of memory (especially for large sequences), which is where this class is most useful:

```>>> unk_five = UnknownSeq(5)
>>> unk_five
UnknownSeq(5, character='?')
>>> len(unk_five)
5
>>> print(unk_five)
?????```

You can add unknown sequence together, provided their alphabets and characters are compatible, and get another memory saving UnknownSeq:

```>>> unk_four = UnknownSeq(4)
>>> unk_four
UnknownSeq(4, character='?')
>>> unk_four + unk_five
UnknownSeq(9, character='?')```

If the alphabet or characters don't match up, the addition gives an ordinary Seq object:

```>>> unk_nnnn = UnknownSeq(4, character="N")
>>> unk_nnnn
UnknownSeq(4, character='N')
>>> unk_nnnn + unk_four
Seq('NNNN????')```

Combining with a real Seq gives a new Seq object:

```>>> known_seq = Seq("ACGT")
>>> unk_four + known_seq
Seq('????ACGT')
>>> known_seq + unk_four
Seq('ACGT????')```
 Instance Methods

 __init__(self, length, alphabet=Alphabet(), character=None) Create a new UnknownSeq object. source code

 __len__(self) Return the stated length of the unknown sequence. source code

 __str__(self) Return the unknown sequence as full string of the given length. source code

 __repr__(self) Return (truncated) representation of the sequence for debugging. source code

 __add__(self, other) Add another sequence or string to this sequence. source code

 __radd__(self, other) Add a sequence on the left. source code

 __mul__(self, other) Multiply UnknownSeq by integer. source code

 __rmul__(self, other) Multiply integer by UnknownSeq. source code

 __imul__(self, other) Multiply UnknownSeq in-place. source code

 __getitem__(self, index) Get a subsequence from the UnknownSeq object. source code

 count(self, sub, start=0, end=9223372036854775807) Return a non-overlapping count, like that of a python string. source code

 count_overlap(self, sub, start=0, end=9223372036854775807) Return an overlapping count. source code

 complement(self) Return the complement of an unknown nucleotide equals itself. source code

 reverse_complement(self) Return the reverse complement of an unknown sequence. source code

 transcribe(self) Return an unknown RNA sequence from an unknown DNA sequence. source code

 back_transcribe(self) Return an unknown DNA sequence from an unknown RNA sequence. source code

 upper(self) Return an upper case copy of the sequence. source code

 lower(self) Return a lower case copy of the sequence. source code

 translate(self, **kwargs) Translate an unknown nucleotide sequence into an unknown protein. source code

 ungap(self, gap=None) Return a copy of the sequence without the gap character(s). source code

Inherited from `Seq`: `__contains__`, `__eq__`, `__hash__`, `__le__`, `__lt__`, `__ne__`, `endswith`, `find`, `lstrip`, `rfind`, `rsplit`, `rstrip`, `split`, `startswith`, `strip`, `tomutable`, `tostring`

Inherited from `Seq` (private): `_get_seq_str_and_check_alphabet`

Inherited from `object`: `__delattr__`, `__format__`, `__getattribute__`, `__new__`, `__reduce__`, `__reduce_ex__`, `__setattr__`, `__sizeof__`, `__subclasshook__`

 Properties

Inherited from `object`: `__class__`

 Method Details

### __init__(self, length, alphabet=Alphabet(), character=None)(Constructor)

source code

Create a new UnknownSeq object.

If character is omitted, it is determined from the alphabet, "N" for nucleotides, "X" for proteins, and "?" otherwise.

Overrides: object.__init__

### __len__(self)(Length operator)

source code
Return the stated length of the unknown sequence.
Overrides: Seq.__len__

### __str__(self)(Informal representation operator)

source code
Return the unknown sequence as full string of the given length.
Overrides: object.__str__

### __repr__(self)(Representation operator)

source code
Return (truncated) representation of the sequence for debugging.
Overrides: object.__repr__

source code

Add another sequence or string to this sequence.

Adding two UnknownSeq objects returns another UnknownSeq object provided the character is the same and the alphabets are compatible.

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein)
UnknownSeq(15, alphabet=ProteinAlphabet(), character='X')```

If the characters differ, an UnknownSeq object cannot be used, so a Seq object is returned:

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(10, generic_protein) + UnknownSeq(5, generic_protein,
...                                              character="x")
Seq('XXXXXXXXXXxxxxx', ProteinAlphabet())```

If adding a string to an UnknownSeq, a new Seq is returned with the same alphabet:

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_protein
>>> UnknownSeq(5, generic_protein) + "LV"
Seq('XXXXXLV', ProteinAlphabet())```

source code
Add a sequence on the left.

### __mul__(self, other)

source code

Multiply UnknownSeq by integer.

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_dna
>>> UnknownSeq(3) * 2
UnknownSeq(6, character='?')
>>> UnknownSeq(3, generic_dna) * 2
UnknownSeq(6, alphabet=DNAAlphabet(), character='N')```
Overrides: Seq.__mul__

### __rmul__(self, other)

source code

Multiply integer by UnknownSeq.

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_dna
>>> 2 * UnknownSeq(3)
UnknownSeq(6, character='?')
>>> 2 * UnknownSeq(3, generic_dna)
UnknownSeq(6, alphabet=DNAAlphabet(), character='N')```
Overrides: Seq.__rmul__

### __imul__(self, other)

source code

Multiply UnknownSeq in-place.

Note although UnknownSeq is immutable, the in-place method is included to match the behaviour for regular Python strings.

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_dna
>>> seq = UnknownSeq(3, generic_dna)
>>> seq *= 2
>>> seq
UnknownSeq(6, alphabet=DNAAlphabet(), character='N')```
Overrides: Seq.__imul__

### __getitem__(self, index)(Indexing operator)

source code

Get a subsequence from the UnknownSeq object.

```>>> unk = UnknownSeq(8, character="N")
>>> print(unk[:])
NNNNNNNN
>>> print(unk[5:3])
<BLANKLINE>
>>> print(unk[1:-1])
NNNNNN
>>> print(unk[1:-1:2])
NNN```
Overrides: Seq.__getitem__

### count(self, sub, start=0, end=9223372036854775807)

source code

Return a non-overlapping count, like that of a python string.

This behaves like the python string (and Seq object) method of the same name, which does a non-overlapping count!

For an overlapping search use the newer count_overlap() method.

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
• sub - a string or another Seq object to look for
• start - optional integer, slice start
• end - optional integer, slice end
```>>> "NNNN".count("N")
4
>>> Seq("NNNN").count("N")
4
>>> UnknownSeq(4, character="N").count("N")
4
>>> UnknownSeq(4, character="N").count("A")
0
>>> UnknownSeq(4, character="N").count("AA")
0```

HOWEVER, please note because that python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

```>>> UnknownSeq(4, character="N").count("NN")
2
>>> UnknownSeq(4, character="N").count("NNN")
1```
Overrides: Seq.count

### count_overlap(self, sub, start=0, end=9223372036854775807)

source code

Return an overlapping count.

For a non-overlapping search use the count() method.

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:
• sub - a string or another Seq object to look for
• start - optional integer, slice start
• end - optional integer, slice end

e.g.

```>>> from Bio.Seq import UnknownSeq
>>> UnknownSeq(4, character="N").count_overlap("NN")
3
>>> UnknownSeq(4, character="N").count_overlap("NNN")
2```

Where substrings do not overlap, should behave the same as the count() method:

```>>> UnknownSeq(4, character="N").count_overlap("N")
4
>>> UnknownSeq(4, character="N").count_overlap("N") == UnknownSeq(4, character="N").count("N")
True
>>> UnknownSeq(4, character="N").count_overlap("A")
0
>>> UnknownSeq(4, character="N").count_overlap("A") == UnknownSeq(4, character="N").count("A")
True
>>> UnknownSeq(4, character="N").count_overlap("AA")
0
>>> UnknownSeq(4, character="N").count_overlap("AA") == UnknownSeq(4, character="N").count("AA")
True```
Overrides: Seq.count_overlap

### complement(self)

source code

Return the complement of an unknown nucleotide equals itself.

```>>> my_nuc = UnknownSeq(8)
>>> my_nuc
UnknownSeq(8, character='?')
>>> print(my_nuc)
????????
>>> my_nuc.complement()
UnknownSeq(8, character='?')
>>> print(my_nuc.complement())
????????```
Overrides: Seq.complement

### reverse_complement(self)

source code

Return the reverse complement of an unknown sequence.

The reverse complement of an unknown nucleotide equals itself:

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import generic_dna
>>> example = UnknownSeq(6, generic_dna)
>>> print(example)
NNNNNN
>>> print(example.reverse_complement())
NNNNNN```
Overrides: Seq.reverse_complement

### transcribe(self)

source code

Return an unknown RNA sequence from an unknown DNA sequence.

```>>> my_dna = UnknownSeq(10, character="N")
>>> my_dna
UnknownSeq(10, character='N')
>>> print(my_dna)
NNNNNNNNNN
>>> my_rna = my_dna.transcribe()
>>> my_rna
UnknownSeq(10, alphabet=RNAAlphabet(), character='N')
>>> print(my_rna)
NNNNNNNNNN```
Overrides: Seq.transcribe

### back_transcribe(self)

source code

Return an unknown DNA sequence from an unknown RNA sequence.

```>>> my_rna = UnknownSeq(20, character="N")
>>> my_rna
UnknownSeq(20, character='N')
>>> print(my_rna)
NNNNNNNNNNNNNNNNNNNN
>>> my_dna = my_rna.back_transcribe()
>>> my_dna
UnknownSeq(20, alphabet=DNAAlphabet(), character='N')
>>> print(my_dna)
NNNNNNNNNNNNNNNNNNNN```
Overrides: Seq.back_transcribe

### upper(self)

source code

Return an upper case copy of the sequence.

```>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, generic_dna, character="n")
>>> my_seq
UnknownSeq(20, alphabet=DNAAlphabet(), character='n')
>>> print(my_seq)
nnnnnnnnnnnnnnnnnnnn
>>> my_seq.upper()
UnknownSeq(20, alphabet=DNAAlphabet(), character='N')
>>> print(my_seq.upper())
NNNNNNNNNNNNNNNNNNNN```

This will adjust the alphabet if required. See also the lower method.

Overrides: Seq.upper

### lower(self)

source code

Return a lower case copy of the sequence.

This will adjust the alphabet if required:

```>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import UnknownSeq
>>> my_seq = UnknownSeq(20, IUPAC.extended_protein)
>>> my_seq
UnknownSeq(20, alphabet=ExtendedIUPACProtein(), character='X')
>>> print(my_seq)
XXXXXXXXXXXXXXXXXXXX
>>> my_seq.lower()
UnknownSeq(20, alphabet=ProteinAlphabet(), character='x')
>>> print(my_seq.lower())
xxxxxxxxxxxxxxxxxxxx```

Overrides: Seq.lower

### translate(self, **kwargs)

source code

Translate an unknown nucleotide sequence into an unknown protein.

e.g.

```>>> my_seq = UnknownSeq(9, character="N")
>>> print(my_seq)
NNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
UnknownSeq(3, alphabet=ProteinAlphabet(), character='X')
>>> print(my_protein)
XXX```

In comparison, using a normal Seq object:

```>>> my_seq = Seq("NNNNNNNNN")
>>> print(my_seq)
NNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
Seq('XXX', ExtendedIUPACProtein())
>>> print(my_protein)
XXX```
Overrides: Seq.translate

### ungap(self, gap=None)

source code

Return a copy of the sequence without the gap character(s).

The gap character can be specified in two ways - either as an explicit argument, or via the sequence's alphabet. For example:

```>>> from Bio.Seq import UnknownSeq
>>> from Bio.Alphabet import Gapped, generic_dna
>>> my_dna = UnknownSeq(20, Gapped(generic_dna, "-"))
>>> my_dna
UnknownSeq(20, alphabet=Gapped(DNAAlphabet(), '-'), character='N')
>>> my_dna.ungap()
UnknownSeq(20, alphabet=DNAAlphabet(), character='N')
>>> my_dna.ungap("-")
UnknownSeq(20, alphabet=DNAAlphabet(), character='N')```

If the UnknownSeq is using the gap character, then an empty Seq is returned:

```>>> my_gap = UnknownSeq(20, Gapped(generic_dna, "-"), character="-")
>>> my_gap
UnknownSeq(20, alphabet=Gapped(DNAAlphabet(), '-'), character='-')
>>> my_gap.ungap()
Seq('', DNAAlphabet())
>>> my_gap.ungap("-")
Seq('', DNAAlphabet())```

Notice that the returned sequence's alphabet is adjusted to remove any explicit gap character declaration.

Overrides: Seq.ungap

 Generated by Epydoc 3.0.1 on Fri Jun 22 16:36:04 2018 http://epydoc.sourceforge.net