In Biopython, sequences are usually held as `Seq` objects, which add various biological methods on top of string-like behaviour.
This page describes the Biopython Seq object, defined in the Bio.Seq
module (together with related objects like the MutableSeq, plus some
general-purpose sequence functions). In addition to this wiki page,
there is a whole chapter in the Tutorial (PDF) on the Seq object, plus
its API documentation (which you can read online, or from within Python
with the help command).
If you need to store additional information like a sequence identifier
or name, or even more details like a description or annotation, then
use a SeqRecord object instead. These are the
sequence records used by the SeqIO module for
reading and writing sequence files.
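As a minimal sketch of that, a SeqRecord can be built by hand around a Seq. The identifier, name and description values below are invented for illustration; only the `SeqRecord` class and its `id`, `name` and `description` arguments come from Biopython.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# The id, name and description values are made up for this example.
record = SeqRecord(
    Seq("AGTACACTGGT"),
    id="example_1",
    name="example",
    description="A hypothetical record wrapping a short sequence",
)

print(record.id)   # the identifier given above
print(record.seq)  # the underlying Seq object
```

The sequence itself stays available as `record.seq`, so all the Seq methods described on this page still apply to it.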
The Seq object essentially combines a Python string with biological
methods. For example:
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT')
Biopython doesn’t know if this is a nucleotide sequence or a protein rich in alanines, glycines, cysteines and threonines. If you know, keep this in mind when you call methods like complement or reverse_complement - see below.
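To see why the letters alone cannot settle the question, here is a small plain-Python sketch. The helper name `could_be` and the two letter sets are assumptions made for this illustration and are not part of Biopython.

```python
# Letter sets assumed for this illustration (unambiguous letters only).
DNA_LETTERS = set("ACGT")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def could_be(sequence, letters):
    """Return True if every character of sequence is in the given letter set."""
    return set(sequence.upper()) <= letters


text = "AGTACACTGGT"
print(could_be(text, DNA_LETTERS))      # True
print(could_be(text, PROTEIN_LETTERS))  # True
```

Both checks pass, because A, C, G and T are also valid amino acid letters, so the interpretation has to come from you.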
The Seq object has a number of methods which act just like those of a
Python string, for example the find method:
>>> from Bio.Seq import Seq
>>> my_dna = Seq("AGTACACTGGT")
>>> my_dna
Seq('AGTACACTGGT')
>>> my_dna.find("ACT")
5
>>> my_dna.find("TAG")
-1
There is a count method too:
>>> my_dna.count("A")
3
>>> my_dna.count("ACT")
1
However, watch out because just like the Python string’s count, this is a non-overlapping count!
>>> "AAAA".count("AA")
2
>>> Seq("AAAA").count("AA")
2
In some biological situations, you might prefer an overlapping count
using count_overlap, which would give three for this example.
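The difference between the two kinds of count can be sketched in plain Python. The function `count_overlapping` is a hypothetical helper written for this page to show the idea; it is not how Biopython implements count_overlap.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one character only, so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return count


print("AAAA".count("AA"))                # 2, non-overlapping
print(count_overlapping("AAAA", "AA"))   # 3, overlapping
```

A non-overlapping count resumes searching after the end of each match, whereas this sketch resumes one position after its start.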
If you have a nucleotide sequence, you may want to take its reverse complement or translate it. Note that some of the methods described here are only available in Biopython 1.49 onwards.
These are very simple - the methods return a new Seq object with the
appropriate sequence:
>>> from Bio.Seq import Seq
>>> my_dna = Seq("AGTACACTGGT")
>>> my_dna
Seq('AGTACACTGGT')
>>> my_dna.complement()
Seq('TCATGTGACCA')
>>> my_dna.reverse_complement()
Seq('ACCAGTGTACT')
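For intuition, the same two operations can be sketched on ordinary Python strings. This is only an illustration under simplifying assumptions: it handles unambiguous DNA letters only, and the function names are chosen for this example rather than taken from Biopython.

```python
# Base-pairing rules for unambiguous DNA; ambiguity codes and RNA are not handled.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Swap each base for its pairing partner, keeping the original order."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence, then reverse it."""
    return complement(dna)[::-1]


print(complement("AGTACACTGGT"))          # TCATGTGACCA
print(reverse_complement("AGTACACTGGT"))  # ACCAGTGTACT
```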
If you have a DNA sequence, you may want to turn it into RNA. In bioinformatics we normally assume the DNA is the coding strand (not the template strand) so this is a simple matter of replacing all the thymines with uracil:
>>> my_dna
Seq('AGTACACTGGT')
>>> my_dna.transcribe()
Seq('AGUACACUGGU')
Naturally, given some RNA, you might want the associated DNA - and again Biopython does a simple U/T substitution:
>>> my_rna = my_dna.transcribe()
>>> my_rna
Seq('AGUACACUGGU')
>>> my_rna.back_transcribe()
Seq('AGTACACTGGT')
If you actually do want the template strand, you’d have to do a reverse complement on top:
>>> my_rna
Seq('AGUACACUGGU')
>>> my_rna.back_transcribe().reverse_complement()
Seq('ACCAGTGTACT')
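The relationship between the coding strand, the template strand and the mRNA can be sketched with plain strings. The variable names are illustrative, and the sketch assumes unambiguous upper-case DNA.

```python
# Base-pairing rules for unambiguous upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

coding_dna = "AGTACACTGGT"

# The template strand is the reverse complement of the coding strand.
template_dna = coding_dna.translate(COMPLEMENT)[::-1]

# Shortcut used in bioinformatics: copy the coding strand, replacing T with U.
mrna_from_coding = coding_dna.replace("T", "U")

# Longer route: reverse complement the template strand, then replace T with U.
mrna_from_template = template_dna.translate(COMPLEMENT)[::-1].replace("T", "U")

print(template_dna)                            # ACCAGTGTACT
print(mrna_from_coding)                        # AGUACACUGGU
print(mrna_from_coding == mrna_from_template)  # True
```

Both routes give the same mRNA, which is why working from the coding strand with a simple T/U substitution is the usual convention.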
The chapter in the Tutorial (PDF) goes into more detail on this strand issue.
You can translate RNA:
>>> from Bio.Seq import Seq
>>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
>>> messenger_rna.translate()
Seq('MAIVMGR*KGAR*')
Or DNA - which is assumed to be the coding strand:
>>> from Bio.Seq import Seq
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*')
In either case there are several useful options. By default, as you will notice in the example above, translation continues through any stop codons, but this is optional:
>>> coding_dna.translate(to_stop=True)
Seq('MAIVMGR')
Then there is the translation table, for which you can give an NCBI genetic code number or name:
>>> coding_dna.translate(table=2)
Seq('MAIVMGRWKGAR*')
>>> coding_dna.translate(table="Vertebrate Mitochondrial")
Seq('MAIVMGRWKGAR*')
You can of course combine these options:
>>> coding_dna.translate(table=2, to_stop=True)
Seq('MAIVMGRWKGAR')
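What these options do can be sketched in plain Python. This is an illustration, not Biopython's implementation: the names `make_table` and `translate` are hypothetical, the two amino acid strings are written out here following the NCBI layout for tables 1 and 2 (codons ordered with bases T, C, A, G), ambiguous codons are not handled, and any trailing partial codon is simply ignored.

```python
from itertools import product

BASES = "TCAG"
# One amino acid letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def make_table(amino_acids):
    """Map each of the 64 codons to its amino acid letter ('*' means stop)."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


def translate(sequence, table, to_stop=False):
    """Translate DNA or RNA codon by codon, optionally ending at the first stop."""
    dna = sequence.upper().replace("U", "T")  # treat RNA like coding-strand DNA
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = table[dna[i:i + 3]]
        if to_stop and amino_acid == "*":
            break  # the stop symbol itself is left out
        protein.append(amino_acid)
    return "".join(protein)


standard = make_table(STANDARD_AA)
vert_mito = make_table(VERT_MITO_AA)
coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

print(translate(coding_dna, standard))                 # MAIVMGR*KGAR*
print(translate(coding_dna, standard, to_stop=True))   # MAIVMGR
print(translate(coding_dna, vert_mito))                # MAIVMGRWKGAR*
print(translate(coding_dna, vert_mito, to_stop=True))  # MAIVMGRWKGAR
```

The only difference between the two tables that matters for this sequence is the codon TGA, which is a stop in the standard code but tryptophan (W) in the vertebrate mitochondrial code.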
Consult the tutorial for more examples and arguments (e.g. specifying a different symbol for a stop codon), or see the built-in help:
>>> help(coding_dna.translate)
...