Package CodonAlign

Code for dealing with Codon Alignment.

Functions

build(pro_align, nucl_seqs, corr_dict=None, gap_char='-', unknown='X', codon_table=default_codon_table, alphabet=None, complete_protein=False, anchor_len=10, max_score=10)
Build a codon alignment from a protein alignment and the corresponding nucleotide sequences.

_codons2re(codons)
Generate a regular expression from a given list of codons.

_get_aa_regex(codon_table, stop='*', unknown='X')
Set up the regular expression of a given CodonTable for further use.

_check_corr(pro, nucl, gap_char='-', codon_table=default_codon_table, complete_protein=False, anchor_len=10)
Check whether a given protein SeqRecord can be translated from a given nucleotide SeqRecord.

_get_shift_anchor_re(sh_anc, sh_nuc, shift_val, aa2re, anchor_len, shift_id_pos)
Try to build a regular expression that matches a potentially shifted anchor.

_merge_aa2re(aa1, aa2, shift_val, aa2re, reid)
Merge two amino acids based on the detected frameshift value.

_get_codon_rec(pro, nucl, span_mode, alphabet, gap_char='-', codon_table=default_codon_table, complete_protein=False, max_score=10)
Generate a codon alignment based on a regular expression match (PRIVATE).

_align_shift_recs(recs)
Build an alignment according to the frameshifts detected by _check_corr.

Variables
  __package__ = 'Bio.CodonAlign'
  __warningregistry__ = {('Bio.CodonAlign is an experimental mod...
Function Details

build(pro_align, nucl_seqs, corr_dict=None, gap_char='-', unknown='X', codon_table=default_codon_table, alphabet=None, complete_protein=False, anchor_len=10, max_score=10)

Build a codon alignment from a protein alignment and corresponding nucleotide sequences

Arguments:

  • pro_align - a protein MultipleSeqAlignment object
  • nucl_seqs - an object returned by SeqIO.parse or SeqIO.index, or a collection of SeqRecord objects
  • alphabet - alphabet for the returned codon alignment
  • corr_dict - a dict that maps protein id to nucleotide id
  • complete_protein - whether the sequence begins with a start codon
  • frameshift - whether to apply frameshift detection

Returns a CodonAlignment object.

>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> from Bio.Align import MultipleSeqAlignment
>>> seq1 = SeqRecord(Seq('TCAGGGACTGCGAGAACCAAGCTACTGCTGCTGCTGGCTGCGCTCTGCGCCGCAGGTGGGGCGCTGGAG',
...     alphabet=IUPAC.IUPACUnambiguousDNA()), id='pro1')
>>> seq2 = SeqRecord(Seq('TCAGGGACTTCGAGAACCAAGCGCTCCTGCTGCTGGCTGCGCTCGGCGCCGCAGGTGGAGCACTGGAG',
...     alphabet=IUPAC.IUPACUnambiguousDNA()), id='pro2')
>>> pro1 = SeqRecord(Seq('SGTARTKLLLLLAALCAAGGALE', alphabet=IUPAC.protein),id='pro1')
>>> pro2 = SeqRecord(Seq('SGTSRTKRLLLLAALGAAGGALE', alphabet=IUPAC.protein),id='pro2')
>>> aln = MultipleSeqAlignment([pro1, pro2])
>>> codon_aln = build(aln, [seq1, seq2])
>>> print(codon_aln)
CodonAlphabet(Standard) CodonAlignment with 2 rows and 69 columns (23 codons)
TCAGGGACTGCGAGAACCAAGCTACTGCTGCTGCTGGCTGCGCTCTGCGCCGCAGGT...GAG pro1
TCAGGGACTTCGAGAACCAAGCG-CTCCTGCTGCTGGCTGCGCTCGGCGCCGCAGGT...GAG pro2
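
The basic idea behind a codon alignment can be sketched in plain Python: each residue of a gapped protein sequence consumes one codon from the ungapped nucleotide sequence, and each protein gap becomes a three-character codon gap. This is only a minimal sketch under the assumption of a perfect match; `thread_codons` is a hypothetical helper, not part of Bio.CodonAlign, and it does not attempt the mismatch or frameshift handling that build performs.

```python
def thread_codons(gapped_protein, nucl, gap_char="-"):
    """Thread an ungapped nucleotide sequence onto a gapped protein sequence.

    Hypothetical helper for illustration only. Assumes `nucl` encodes the
    ungapped protein exactly, three nucleotides per residue.
    """
    n_residues = sum(1 for aa in gapped_protein if aa != gap_char)
    if len(nucl) != 3 * n_residues:
        raise ValueError("nucleotide length must be 3 x number of residues")

    codons = []
    pos = 0  # index of the next unused nucleotide
    for aa in gapped_protein:
        if aa == gap_char:
            codons.append(gap_char * 3)  # a protein gap spans one whole codon
        else:
            codons.append(nucl[pos:pos + 3])
            pos += 3
    return "".join(codons)


print(thread_codons("SG-TA", "TCAGGGACTGCG"))  # TCAGGG---ACTGCG
```

The sketch never checks that each codon actually translates to its residue; it only shows how alignment columns are laid out.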

_get_aa_regex(codon_table, stop='*', unknown='X')

Set up the regular expression of a given CodonTable for further use.

>>> from Bio.Data.CodonTable import generic_by_id
>>> p = generic_by_id[1]
>>> t = _get_aa_regex(p)
>>> print(t['A'][0])
G
>>> print(t['A'][1])
C
>>> print(sorted(list(t['A'][2:])))
['A', 'C', 'G', 'T', 'U', '[', ']']
>>> print(sorted(list(t['L'][:5])))
['C', 'T', 'U', '[', ']']
>>> print(sorted(list(t['L'][5:9])))
['T', 'U', '[', ']']
>>> print(sorted(list(t['L'][9:])))
['A', 'C', 'G', 'T', 'U', '[', ']']
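
The doctest output above is consistent with a per-position merge of codons into character classes, with 'U' accepted wherever 'T' occurs. The following is a minimal sketch of that idea, not the module's own code; `codons_to_regex` is a hypothetical name, and the codon lists are typed in by hand (standard genetic code) rather than read from a CodonTable.

```python
import re


def codons_to_regex(codons):
    """Collapse same-length codons into one regular expression.

    Hypothetical helper for illustration. At each codon position the
    observed letters are merged into a character class, and 'U' is
    accepted wherever 'T' is.
    """
    pattern = []
    for letters in zip(*codons):  # iterate over codon positions
        allowed = set(letters)
        if "T" in allowed:
            allowed.add("U")  # accept RNA as well as DNA
        if len(allowed) == 1:
            pattern.append(allowed.pop())
        else:
            pattern.append("[" + "".join(sorted(allowed)) + "]")
    return "".join(pattern)


ala = codons_to_regex(["GCT", "GCC", "GCA", "GCG"])
leu = codons_to_regex(["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"])
print(ala)  # GC[ACGTU]
print(leu)  # [CTU][TU][ACGTU]
print(bool(re.fullmatch(leu, "CUG")))  # True
```

Merging position by position keeps the pattern short but makes it permissive: the leucine pattern in this sketch also matches TTT, which encodes phenylalanine.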

_get_shift_anchor_re(sh_anc, sh_nuc, shift_val, aa2re, anchor_len, shift_id_pos)

Try to build a regular expression that matches a potentially shifted anchor.

Arguments:

  • sh_anc - shifted anchor sequence
  • sh_nuc - nucleotide sequence that potentially corresponds to sh_anc
  • shift_val - 1 or 2 indicates a forward frameshift, whereas 3*anchor_len-1 or 3*anchor_len-2 indicates a backward frameshift
  • aa2re - dict mapping each amino acid to its codon regular expression
  • anchor_len - length of the anchor
  • shift_id_pos - identifies the shift currently being processed
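
The shift_val convention described above can be sketched as a small classifier. `classify_shift` is a hypothetical helper, not part of the module, and reading the backward values as a shift of 1 or 2 nucleotides (3*anchor_len minus shift_val) is an assumption made for illustration.

```python
def classify_shift(shift_val, anchor_len):
    """Interpret a shift value following the argument description above.

    Hypothetical helper for illustration. Returns a (direction, size)
    tuple; treating `size` as 1 or 2 nucleotides for backward shifts is
    an assumption, not something the documentation states.
    """
    anchor_nt = 3 * anchor_len  # anchor length in nucleotides
    if shift_val in (1, 2):
        return ("forward", shift_val)
    if shift_val in (anchor_nt - 1, anchor_nt - 2):
        return ("backward", anchor_nt - shift_val)
    raise ValueError("unsupported shift value: %r" % (shift_val,))


print(classify_shift(2, 10))   # ('forward', 2)
print(classify_shift(29, 10))  # ('backward', 1)
```

With the default anchor_len of 10 the backward values are 29 and 28, so they cannot be confused with the forward values 1 and 2.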

_get_codon_rec(pro, nucl, span_mode, alphabet, gap_char='-', codon_table=default_codon_table, complete_protein=False, max_score=10)

Generate a codon alignment based on a regular expression match (PRIVATE).

span_mode is a tuple returned by _check_corr. The first element is the span of a regular expression search, and the second element is the mode of the match.

Mode values:

  • 0: direct match
  • 1: mismatch (no indels)
  • 2: frameshift
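
As a rough illustration of the span_mode layout, the sketch below builds such a tuple by hand and unpacks it. It assumes that the span is a (start, end) pair like the one returned by the span() method of a match object from the standard re module; `describe_span_mode` and `MODE_NAMES` are hypothetical names, not part of the module.

```python
import re

# Mode codes as listed above.
MODE_NAMES = {0: "direct match", 1: "mismatch (no indels)", 2: "frameshift"}


def describe_span_mode(span_mode):
    """Unpack a (span, mode) tuple into a readable summary (illustration only)."""
    span, mode = span_mode
    start, end = span  # assumed to be a (start, end) pair
    return "nucleotides %d-%d, %s" % (start, end, MODE_NAMES[mode])


# A direct match: the protein "MK" written as a codon regular expression.
match = re.search("ATGAA[AG]", "CCATGAAATT")
span_mode = (match.span(), 0)
print(describe_span_mode(span_mode))  # nucleotides 2-8, direct match
```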

_align_shift_recs(recs)

Build an alignment according to the frameshifts detected by _check_corr.

Arguments:

  • recs - a list of SeqRecords, each containing a CodonSeq dictated by an rf_table (some of them with frameshifts)


Variables Details

__warningregistry__

Value:
{('Bio.CodonAlign is an experimental module which may undergo significant changes prior to its future official release.',
  <class 'Bio.BiopythonExperimentalWarning'>,
  17): True}
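
The registry entry above is the bookkeeping that the standard warnings module keeps once a warning has been issued from a module. A minimal sketch of silencing a single warning category with that module follows. To stay self-contained it uses `StandInExperimentalWarning`, a stand-in class defined here for illustration; in real code the category would presumably be the Bio.BiopythonExperimentalWarning class named in the registry. `load_experimental_module` is likewise hypothetical.

```python
import warnings


class StandInExperimentalWarning(Warning):
    """Stand-in for Bio.BiopythonExperimentalWarning (illustration only)."""


def load_experimental_module():
    """Mimic an experimental module that warns when it is loaded."""
    warnings.warn("experimental module, may change", StandInExperimentalWarning)
    return "loaded"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default
    # Added last, so this filter is consulted first and wins.
    warnings.simplefilter("ignore", StandInExperimentalWarning)
    result = load_experimental_module()

print(result, len(caught))  # loaded 0
```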