Module pairwise2

This module implements pairwise sequence alignment using a dynamic
programming algorithm.

This provides functions to get global and local alignments between two
sequences.  A global alignment finds the best concordance between all
characters in two sequences.  A local alignment finds just the
subsequences that align the best.
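
To make the difference between the two modes concrete, here is a minimal
sketch of a dynamic-programming score computation.  It is not the code
used by this module: alignment_score is a hypothetical helper, and it
assumes a single linear penalty per gap character rather than the
module's gap functions.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=0, gap=0, local=False):
    """Return the best alignment score under a linear gap penalty.

    Hypothetical helper for illustration; not part of Bio.pairwise2.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global: leading gaps are charged along the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align the two characters
                score[i - 1][j] + gap,       # seq_a character against a gap
                score[i][j - 1] + gap,       # seq_b character against a gap
            )
            if local:
                cell = max(cell, 0)     # a local alignment may restart anywhere
                best = max(best, cell)  # and may end at any cell
            score[i][j] = cell

    # Global alignments must consume both sequences entirely.
    return best if local else score[-1][-1]
```

In global mode the answer is read from the last cell, because every
character of both sequences must be used.  In local mode cells are
clamped at zero and the maximum over the whole matrix is returned.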

When doing alignments, you can specify the match score and gap
penalties.  The match score indicates the compatibility between an
alignment of two characters in the sequences.  Highly compatible
characters should be given positive scores, and incompatible ones
should be given negative scores or 0.  The gap penalties should be
negative.
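
As an illustration of how these scores combine, the sketch below scores
an alignment that has already been gapped.  score_gapped_alignment is a
hypothetical helper, not a function of this module, and it assumes that
a run of k gap characters costs one open penalty plus (k - 1) extend
penalties.

```python
def score_gapped_alignment(row_a, row_b, match, mismatch,
                           gap_open, gap_extend, gap_char="-"):
    """Score two equal-length aligned rows (hypothetical helper).

    Assumption: the first character of a gap run costs gap_open and
    every further character of the same run costs gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    total = 0.0
    in_gap_a = in_gap_b = False  # currently inside a gap run in row A / row B?
    for char_a, char_b in zip(row_a, row_b):
        if char_a == gap_char:
            total += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif char_b == gap_char:
            total += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            total += match if char_a == char_b else mismatch
            in_gap_a = in_gap_b = False
    return total
```

Under that assumption, score_gapped_alignment("ACCGT", "AC-G-", 2, -1,
-0.5, -0.1) gives 5.0: three matches at 2 points each, minus 0.5 for each
of the two single-character gaps.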

The names of the alignment functions in this module follow the
convention
<alignment type>XX
where <alignment type> is either "global" or "local" and XX is a
two-character code indicating the parameters the function takes.  The
first character indicates the parameters for matches (and mismatches),
and the second indicates the parameters for gap penalties.

The match parameters are
CODE  DESCRIPTION
x     No parameters.  Identical characters have score of 1, otherwise 0.
m     A match score is the score of identical chars, otherwise mismatch score.
d     A dictionary returns the score of any pair of characters.
c     A callback function returns scores.

The gap penalty parameters are
CODE  DESCRIPTION
x     No gap penalties.
s     Same open and extend gap penalties for both sequences.
d     The sequences have different open and extend gap penalties.
c     A callback function returns the gap penalties.
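
The naming convention and the two code tables above can be read
mechanically.  The following sketch decodes a function name into its
parts.  describe_alignment_function and the two lookup tables are
hypothetical names introduced for illustration; they are not part of
this module.

```python
# Code tables copied from the descriptions above.
MATCH_CODES = {
    "x": "no parameters; identical characters score 1, otherwise 0",
    "m": "match score and mismatch score",
    "d": "dictionary giving the score of any pair of characters",
    "c": "callback function returning scores",
}
GAP_CODES = {
    "x": "no gap penalties",
    "s": "same open and extend gap penalties for both sequences",
    "d": "different open and extend gap penalties per sequence",
    "c": "callback function returning the gap penalties",
}


def describe_alignment_function(name):
    """Split a name such as "globalms" into type, match and gap parts."""
    for kind in ("global", "local"):
        codes = name[len(kind):]
        if name.startswith(kind) and len(codes) == 2:
            match_code, gap_code = codes
            if match_code in MATCH_CODES and gap_code in GAP_CODES:
                return {
                    "type": kind,
                    "match": MATCH_CODES[match_code],
                    "gap": GAP_CODES[gap_code],
                }
    raise ValueError("not a valid alignment function name: %r" % name)
```

For example, describe_alignment_function("localds") reports a local
alignment that takes a score dictionary and one pair of open and extend
penalties shared by both sequences.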

All the different alignment functions are contained in an object
"align".  For example:

    >>> from Bio import pairwise2
    >>> alignments = pairwise2.align.globalxx("ACCGT", "ACG")

will return a list of the alignments between the two strings.  The
parameters of the alignment function depend on the function called.
Some examples:

    # Find the best global alignment between the two sequences.
    # Identical characters are given 1 point.  No points are deducted
    # for mismatches or gaps.
    >>> from Bio.pairwise2 import format_alignment
    >>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
    ...     print(format_alignment(*a))
    ACCGT
    |||||
    AC-G-
      Score=3
    <BLANKLINE>
    ACCGT
    |||||
    A-CG-
      Score=3
    <BLANKLINE>

    # Same thing as before, but with a local alignment.
    >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
    ...     print(format_alignment(*a))
    ACCGT
    ||||
    AC-G-
      Score=3
    <BLANKLINE>
    ACCGT
    ||||
    A-CG-
      Score=3
    <BLANKLINE>

    # Do a global alignment.  Identical characters are given 2 points;
    # 1 point is deducted for each non-identical character.
    >>> for a in pairwise2.align.globalmx("ACCGT", "ACG", 2, -1):
    ...     print(format_alignment(*a))
    ACCGT
    |||||
    AC-G-
      Score=6
    <BLANKLINE>
    ACCGT
    |||||
    A-CG-
      Score=6
    <BLANKLINE>

    # Same as above, except now 0.5 points are deducted when opening a
    # gap, and 0.1 points are deducted when extending it.
    >>> for a in pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1):
    ...     print(format_alignment(*a))
    ACCGT
    |||||
    AC-G-
      Score=5
    <BLANKLINE>
    ACCGT
    |||||
    A-CG-
      Score=5
    <BLANKLINE>

The alignment functions can also use the substitution matrices already
included in Biopython (Bio.SubsMat.MatrixInfo).

    >>> from Bio.SubsMat import MatrixInfo as matlist
    >>> matrix = matlist.blosum62
    >>> for a in pairwise2.align.globaldx("KEVLA", "EVL", matrix):
    ...     print(format_alignment(*a))
    KEVLA
    |||||
    -EVL-
      Score=13
    <BLANKLINE>

To see a description of the parameters a function takes, read its
docstring with the help function.  For example, type
help(pairwise2.align.localds) at the Python prompt.
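
The "d" match code and the dictionary_match(score_dict[, symmetric])
signature listed below suggest a lookup keyed on pairs of characters.
The sketch below shows one way such a lookup could work.  It is an
assumption for illustration, not this module's implementation:
make_dictionary_match and toy_scores are hypothetical, and the table is
a small made-up excerpt rather than a complete substitution matrix.

```python
def make_dictionary_match(score_dict, symmetric=True):
    """Build a match function from a dict keyed on (char, char) tuples.

    Hypothetical helper.  When symmetric is true, a missing (a, b)
    entry is looked up as (b, a), so only half of the table is needed.
    """
    def match_fn(char_a, char_b):
        if symmetric and (char_a, char_b) not in score_dict:
            char_a, char_b = char_b, char_a
        return score_dict[(char_a, char_b)]
    return match_fn


# Toy table: only the pairs needed for this example are present.
toy_scores = {
    ("E", "E"): 5,
    ("V", "V"): 4,
    ("L", "L"): 4,
    ("L", "V"): 1,  # stored once; ("V", "L") is found via the symmetric lookup
}

score = make_dictionary_match(toy_scores)
ungapped_total = sum(score(a, b) for a, b in zip("EVL", "EVL"))
```

With this toy table, ungapped_total is 13, which is the sum of the three
diagonal entries for pairing E, V and L with themselves.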

Classes
  identity_match
      identity_match([match][, mismatch]) -> match_fn
  dictionary_match
      dictionary_match(score_dict[, symmetric]) -> match_fn
  affine_penalty
      affine_penalty(open, extend[, penalize_extend_when_opening]) -> gap_fn
Functions
  _align(sequenceA, sequenceB, match_fn, gap_A_fn, gap_B_fn, penalize_extend_when_opening, penalize_end_gaps, align_globally, gap_char, force_generic, score_only, one_alignment_only)
  _make_score_matrix_generic(sequenceA, sequenceB, match_fn, gap_A_fn, gap_B_fn, penalize_extend_when_opening, penalize_end_gaps, align_globally, score_only)
  _recover_alignments(sequenceA, sequenceB, starts, score_matrix, trace_matrix, align_globally, gap_char, one_alignment_only)
  _find_start(score_matrix, sequenceA, sequenceB, gap_A_fn, gap_B_fn, penalize_end_gaps, align_globally)
  _find_global_start(sequenceA, sequenceB, score_matrix, gap_A_fn, gap_B_fn, penalize_end_gaps)
  _find_local_start(score_matrix)
  _clean_alignments(alignments)
  _pad_until_equal(s1, s2, char)
  _lpad_until_equal(s1, s2, char)
  _pad(s, char, n)
  _lpad(s, char, n)
  calc_affine_penalty(length, open, extend, penalize_extend_when_opening)
  print_matrix(matrix)
      Print out a matrix.
  format_alignment(align1, align2, score, begin, end) -> string
      Format the alignment prettily into a string.
  rint(x, precision=1000)
  _make_score_matrix_fast(sequenceA, sequenceB, match_fn, open_A, extend_A, open_B, extend_B, penalize_extend_when_opening, penalize_end_gaps, align_globally, score_only)
  _test()
      Run the module's doctests (PRIVATE).
Variables
  MAX_ALIGNMENTS = 1000
  align = align()
  _PRECISION = 1000
  __package__ = 'Bio'
Function Details

print_matrix(matrix)

Print out a matrix.  For debugging purposes.