Bio.Align.sam module

Bio.Align support for the “sam” pairwise alignment format.

The Sequence Alignment/Map (SAM) format, created by Heng Li and Richard Durbin at the Wellcome Trust Sanger Institute, stores a series of alignments to the genome in a single file. SAM files are typically used for next-generation sequencing data. They store the alignment positions of mapped sequences, and may also store the aligned sequences and other information associated with each sequence.

See http://www.htslib.org/ for more information.

You are expected to use this module via the Bio.Align functions.
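As a rough illustration of going through the Bio.Align functions, the sketch below parses a tiny in-memory SAM file with Bio.Align.parse. The read, reference name, and lengths are made-up data, and the example assumes a Biopython release that ships this module.

```python
from io import StringIO

from Bio import Align

# A tiny in-memory SAM file: one @SQ header line and one read (made-up data).
sam_text = (
    "@SQ\tSN:chr1\tLN:100\n"
    "read1\t0\tchr1\t11\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
)

# Align.parse selects the SAM parser from the format name.
alignments = Align.parse(StringIO(sam_text), "sam")
alignment = next(alignments)

print(alignment.target.id, alignment.query.id)
print(alignment.coordinates)  # zero-based coordinates of the aligned blocks
print(alignment.flag, alignment.mapq)
```

The one-based POS value 11 in the file appears as the zero-based start 10 in alignment.coordinates.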

Coordinates in the SAM format are defined in terms of one-based start positions; the parser converts these to zero-based coordinates to be consistent with Python and other alignment formats.
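The conversion can be sketched in plain Python, independent of this module. The helper name sam_to_zero_based is hypothetical; it only illustrates how a one-based POS and a CIGAR string map to a zero-based, half-open span on the target, following the CIGAR operations defined in the SAM specification.

```python
import re

# CIGAR operations that consume positions on the reference (target),
# per the SAM specification.
CONSUMES_TARGET = set("MDN=X")


def sam_to_zero_based(pos, cigar):
    """Return (start, end) of the aligned target span, zero-based and half-open."""
    start = pos - 1  # SAM POS is one-based; Python indexing is zero-based
    end = start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in CONSUMES_TARGET:
            end += int(length)
    return start, end


print(sam_to_zero_based(11, "4M2D6M"))  # (10, 22)
```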

class Bio.Align.sam.AlignmentWriter(target, md=False)

Bases: Bio.Align.interfaces.AlignmentWriter

Alignment file writer for the Sequence Alignment/Map (SAM) file format.

fmt = 'SAM'
__init__(target, md=False)

Create an AlignmentWriter object.

Arguments:
  • md - If True, calculate the MD tag from the alignment and include it in the output. If False (default), do not include the MD tag in the output.
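To give a sense of what calculating the MD tag involves, here is a minimal pure-Python sketch. It is not the writer's actual implementation; the function md_tag is hypothetical and assumes the alignment is given as two gapped strings of equal length, with "-" marking gaps. Edge cases such as an insertion sitting between two deletions are not handled.

```python
def md_tag(target, query):
    """Build an MD string from two gapped, equal-length aligned sequences."""
    parts = []
    matches = 0
    in_deletion = False
    for t, q in zip(target, query):
        if t == "-":  # insertion in the read: not represented in MD
            continue
        if q == "-":  # base deleted from the reference
            if not in_deletion:
                parts.append(str(matches))
                parts.append("^")
                matches = 0
                in_deletion = True
            parts.append(t)
            continue
        in_deletion = False
        if t == q:
            matches += 1
        else:  # mismatch: emit the match count, then the reference base
            parts.append(str(matches))
            parts.append(t)
            matches = 0
    parts.append(str(matches))
    return "".join(parts)


print(md_tag("ACGTACGT", "ACGAACGT"))  # 3T4
print(md_tag("ACGTACGT", "AC--ACGT"))  # 2^GT4
```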

write_header(alignments)

Write the SAM header.

format_alignment(alignment, md=None)

Return a string with a single alignment formatted as one SAM line.

__abstractmethods__ = frozenset({})
class Bio.Align.sam.AlignmentIterator(source)

Bases: Bio.Align.interfaces.AlignmentIterator

Alignment iterator for Sequence Alignment/Map (SAM) files.

Each line in the file contains one genomic alignment; the alignments are loaded and returned incrementally. The following columns are stored as attributes of the alignment:

  • flag: The FLAG combination of bitwise flags

  • mapq: Mapping Quality (only stored if available)

  • rnext: Reference sequence name of the primary alignment of the next read in the template (only stored if available)

  • pnext: Zero-based position of the primary alignment of the next read in the template (only stored if available)

  • tlen: Signed observed template length (only stored if available)
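The flag attribute is an integer whose bits each carry a meaning defined by the SAM specification. The sketch below decodes such an integer into readable names; the dictionary SAM_FLAGS, the label strings, and the function decode_flag are illustrative choices and not part of this module.

```python
# Bit values follow the SAM specification; the label strings are made up here.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))
# ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
```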

Other information associated with the alignment by its tags is stored in the annotations attribute of each alignment.

Any hard clipping (clipped sequences not present in the query sequence) is stored as ‘hard_clip_left’ and ‘hard_clip_right’ in the annotations dictionary attribute of the query sequence record.
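In a SAM record, hard clipping appears as H operations at either end of the CIGAR string. As a sketch of where the two values come from, the hypothetical helper hard_clips below extracts them from a CIGAR string; it is not how the parser itself is written.

```python
import re


def hard_clips(cigar):
    """Return (left, right) hard-clip lengths from a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # H may only occur as the first and/or last operation.
    left = int(ops[0][0]) if ops and ops[0][1] == "H" else 0
    right = int(ops[-1][0]) if len(ops) > 1 and ops[-1][1] == "H" else 0
    return left, right


print(hard_clips("5H20M3H"))  # (5, 3)
```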

The sequence quality, if available, is stored as ‘phred_quality’ in the letter_annotations dictionary attribute of the query sequence record.
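In the SAM file itself, the QUAL column encodes each Phred score as an ASCII character with an offset of 33. The sketch below decodes such a string into integers. The function decode_qual is hypothetical, and whether ‘phred_quality’ holds the raw characters or decoded values is not stated here, so treat this only as an illustration of the encoding.

```python
def decode_qual(qual):
    """Decode a SAM QUAL string (Phred+33) into a list of integer scores."""
    if qual == "*":  # "*" means no quality information was stored
        return None
    return [ord(ch) - 33 for ch in qual]


print(decode_qual("II5!"))  # [40, 40, 20, 0]
```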

fmt = 'SAM'
__abstractmethods__ = frozenset({})