Bio.AlignIO.MafIO module

Bio.AlignIO support for the “maf” multiple alignment format.

The Multiple Alignment Format, described by UCSC, stores a series of multiple alignments in a single file. It is suitable for whole-genome to whole-genome alignments, and can store metadata such as source chromosome, start position, size, and strand.

See http://genome.ucsc.edu/FAQ/FAQformat.html#format5

You are expected to use this module via the Bio.AlignIO functions (or the Bio.SeqIO functions if you want to work directly with the gapped sequences).

Coordinates in the MAF format are defined in terms of zero-based start positions (like Python) and aligning region sizes.

A minimal aligned region of length one, starting at the first position in the source sequence, would have start == 0 and size == 1.

As this example shows, start + size gives one more than the zero-based end position. We can therefore use start and start + size as Python list slice boundaries.

For an inclusive end coordinate, we need to use end = start + size - 1. A 1-column wide alignment would have start == end.
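The coordinate arithmetic described above can be sketched in plain Python. The helper names and the sample sequence below are assumptions made for illustration; they are not part of Bio.AlignIO.MafIO.

```python
def maf_slice_bounds(start, size):
    """Return (start, start + size), usable directly as Python slice boundaries."""
    return start, start + size


def maf_inclusive_end(start, size):
    """Return the zero-based inclusive end coordinate of the region."""
    return start + size - 1


# Hypothetical ungapped source sequence and MAF-style coordinates.
source = "ACGTACGTAC"
start, size = 2, 4

lo, hi = maf_slice_bounds(start, size)
region = source[lo:hi]                        # "GTAC"
last_index = maf_inclusive_end(start, size)   # 5

# A one-column region at the first position: start == end == 0.
one_column_end = maf_inclusive_end(0, 1)      # 0
```

Keeping the half-open pair (start, start + size) avoids off-by-one adjustments when slicing; the inclusive form is only needed when another tool expects a closed interval.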

class Bio.AlignIO.MafIO.MafWriter(handle)

Bases: SequentialAlignmentWriter

Accepts a MultipleSeqAlignment object, writes a MAF file.

write_header()

Write the MAF header.

write_alignment(alignment)

Write a complete alignment to a MAF block.

Writes the SeqRecords of a MultipleSeqAlignment object to a single MAF block (beginning with an ‘a’ line, followed by one ‘s’ line per SeqRecord).
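To make the block layout concrete, here is a minimal sketch that formats one block by hand using only the standard library. It is not MafWriter's implementation: the helper name format_maf_block, the sample source names, sizes, and sequences, and the single-space column separation are all assumptions for illustration. The field order on each ‘s’ line (source, start, size, strand, source size, text) follows the UCSC format description.

```python
def format_maf_block(rows, score=None):
    """Format one MAF block: an 'a' line followed by one 's' line per row.

    Each row is a tuple (src, start, size, strand, src_size, text), where
    size counts the non-gap characters in text.
    """
    header = "a" if score is None else f"a score={score}"
    lines = [header]
    for src, start, size, strand, src_size, text in rows:
        lines.append(f"s {src} {start} {size} {strand} {src_size} {text}")
    # A blank line terminates the block.
    return "\n".join(lines) + "\n\n"


# Hypothetical rows; names and source sizes are made up.
rows = [
    ("hg18.chr7", 0, 4, "+", 100, "ACGT"),
    ("mm9.chr6", 10, 3, "+", 200, "AC-T"),
]
block = format_maf_block(rows, score=10)
```

In normal use you would not build these lines yourself; writing through the Bio.AlignIO functions, as recommended at the top of this page, lets the module handle the formatting.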

Bio.AlignIO.MafIO.MafIterator(handle, seq_count=None)

Iterate over a MAF file handle as MultipleSeqAlignment objects.

Iterates over lines in a MAF file-like object (handle), yielding MultipleSeqAlignment objects. SeqRecord IDs generally correspond to species names.

class Bio.AlignIO.MafIO.MafIndex(sqlite_file, maf_file, target_seqname)

Bases: object

Index for a MAF file.

The index is a sqlite3 database that is built, if necessary, when the object is created, and queried whenever the search or get_spliced methods are called.

__init__(sqlite_file, maf_file, target_seqname)

Indexes or loads the index of a MAF file.

close()

Close the file handle being used to read the data.

Once called, further use of the index won’t work. The sole purpose of this method is to allow explicit handle closure. For example, if you wish to delete the file on Windows, you must first close all open handles to that file.

search(starts, ends)

Search index database for MAF records overlapping ranges provided.

Returns MultipleSeqAlignment results in order by start, then end, then internal offset field.

starts should be a list of 0-based start coordinates of segments in the reference. ends should be the list of the corresponding segment ends (in the half-open UCSC convention: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/).
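The half-open convention determines which records count as overlapping a query range. The sketch below shows that overlap test on plain (start, end) tuples; it is an assumption-laden illustration, not how MafIndex queries its database, and the function names and sample intervals are hypothetical.

```python
def overlaps(block_start, block_end, query_start, query_end):
    """True if two half-open intervals [start, end) share at least one position."""
    return block_start < query_end and query_start < block_end


def blocks_overlapping(blocks, starts, ends):
    """Return the blocks that overlap any of the (start, end) query ranges."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    hits = []
    for block_start, block_end in blocks:
        if any(overlaps(block_start, block_end, s, e) for s, e in zip(starts, ends)):
            hits.append((block_start, block_end))
    # Order by start, then end.
    return sorted(hits)


# Hypothetical half-open block coordinates on the reference.
blocks = [(0, 50), (50, 120), (200, 260)]

first_fifty = blocks_overlapping(blocks, [0], [50])    # [(0, 50)]
spanning = blocks_overlapping(blocks, [100], [210])    # [(50, 120), (200, 260)]
```

Because the end coordinate is exclusive, a query ending at 50 does not touch a block starting at 50.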

get_spliced(starts, ends, strand=1)

Return a multiple alignment of the exact sequence range provided.

Accepts two lists of start and end positions on target_seqname, representing exons to be spliced in silico. Returns a MultipleSeqAlignment of the desired sequences spliced together.

starts should be a list of 0-based start coordinates of segments in the reference. ends should be the list of the corresponding segment ends (in the half-open UCSC convention: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/).

To ask for the alignment portion corresponding to the first 100 nucleotides of the reference sequence, you would use get_spliced([0], [100]).
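The idea of splicing segments together from paired starts and ends lists can be illustrated on an ordinary string. This is only a sketch of the coordinate convention, assuming a hypothetical ungapped reference; it does not reproduce what get_spliced does with aligned, gapped sequences or with the strand argument.

```python
def splice(reference, starts, ends):
    """Concatenate the half-open segments [start, end) of reference, in order."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    return "".join(reference[s:e] for s, e in zip(starts, ends))


# Hypothetical reference with two "exons" at [0, 4) and [8, 12).
reference = "AAAACCCCGGGGTTTT"
starts, ends = [0, 8], [4, 12]

spliced = splice(reference, starts, ends)                # "AAAAGGGG"
expected_length = sum(e - s for s, e in zip(starts, ends))  # 8
```

With half-open coordinates, the spliced length on the reference is simply the sum of end - start over all segments.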

__repr__()

Return a string representation of the index.

__len__()

Return the number of records in the index.