Package Bio :: Package AlignIO :: Module MauveIO
[hide private]
[frames] | no frames]

Module MauveIO

source code

Bio.AlignIO support for "xmfa" output from Mauve/ProgressiveMauve.

You are expected to use this module via the Bio.AlignIO functions (or the Bio.SeqIO functions if you want to work directly with the gapped sequences).

For example, consider a progressiveMauve alignment file containing the following:

#FormatVersion Mauve1
#Sequence1File      a.fa
#Sequence1Entry     1
#Sequence1Format    FastA
#Sequence2File      b.fa
#Sequence2Entry     2
#Sequence2Format    FastA
#Sequence3File      c.fa
#Sequence3Entry     3
#Sequence3Format    FastA
#BackboneFile       three.xmfa.bbcols
> 1:0-0 + a.fa
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
> 2:5417-5968 + b.fa
TTTAAACATCCCTCGGCCCGTCGCCCTTTTATAATAGCAGTACGTGAGAGGAGCGCCCTAAGCTTTGGGAAATTCAAGC-
--------------------------------------------------------------------------------
CTGGAACGTACTTGCTGGTTTCGCTACTATTTCAAACAAGTTAGAGGCCGTTACCTCGGGCGAACGTATAAACCATTCTG
> 3:9476-10076 - c.fa
TTTAAACACCTTTTTGGATG--GCCCAGTTCGTTCAGTTGTG-GGGAGGAGATCGCCCCAAACGTATGGTGAGTCGGGCG
TTTCCTATAGCTATAGGACCAATCCACTTACCATACGCCCGGCGTCGCCCAGTCCGGTTCGGTACCCTCCATGACCCACG
---------------------------------------------------------AAATGAGGGCCCAGGGTATGCTT
=
> 2:5969-6015 + b.fa
-----------------------
GGGCGAACGTATAAACCATTCTG
> 3:9429-9476 - c.fa
TTCGGTACCCTCCATGACCCACG
AAATGAGGGCCCAGGGTATGCTT

This is a multiple sequence alignment with multiple aligned sections, so you would probably load this using the Bio.AlignIO.parse() function:

>>> from Bio import AlignIO
>>> align = AlignIO.parse("Mauve/simple.xmfa", "mauve")
>>> alignments = list(align)
>>> for aln in alignments:
...     print(align)
SingleLetterAlphabet() alignment with 3 rows and 240 columns
--------------------------------------------...--- 1
TTTAAACATCCCTCGGCCCGTCGCCCTTTTATAATAGCAGTACG...CTG 2
TTTAAACACCTTTTTGGATG--GCCCAGTTCGTTCAGTTGTG-G...CTT 3
SingleLetterAlphabet() alignment with 3 rows and 46 columns
---------------------------------------------- 1
-----------------------GGGCGAACGTATAAACCATTCTG 2
TTCGGTACCCTCCATGACCCACGAAATGAGGGCCCAGGGTATGCTT 3

Additional information is extracted from the XMFA file and available through the annotation attribute of each record:

>>> for record in alignments[0]:
...   print record.id, len(record), record.annotations
1 240 {'start': 0, 'end': 0, 'strand': 1}
2 240 {'start': 5417, 'end': 5968, 'strand': 1}
3 240 {'start': 9476, 'end': 10076, 'strand': -1}
Classes [hide private]
  MauveWriter
Mauve/XMFA alignment writer.
  MauveIterator
Mauve xmfa alignment iterator.
Functions [hide private]
 
_identifier_split(identifier)
Return (name, start, end) string tuple from an identifier (PRIVATE).
source code
Variables [hide private]
  XMFA_HEADER_REGEX = re.compile(r'> (?P<id>\d+):(?P<start>\d+)-...
  XMFA_HEADER_REGEX_BIOPYTHON = re.compile(r'> (?P<id>\d+):(?P<s...
  ID_LINE_FMT = '> {seq_name}:{start}-{end} {strand} {file} # {u...
  __package__ = 'Bio.AlignIO'
Variables Details [hide private]

XMFA_HEADER_REGEX

Value:
re.compile(r'> (?P<id>\d+):(?P<start>\d+)-(?P<end>\d+) (?P<strand>[\+-\
]) (?P<name>.*)')

XMFA_HEADER_REGEX_BIOPYTHON

Value:
re.compile(r'> (?P<id>\d+):(?P<start>\d+)-(?P<end>\d+) (?P<strand>[\+-\
]) (?P<name>[^#]*) # (?P<realname>.*)')

ID_LINE_FMT

Value:
'''> {seq_name}:{start}-{end} {strand} {file} # {ugly_hack}
'''