Bio.PopGen.GenePop.FileParser module

Code to parse BIG GenePop files.

The difference between this class and the standard Bio.PopGen.GenePop.Record class is that this one does not read the whole file into memory. It provides an iterator interface, which is slower but consumes much less memory. It should be used with big files (thousands of markers and individuals).

See http://wbiomed.curtin.edu.au/genepop/ . The format is documented here: http://wbiomed.curtin.edu.au/genepop/help_input.html .

Classes:
  • FileRecord - Holds GenePop data.

Functions:

Bio.PopGen.GenePop.FileParser.read(fname)

Parse a GenePop file.

fname is the name of a file that contains a GenePop record.

class Bio.PopGen.GenePop.FileParser.FileRecord(fname)

Bases: object

Hold information from a GenePop record.

Attributes:
  • marker_len - The marker length (2 or 3 digit code per allele).
  • comment_line - Comment line.
  • loci_list - List of loci names.

Methods:
  • get_individual - Returns the next individual of the current population.
  • skip_population - Skips the current population.

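skip_population skips the individuals of the current population and returns True if there are more populations.

get_individual returns the next individual of the current population. When the current population has no more individuals, it returns True if another population follows, or False at the end of the file.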

Each individual is a pair composed of the individual name and a list of alleles (2 per marker, or 1 for haploid data). Examples:

('Ind1', [(1,2),    (3,3), (200,201)])
('Ind2', [(2,None), (3,3), (None,None)])
('Other1', [(1,1),  (4,3), (200,200)])
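
As a rough illustration of how these pairs can be consumed, the sketch below counts the observed alleles at one locus and ignores missing calls (None). The helper name allele_counts is hypothetical and not part of Bio.PopGen; the data is the three example individuals above.

```python
from collections import Counter


def allele_counts(individuals, locus_index):
    """Count observed alleles at one locus, skipping missing (None) calls.

    Hypothetical helper for illustration; not part of Bio.PopGen.
    """
    counts = Counter()
    for _name, markers in individuals:
        for allele in markers[locus_index]:
            if allele is not None:
                counts[allele] += 1
    return counts


# The three example individuals from the documentation above.
individuals = [
    ("Ind1", [(1, 2), (3, 3), (200, 201)]),
    ("Ind2", [(2, None), (3, 3), (None, None)]),
    ("Other1", [(1, 1), (4, 3), (200, 200)]),
]

first_locus = allele_counts(individuals, 0)
last_locus = allele_counts(individuals, 2)
```

Because each individual is a plain tuple, a helper like this can be fed one individual at a time, which fits the streaming style of this module.
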
__init__(fname)

Initialize the class.

__str__()

Return (reconstruct) a GenePop textual representation.

This might take a lot of memory. Marker length will be 3.
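
To make "marker length will be 3" concrete, here is a minimal sketch of how one individual could be rendered as a GenePop-style line with three digits per allele. The function format_individual is a hypothetical helper, and writing missing alleles as zeros is an assumption about the output, not a statement of how the library implements __str__.

```python
def format_individual(name, markers, marker_len=3):
    """Render one individual as a GenePop-style text line.

    Hypothetical helper for illustration; not part of Bio.PopGen.
    Assumes missing alleles (None) are written as zeros.
    """
    fields = []
    for marker in markers:
        # Zero-pad every allele to marker_len digits and join the alleles
        # of one marker without a separator.
        fields.append("".join(str(a or 0).zfill(marker_len) for a in marker))
    return f"{name}, " + " ".join(fields)


line1 = format_individual("Ind1", [(1, 2), (3, 3), (200, 201)])
line2 = format_individual("Ind2", [(2, None), (3, 3), (None, None)])
```
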

start_read()

Start parsing a GenePop file.

skip_header()

Skip the header. To be done after the file is reopened.

seek_position(pop, indiv)

Seek a certain position in the file.

Arguments:
  • pop - population position (0 is first)

  • indiv - position of the individual within the population

skip_population()

Skip the current population. Returns True if there is another population.

get_individual()

Get the next individual.

Returns individual information if there are more individuals in the current population. Returns True if there are no more individuals in the current population, but there are more populations; the next read will then be from the following population. Returns False if at the end of the file.
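
The three-way return convention described above can be wrapped in a generator. The following is a minimal sketch, not library code: iter_individuals and FakeRecord are hypothetical names, and FakeRecord only replays a fixed sequence of return values so the example runs without a GenePop file.

```python
def iter_individuals(record):
    """Yield (pop_index, name, markers) for every individual.

    Hypothetical helper; relies only on get_individual() returning a
    (name, markers) pair, True (next population) or False (end of file).
    """
    pop_index = 0
    while True:
        result = record.get_individual()
        if result is True:      # current population ended, another follows
            pop_index += 1
        elif result is False:   # end of file
            return
        else:
            name, markers = result
            yield pop_index, name, markers


class FakeRecord:
    """Hypothetical stand-in that replays canned get_individual() results."""

    def __init__(self, results):
        self._results = iter(results)

    def get_individual(self):
        # Once the canned results run out, behave like end of file.
        return next(self._results, False)


fake = FakeRecord([
    ("Ind1", [(1, 2)]),
    ("Ind2", [(2, None)]),
    True,
    ("Other1", [(1, 1)]),
    False,
])
rows = list(iter_individuals(fake))
```

With Biopython installed, the object returned by read(fname) should be usable in place of FakeRecord, assuming reading starts at the first individual of the first population. The identity checks (is True, is False) matter here, because a non-empty tuple is also truthy.
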

remove_population(pos, fname)

Remove a population (by position).

Arguments:
  • pos - position of the population to remove

  • fname - file to be created with population removed

remove_locus_by_position(pos, fname)

Remove a locus by position.

Arguments:
  • pos - position of the locus to remove

  • fname - file to be created with locus removed

remove_loci_by_position(positions, fname)

Remove a set of loci by position.

Arguments:
  • positions - positions of the loci to remove

  • fname - file to be created with loci removed

remove_locus_by_name(name, fname)

Remove a locus by name.

Arguments:
  • name - name of the locus to remove

  • fname - file to be created with locus removed

remove_loci_by_name(names, fname)

Remove a list of loci (by name).

Arguments:
  • names - names of the loci to remove

  • fname - file to be created with loci removed
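
The relationship between removal by name and removal by position can be sketched as follows. This is an illustration under assumptions, not the library's implementation: positions_for_names and drop_positions are hypothetical helpers, positions are assumed to be 0-based, and the locus names are made up.

```python
def positions_for_names(loci_list, names):
    """Map locus names to their positions in loci_list (assumed 0-based)."""
    wanted = set(names)
    return [i for i, locus in enumerate(loci_list) if locus in wanted]


def drop_positions(items, positions):
    """Return a copy of items without the entries at the given positions."""
    skip = set(positions)
    return [item for i, item in enumerate(items) if i not in skip]


# Made-up locus names paired with the markers of one example individual.
loci_list = ["LocA", "LocB", "LocC"]
markers = [(1, 2), (3, 3), (200, 201)]

positions = positions_for_names(loci_list, ["LocB"])
kept_loci = drop_positions(loci_list, positions)
kept_markers = drop_positions(markers, positions)
```

The same position filter would be applied to the locus list and to every individual's marker list, so the two stay aligned in the file that is written.
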