Bio.kNN module

Code for doing k-nearest-neighbors classification.

k-nearest neighbors (k-NN) is a supervised learning algorithm that classifies a new observation based on the classes of its nearest neighbors in the training set.

Glossary:
  • distance: The distance between two points in the feature space.

  • weight: The importance given to each point for classification.
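To make the two glossary terms concrete, here is a minimal sketch of two distance measures. The Euclidean distance is the default this module documents for distance_fn; the Manhattan distance is a hypothetical alternative of the kind a user might pass in. Both helper names are illustrative and are not part of Bio.kNN.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute per-feature differences (hypothetical alternative)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(manhattan_distance([0, 0], [3, 4]))  # 7
```

Any function with this two-point signature could serve as a distance_fn; which measure is appropriate depends on how the features are scaled.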

Classes:
  • kNN: Holds information for a nearest-neighbors classifier.

Functions:
  • train: Train a new kNN classifier.

  • calculate: Calculate the probabilities of each class, given an observation.

  • classify: Classify an observation into a class.

Weighting Functions:
  • equal_weight: Every example is given a weight of 1.

class Bio.kNN.kNN

Bases: object

Holds information necessary to do nearest neighbors classification.

Attributes:
  • classes: Set of the possible classes.

  • xs: List of the neighbors (the training observations).

  • ys: List of the classes that the neighbors belong to.

  • k: Number of neighbors to look at.

__init__(self)

Initialize.

Bio.kNN.equal_weight(x, y)

Return the integer 1 for any pair of points (a dummy weighting function that gives every neighbor equal weight).
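A weighting function receives the observation x and one training example y and returns a number. The sketch below restates equal_weight's documented behavior and adds a hypothetical inverse_distance_weight that favors closer neighbors. The 1 / (1 + d) form is an assumption chosen to keep the weight finite when x equals y; it is not something Bio.kNN provides.

```python
import math
from typing import Sequence


def equal_weight(x: Sequence[float], y: Sequence[float]) -> int:
    """Give every neighbor the same weight, as equal_weight is documented to do."""
    return 1


def inverse_distance_weight(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical weight_fn: nearer neighbors count more.

    Adding 1 to the Euclidean distance keeps the weight finite when x == y.
    """
    return 1.0 / (1.0 + math.dist(x, y))


print(equal_weight([0, 0], [3, 4]))             # 1
print(inverse_distance_weight([0, 0], [3, 4]))  # 1 / (1 + 5), about 0.1667
print(inverse_distance_weight([1, 1], [1, 1]))  # 1.0
```

A function like inverse_distance_weight has the shape the weight_fn argument expects: it takes x and a training example and returns a weight.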

Bio.kNN.train(xs, ys, k, typecode=None)

Train a k nearest neighbors classifier on a training set.

xs is a list of observations and ys is a list of the corresponding class assignments, so xs and ys should contain the same number of elements. k is the number of neighbors to examine when classifying a new observation.
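k-NN is a lazy learner, so training amounts to storing the data for later lookups. The pure-Python sketch below illustrates that idea with a hypothetical KNNModel dataclass whose fields mirror the kNN attributes listed earlier (classes, xs, ys, k). The explicit length check is an addition of this sketch; the documentation only says the two lists should have the same number of elements.

```python
from dataclasses import dataclass
from typing import Hashable, List, Sequence, Set


@dataclass
class KNNModel:
    """Hypothetical stand-in for a trained classifier."""
    classes: Set[Hashable]   # distinct class labels seen in training
    xs: List[List[float]]    # training observations (the neighbors)
    ys: List[Hashable]       # class label of each observation
    k: int                   # number of neighbors to consult


def train(xs: Sequence[Sequence[float]], ys: Sequence[Hashable], k: int) -> KNNModel:
    """Store the training set; k-NN does no fitting beyond this."""
    if len(xs) != len(ys):
        # Extra safety check added by this sketch.
        raise ValueError("xs and ys must contain the same number of elements")
    return KNNModel(
        classes=set(ys),
        xs=[[float(v) for v in x] for x in xs],
        ys=list(ys),
        k=k,
    )


model = train([[0, 0], [0, 1], [5, 5]], ["a", "a", "b"], k=2)
print(model.classes == {"a", "b"}, model.k)  # True 2
```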

Bio.kNN.calculate(knn, x, weight_fn=None, distance_fn=None)

Calculate the probability for each class.

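Arguments:
  • knn is a classifier returned by train.

  • x is the observed data.

  • weight_fn is an optional function that takes x and a training example, and returns a weight. If weight_fn is None (the default), every neighbor is given equal weight.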

  • distance_fn is an optional function that takes two points and returns the distance between them. If distance_fn is None (the default), the Euclidean distance is used.

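Returns a dictionary mapping each class to the total weight given to it by the k nearest neighbors. The values are summed weights rather than normalized probabilities; with equal weighting they are neighbor counts.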
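One straightforward way to produce such a class-to-weight dictionary is to rank the training examples by distance to x, keep the k nearest, and add each neighbor's weight to its class. The self-contained sketch below does that. It is an illustration, not Biopython's source: types.SimpleNamespace stands in for a trained kNN object, and the defaults (Euclidean distance via math.dist, weight 1 per neighbor) follow the behavior described in this documentation.

```python
import math
from types import SimpleNamespace


def equal_weight(x, y):
    """Every neighbor contributes a weight of 1."""
    return 1


def calculate(knn, x, weight_fn=None, distance_fn=None):
    """Return a dict mapping each class to the summed weight of its
    members among the k training examples nearest to x."""
    if weight_fn is None:
        weight_fn = equal_weight
    if distance_fn is None:
        distance_fn = math.dist  # Euclidean distance (Python 3.8+)

    # (distance, index) pairs sorted nearest first; ties fall back to index.
    order = sorted((distance_fn(x, xi), i) for i, xi in enumerate(knn.xs))

    # Start every known class at 0.0 so classes with no nearby examples appear.
    weights = {klass: 0.0 for klass in knn.classes}
    for _, i in order[: knn.k]:
        weights[knn.ys[i]] += weight_fn(x, knn.xs[i])
    return weights


# A hypothetical trained classifier with the attributes kNN documents.
knn = SimpleNamespace(
    classes={"A", "B"},
    xs=[[0, 0], [0, 1], [5, 5], [6, 5]],
    ys=["A", "A", "B", "B"],
    k=3,
)
weights = calculate(knn, [1, 1])
print(weights["A"], weights["B"])  # 2.0 1.0
```

Sorting every training example costs O(n log n) per query; for large training sets a partial selection such as heapq.nsmallest would avoid the full sort.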

Bio.kNN.classify(knn, x, weight_fn=None, distance_fn=None)

Classify an observation into a class.

If weight_fn is not specified, all neighbors are given equal weight. distance_fn is an optional function that takes two points and returns the distance between them. If distance_fn is None (the default), the Euclidean distance is used. Returns the class with the largest total weight.
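Classification reduces the per-class weights to a single label. The sketch below picks the class with the largest summed weight. Its tie-breaking rule (smallest label in sorted order) is an assumption of this sketch, chosen so the result does not depend on set or dictionary ordering; it also requires the labels to be sortable. To stay self-contained it repeats a compact version of the weight computation.

```python
import math
from types import SimpleNamespace


def equal_weight(x, y):
    """Every neighbor contributes a weight of 1."""
    return 1


def class_weights(knn, x, weight_fn, distance_fn):
    """Summed weight per class over the k training examples nearest to x."""
    order = sorted((distance_fn(x, xi), i) for i, xi in enumerate(knn.xs))
    weights = {klass: 0.0 for klass in knn.classes}
    for _, i in order[: knn.k]:
        weights[knn.ys[i]] += weight_fn(x, knn.xs[i])
    return weights


def classify(knn, x, weight_fn=None, distance_fn=None):
    """Return the class with the largest summed weight.

    Labels are visited in sorted order and max() keeps the first maximum,
    so ties go to the smallest label.
    """
    if weight_fn is None:
        weight_fn = equal_weight
    if distance_fn is None:
        distance_fn = math.dist  # Euclidean distance (Python 3.8+)
    weights = class_weights(knn, x, weight_fn, distance_fn)
    return max(sorted(weights), key=weights.get)


# A hypothetical trained classifier with the attributes kNN documents.
knn = SimpleNamespace(
    classes={"A", "B"},
    xs=[[1.0, 0.2], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]],
    ys=["A", "A", "B", "B"],
    k=3,
)
print(classify(knn, [0.8, 0.3]))  # A
```

Passing a distance-sensitive weight_fn, such as an inverse-distance weight, can change the outcome when a few very close neighbors would otherwise be outvoted by more distant ones.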