Bio.kNN module
Code for doing k-nearest-neighbors classification.
k Nearest Neighbors is a supervised learning algorithm that classifies a new observation based on the classes in its surrounding neighborhood.
- Glossary:
distance The distance between two points in the feature space.
weight The importance given to each point for classification.
- Classes:
kNN Holds information for a nearest neighbors classifier.
- Functions:
train Train a new kNN classifier.
calculate Calculate the probabilities of each class, given an observation.
classify Classify an observation into a class.
- Weighting Functions:
equal_weight Every example is given a weight of 1.
- class Bio.kNN.kNN
Bases: object
Holds information necessary to do nearest neighbors classification.
- Attributes:
classes Set of the possible classes.
xs List of the neighbors.
ys List of the classes that the neighbors belong to.
k Number of neighbors to look at.
- __init__(self)
Initialize.
- Bio.kNN.equal_weight(x, y)
Return the integer 1 (a dummy function that weights all examples equally).
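To illustrate the (x, y) signature that a weighting function takes, the sketch below pairs a local stand-in for equal weighting with an inverse-distance weighting. The name inverse_distance_weight and its eps parameter are hypothetical, invented for this example, and are not part of Bio.kNN.

```python
import math


def equal_weight(x, y):
    """Local stand-in: give every training example the same weight."""
    return 1


def inverse_distance_weight(x, y, eps=1e-9):
    """Hypothetical alternative: closer training examples weigh more."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (dist + eps)  # eps avoids division by zero


x = [0.0, 0.0]          # observation
near = [0.0, 1.0]       # training example at distance 1
far = [3.0, 4.0]        # training example at distance 5

w_near = inverse_distance_weight(x, near)
w_far = inverse_distance_weight(x, far)
print(equal_weight(x, near), equal_weight(x, far))
print(w_near > w_far)
```

With a weighting like this, a single very close neighbor can outvote several distant ones, which equal weighting never allows.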
- Bio.kNN.train(xs, ys, k, typecode=None)
Train a k nearest neighbors classifier on a training set.
xs is a list of observations and ys is a list of the class assignments. Thus, xs and ys should contain the same number of elements. k is the number of neighbors that should be examined when doing the classification.
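The relationship between xs, ys and k can be sketched in plain Python. This is not the Bio.kNN implementation: KNNSketch and train_sketch are hypothetical names, the data is made up, and the length check is an assumption about how a mismatch might be handled.

```python
class KNNSketch:
    """Hypothetical stand-in holding the attributes listed for kNN."""

    def __init__(self):
        self.classes = set()
        self.xs = []
        self.ys = []
        self.k = None


def train_sketch(xs, ys, k):
    """Store the training set; nearest-neighbor training is just bookkeeping."""
    if len(xs) != len(ys):
        # Assumption: reject mismatched inputs rather than truncate them.
        raise ValueError("xs and ys must contain the same number of elements")
    model = KNNSketch()
    model.classes = set(ys)
    model.xs = [list(x) for x in xs]
    model.ys = list(ys)
    model.k = k
    return model


# Made-up training set: four 2-D observations in two classes.
xs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
ys = ["a", "a", "b", "b"]
model = train_sketch(xs, ys, k=3)
print(sorted(model.classes), model.k)
```

An odd k is a common choice for two-class problems because it avoids tied votes under equal weighting.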
- Bio.kNN.calculate(knn, x, weight_fn=None, distance_fn=None)
Calculate the probability for each class.
- Arguments:
x is the observed data.
weight_fn is an optional function that takes x and a training example, and returns a weight.
distance_fn is an optional function that takes two points and returns the distance between them. If distance_fn is None (the default), the Euclidean distance is used.
Returns a dictionary mapping each class to the weight given to that class.
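The computation described above can be approximated by the following sketch. It is a plain-Python illustration under stated assumptions, not the library's code: calculate_sketch is a hypothetical name, and it takes xs, ys and k directly instead of a trained kNN object.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used when no distance_fn is given."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def equal_weight(x, y):
    """Default weighting in this sketch: every neighbor counts as 1."""
    return 1


def calculate_sketch(xs, ys, k, x, weight_fn=None, distance_fn=None):
    """Return {class: summed weight of that class among the k nearest}."""
    if weight_fn is None:
        weight_fn = equal_weight
    if distance_fn is None:
        distance_fn = euclidean
    # Rank training examples by distance to x; the index breaks ties.
    order = sorted((distance_fn(x, xs[i]), i) for i in range(len(xs)))
    weights = {c: 0.0 for c in set(ys)}
    for _, i in order[:k]:
        weights[ys[i]] += weight_fn(x, xs[i])
    return weights


# Made-up data: three points near the origin, two near (5, 5).
xs = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
ys = ["a", "a", "a", "b", "b"]
weights = calculate_sketch(xs, ys, 3, [4, 4])
print(weights)
```

In this sketch the returned values are summed weights rather than normalized probabilities; dividing each by the total would turn them into fractions that sum to 1.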
- Bio.kNN.classify(knn, x, weight_fn=None, distance_fn=None)
Classify an observation into a class.
If not specified, weight_fn will give all neighbors equal weight. distance_fn is an optional function that takes two points and returns the distance between them. If distance_fn is None (the default), the Euclidean distance is used.
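As a rough illustration of how a custom distance function changes which neighbors vote, the sketch below classifies with Manhattan distance and equal weights. The names manhattan and classify_sketch are hypothetical, the data is made up, and the tie-breaking behavior (first class encountered wins) is an assumption of this sketch, not a documented property of Bio.kNN.

```python
def manhattan(a, b):
    """Hypothetical custom distance_fn: sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def classify_sketch(xs, ys, k, x, distance_fn):
    """Return the class with the most equal-weight votes among the k nearest."""
    # Indices of the k training examples closest to x.
    nearest = sorted(range(len(xs)), key=lambda i: distance_fn(x, xs[i]))[:k]
    votes = {}
    for i in nearest:
        votes[ys[i]] = votes.get(ys[i], 0) + 1
    # Pick the class with the largest vote count.
    return max(votes, key=votes.get)


# Made-up data: three points near the origin, two near (5, 5).
xs = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
ys = ["a", "a", "a", "b", "b"]

label_low = classify_sketch(xs, ys, 3, [1, 1], manhattan)
label_high = classify_sketch(xs, ys, 3, [5, 4], manhattan)
print(label_low, label_high)
```

Any function that takes two points and returns a number can be used in the same position, which is useful when the features are not naturally Euclidean.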