Bio.MaxEntropy module

Maximum Entropy code.

Uses Improved Iterative Scaling.

class Bio.MaxEntropy.MaxEntropy

Bases: object

Hold information for a Maximum Entropy classifier.

Members:

classes: List of the possible classes of data.

alphas: List of the weights for each feature.

feature_fns: List of the feature functions.

Car data from the Naive Bayes Classifier example by Eric Meisner, November 22, 2003: http://www.inf.u-szeged.hu/~ormandi/teaching

>>> from Bio.MaxEntropy import train, classify
>>> xcar = [
...     ['Red', 'Sports', 'Domestic'],
...     ['Red', 'Sports', 'Domestic'],
...     ['Red', 'Sports', 'Domestic'],
...     ['Yellow', 'Sports', 'Domestic'],
...     ['Yellow', 'Sports', 'Imported'],
...     ['Yellow', 'SUV', 'Imported'],
...     ['Yellow', 'SUV', 'Imported'],
...     ['Yellow', 'SUV', 'Domestic'],
...     ['Red', 'SUV', 'Imported'],
...     ['Red', 'Sports', 'Imported']]
>>> ycar = ['Yes','No','Yes','No','Yes','No','Yes','No','No','Yes']

Training requires some rules, or feature functions, each taking an observation and a class:

>>> def udf1(ts, cl):
...     return ts[0] != 'Red'
...
>>> def udf2(ts, cl):
...     return ts[1] != 'Sports'
...
>>> def udf3(ts, cl):
...     return ts[2] != 'Domestic'
...
>>> user_functions = [udf1, udf2, udf3]  # must be an iterable type
>>> xe = train(xcar, ycar, user_functions)
>>> for xv, yv in zip(xcar, ycar):
...     xc = classify(xe, xv)
...     print('Pred: %s gives %s y is %s' % (xv, xc, yv))
...
Pred: ['Red', 'Sports', 'Domestic'] gives No y is Yes
Pred: ['Red', 'Sports', 'Domestic'] gives No y is No
Pred: ['Red', 'Sports', 'Domestic'] gives No y is Yes
Pred: ['Yellow', 'Sports', 'Domestic'] gives No y is No
Pred: ['Yellow', 'Sports', 'Imported'] gives No y is Yes
Pred: ['Yellow', 'SUV', 'Imported'] gives No y is No
Pred: ['Yellow', 'SUV', 'Imported'] gives No y is Yes
Pred: ['Yellow', 'SUV', 'Domestic'] gives No y is No
Pred: ['Red', 'SUV', 'Imported'] gives No y is No
Pred: ['Red', 'Sports', 'Imported'] gives No y is Yes

__init__(self)

Initialize the class.

Bio.MaxEntropy.calculate(me, observation)

Calculate the log of the probability for each class.

me is a MaxEntropy object that has been trained. observation is a vector representing the observed data. The return value is a list of unnormalized log probabilities, one for each class.
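Because the returned values are unnormalized, exponentiating them does not give probabilities that sum to one. The following is a minimal sketch of one way to normalize such a list, assuming the scores are plain floats; normalize_log_scores is a hypothetical helper and not part of Bio.MaxEntropy, and the input scores are made up for illustration.

```python
import math


def normalize_log_scores(scores):
    """Convert unnormalized log probabilities into probabilities summing to 1.

    Hypothetical helper, not part of Bio.MaxEntropy.
    """
    # Subtract the largest score before exponentiating to avoid overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for two classes; real values would come from calculate().
probs = normalize_log_scores([0.0, math.log(3.0)])
```

Subtracting the maximum score leaves the result unchanged mathematically but keeps the exponentials in a safe numeric range.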

Bio.MaxEntropy.classify(me, observation)

Classify an observation into a class.
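One natural reading of this is that the observation is assigned to the class with the highest score. The sketch below illustrates that idea under the assumption that the scores and the class list share the same order; pick_class is a hypothetical name, and the tie-breaking rule (first maximum wins) is a choice made for this sketch, not something the documentation states.

```python
def pick_class(classes, scores):
    """Return the class whose score is largest (the first one wins ties).

    Hypothetical helper, not part of Bio.MaxEntropy.
    """
    # max() over indices returns the first index holding the maximum score.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best_index]


# Made-up class list and scores for illustration.
label = pick_class(['No', 'Yes'], [-0.2, 0.7])
```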

Bio.MaxEntropy.train(training_set, results, feature_fns, update_fn=None, max_iis_iterations=10000, iis_converge=1e-05, max_newton_iterations=100, newton_converge=1e-10)

Train a maximum entropy classifier and return a MaxEntropy object.

Train a maximum entropy classifier on a training set.

training_set is a list of observations. results is a list of the class assignments for each observation. feature_fns is a list of the features; these are callback functions that take an observation and a class and return 1 or 0. update_fn is a callback function that is called at each training iteration; it is passed a MaxEntropy object that encapsulates the current state of the training.
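As an illustration of the two kinds of callback described above, here is a small sketch of a feature function that depends on both the observation and the class, and of an update_fn that records the weights at each iteration. The names red_and_yes, make_progress_logger and history are hypothetical, and reading the alphas member inside the callback is an assumption based on the members listed for MaxEntropy.

```python
def red_and_yes(observation, klass):
    """Hypothetical feature: 1 for a red car paired with class 'Yes', else 0."""
    return 1 if observation[0] == 'Red' and klass == 'Yes' else 0


def make_progress_logger(log):
    """Build a hypothetical update_fn that appends the weights to `log`."""
    def update_fn(me):
        # `me` is the MaxEntropy object for the current iteration. Copy the
        # weights so that later iterations cannot alter the stored record.
        log.append(list(getattr(me, 'alphas', [])))
    return update_fn


history = []
callback = make_progress_logger(history)
# These could then be passed along the lines of:
#     train(training_set, results, [red_and_yes], update_fn=callback)
```

Using a closure keeps the log outside the callback, so it can be inspected after training finishes.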

The maximum number of iterations and the convergence criterion for IIS are given by max_iis_iterations and iis_converge, respectively, while max_newton_iterations and newton_converge are the maximum number of iterations and the convergence criterion for Newton’s method.
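To make the roles of an iteration cap and a convergence criterion concrete, the following is a generic sketch of Newton's method for a one-dimensional equation. It is not the library's implementation: newton_solve, its parameter names, the example equation, and the decision to raise RuntimeError on non-convergence are all assumptions made for illustration.

```python
import math


def newton_solve(f, df, x0=0.0, max_iterations=100, converge=1e-10):
    """Find a root of f by Newton's method (generic sketch, not Bio code).

    Stops when the step size drops below `converge`, or raises if
    `max_iterations` steps were not enough.
    """
    x = x0
    for _ in range(max_iterations):
        step = f(x) / df(x)
        x -= step
        if abs(step) < converge:
            return x
    raise RuntimeError("Newton's method did not converge")


# Example equation with an exponential term: 2 * exp(3 * x) = 4.
root = newton_solve(
    lambda x: 2.0 * math.exp(3.0 * x) - 4.0,
    lambda x: 6.0 * math.exp(3.0 * x),
)
```

A tighter convergence value demands more precision per solve, while the iteration cap bounds the work when the iteration fails to settle.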