Bio.MaxEntropy module
Maximum Entropy code.
Uses Improved Iterative Scaling.
- class Bio.MaxEntropy.MaxEntropy
Bases: object
Hold information for a Maximum Entropy classifier.
Members:
  classes: List of the possible classes of data.
  alphas: List of the weights for each feature.
  feature_fns: List of the feature functions.
The car data below comes from the Naive Bayes Classifier example by Eric Meisner, November 22, 2003: http://www.inf.u-szeged.hu/~ormandi/teaching
>>> from Bio.MaxEntropy import train, classify
>>> xcar = [
...     ['Red', 'Sports', 'Domestic'],
...     ['Red', 'Sports', 'Domestic'],
...     ['Red', 'Sports', 'Domestic'],
...     ['Yellow', 'Sports', 'Domestic'],
...     ['Yellow', 'Sports', 'Imported'],
...     ['Yellow', 'SUV', 'Imported'],
...     ['Yellow', 'SUV', 'Imported'],
...     ['Yellow', 'SUV', 'Domestic'],
...     ['Red', 'SUV', 'Imported'],
...     ['Red', 'Sports', 'Imported']]
>>> ycar = ['Yes','No','Yes','No','Yes','No','Yes','No','No','Yes']
The classifier requires some rules, or features, supplied as functions:
>>> def udf1(ts, cl):
...     return ts[0] != 'Red'
...
>>> def udf2(ts, cl):
...     return ts[1] != 'Sports'
...
>>> def udf3(ts, cl):
...     return ts[2] != 'Domestic'
...
>>> user_functions = [udf1, udf2, udf3]  # must be an iterable type
>>> xe = train(xcar, ycar, user_functions)
>>> for xv, yv in zip(xcar, ycar):
...     xc = classify(xe, xv)
...     print('Pred: %s gives %s y is %s' % (xv, xc, yv))
...
Pred: ['Red', 'Sports', 'Domestic'] gives No y is Yes
Pred: ['Red', 'Sports', 'Domestic'] gives No y is No
Pred: ['Red', 'Sports', 'Domestic'] gives No y is Yes
Pred: ['Yellow', 'Sports', 'Domestic'] gives No y is No
Pred: ['Yellow', 'Sports', 'Imported'] gives No y is Yes
Pred: ['Yellow', 'SUV', 'Imported'] gives No y is No
Pred: ['Yellow', 'SUV', 'Imported'] gives No y is Yes
Pred: ['Yellow', 'SUV', 'Domestic'] gives No y is No
Pred: ['Red', 'SUV', 'Imported'] gives No y is No
Pred: ['Red', 'Sports', 'Imported'] gives No y is Yes
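The feature functions udf1 to udf3 in the example ignore their second argument, cl. If class scores are a weighted sum of feature values (an assumption about the model form), a feature that ignores cl contributes the same amount to every class and cannot, on its own, favour one class over another. The following is a minimal sketch of class-aware features; the helper make_feature and the chosen value/class pairs are hypothetical and are not part of Bio.MaxEntropy.

```python
def make_feature(index, value, klass):
    """Build a feature that returns 1 only when the observation holds
    `value` at position `index` AND the candidate class equals `klass`.

    Hypothetical helper for illustration; not part of Bio.MaxEntropy.
    """
    def feature(observation, candidate_class):
        return int(observation[index] == value and candidate_class == klass)
    return feature


# Hypothetical features for the car data: one per (attribute value, class) pair.
red_yes = make_feature(0, "Red", "Yes")
suv_no = make_feature(1, "SUV", "No")

obs = ["Red", "SUV", "Imported"]
print(red_yes(obs, "Yes"), red_yes(obs, "No"))  # 1 0
print(suv_no(obs, "Yes"), suv_no(obs, "No"))    # 0 1
```

Functions built this way have the same (observation, class) signature as udf1 to udf3, so a list of them could be passed as feature_fns.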
- __init__()
Initialize the class.
- Bio.MaxEntropy.calculate(me, observation)
Calculate the log of the probability for each class.
me is a MaxEntropy object that has been trained. observation is a vector representing the observed data. The return value is a list containing one unnormalized log probability for each class.
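One way to picture the unnormalized log probabilities is as a weighted sum of feature values per class. The sketch below assumes that log-linear form and uses plain lists instead of a MaxEntropy object; score_classes, the two feature functions, and the weights are hypothetical illustrations, not the library's code or trained values.

```python
def score_classes(classes, alphas, feature_fns, observation):
    """Return one unnormalized log probability per class, computed as the
    weighted sum of the feature values for that (observation, class) pair."""
    scores = []
    for klass in classes:
        total = 0.0
        for alpha, fn in zip(alphas, feature_fns):
            total += alpha * fn(observation, klass)
        scores.append(total)
    return scores


# Hypothetical class-aware features and made-up weights.
def red_and_yes(observation, klass):
    return int(observation[0] == "Red" and klass == "Yes")


def suv_and_no(observation, klass):
    return int(observation[1] == "SUV" and klass == "No")


classes = ["No", "Yes"]
alphas = [0.5, 1.25]
scores = score_classes(classes, alphas, [red_and_yes, suv_and_no],
                       ["Red", "SUV", "Imported"])
print(scores)  # [1.25, 0.5]
```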
- Bio.MaxEntropy.classify(me, observation)
Classify an observation into a class.
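Given a list of per-class scores like the one calculate returns, classification amounts to picking the best-scoring class. The sketch below shows that step and, separately, how unnormalized log probabilities can be turned into probabilities that sum to 1. Both helper names are hypothetical, and the tie-breaking rule (first class wins) is an assumption of this sketch.

```python
import math


def pick_class(classes, log_scores):
    """Return the class with the highest score; the first class wins ties."""
    best = max(range(len(classes)), key=lambda i: log_scores[i])
    return classes[best]


def normalize(log_scores):
    """Convert unnormalized log probabilities into probabilities summing to 1."""
    peak = max(log_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["No", "Yes"]
log_scores = [1.25, 0.5]  # made-up scores for illustration
print(pick_class(classes, log_scores))  # No
probs = normalize(log_scores)
```

Normalizing does not change which class scores highest, so it is only needed when actual probabilities are wanted.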
- Bio.MaxEntropy.train(training_set, results, feature_fns, update_fn=None, max_iis_iterations=10000, iis_converge=1.0e-5, max_newton_iterations=100, newton_converge=1.0e-10)
Train a maximum entropy classifier and return a MaxEntropy object.
Train a maximum entropy classifier on a training set. training_set is a list of observations. results is a list of the class assignments, one for each observation. feature_fns is a list of the features. These are callback functions that take an observation and a class and return 1 or 0. update_fn is an optional callback function that is called at each training iteration. It is passed a MaxEntropy object that encapsulates the current state of the training.
The maximum number of iterations and the convergence criterion for IIS are given by max_iis_iterations and iis_converge, respectively, while max_newton_iterations and newton_converge are the maximum number of iterations and the convergence criterion for Newton’s method.
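As an illustration of the update_fn hook, the sketch below defines a callable that records a copy of the weights each time it is invoked. AlphaLogger and FakeModel are hypothetical names; FakeModel is only a stand-in exposing the documented alphas attribute, so the example runs without Biopython.

```python
class AlphaLogger:
    """Callable that stores a copy of the weights on every invocation."""

    def __init__(self):
        self.history = []

    def __call__(self, me):
        # `me` is expected to expose the documented `alphas` member.
        self.history.append(list(me.alphas))


class FakeModel:
    """Stand-in for a MaxEntropy object; only `alphas` is provided."""

    def __init__(self, alphas):
        self.alphas = alphas


logger = AlphaLogger()
logger(FakeModel([0.0, 0.0]))   # simulate the callback at iteration 1
logger(FakeModel([0.1, -0.2]))  # simulate the callback at iteration 2
print(len(logger.history))  # 2
```

With Biopython installed, such a callable could be passed as train(training_set, results, feature_fns, update_fn=logger), optionally together with a lower max_iis_iterations or a looser iis_converge to shorten training; suitable values depend on the data.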