mahout-commits mailing list archives

Subject [CONF] Apache Lucene Mahout: Perceptron and Winnow (page created)
Date Mon, 03 Nov 2008 16:37:00 GMT
Perceptron and Winnow (MAHOUT) created by Isabel Drost


h1. Classification with Perceptron or Winnow

Both algorithms are comparatively simple linear classifiers. Given training data in some
n-dimensional vector space that is annotated with binary labels, the algorithms are guaranteed
to find a separating hyperplane if one exists. In contrast to the Perceptron,
Winnow works only for binary feature vectors.
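
The difference between the two update rules can be illustrated with a minimal sketch. This
is not the code from the patch: the function names, the +1/-1 label encoding, the epoch limit
and the Winnow parameters (promotion factor alpha = 2, threshold equal to the number of
features, demotion by division) are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

# One training example: (feature vector, label), with label +1 or -1 (assumed encoding).
Example = Tuple[Sequence[float], int]


def train_perceptron(data: List[Example], epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron: additive update of weights and bias on every mistake."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of the hyperplane, or exactly on it
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without mistakes: the data is separated
            break
    return w, b


def train_winnow(data: List[Example], alpha: float = 2.0,
                 epochs: int = 100) -> Tuple[List[float], float]:
    """Winnow: multiplicative update, applied only to the active (non-zero) features."""
    dim = len(data[0][0])
    w = [1.0] * dim
    theta = float(dim)  # fixed threshold (assumed: number of features)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            active_sum = sum(wi for wi, xi in zip(w, x) if xi)
            predicted = 1 if active_sum >= theta else -1
            if predicted != y:
                # Promote active weights on a missed positive, demote on a false positive.
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w, theta


# Usage: logical AND for the Perceptron, "feature 0 is set" for Winnow.
and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
p_w, p_b = train_perceptron(and_data)

first_bit = [([1, 0, 0], 1), ([0, 1, 1], -1), ([1, 1, 0], 1),
             ([0, 0, 1], -1), ([0, 1, 0], -1), ([1, 0, 1], 1)]
w_w, w_theta = train_winnow(first_bit)
```

The Perceptron adds or subtracts the example itself, so it accepts arbitrary real-valued
features, whereas the Winnow sketch only checks whether a feature is active, which is why it
is restricted to binary vectors.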

For more information on the Perceptron see for instance:

Concise course notes on both algorithms:

Although the algorithms are comparatively simple, they still work quite well for text classification
and are fast to train even on huge example sets. In contrast to Naive Bayes, they are not
based on the assumption that all features (in the domain of text classification: all terms
in a document) are independent.

h2. Strategy for parallelisation

Currently the strategy for parallelisation is simple: provided there is enough training data,
split the training data, train a classifier on each split, and then average the resulting
hyperplanes.
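
The split, train and average strategy can be sketched serially as follows. This is only an
illustration under assumptions, not the planned implementation: the helper names (split,
average_hyperplanes, train_averaged), the round-robin split and the plain arithmetic mean of
weights and bias are all hypothetical choices, and the per-split trainer is a basic Perceptron.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]   # (feature vector, label in {+1, -1})
Hyperplane = Tuple[List[float], float]  # (weights, bias)


def train_perceptron(data: List[Example], epochs: int = 100) -> Hyperplane:
    """Basic Perceptron used as the per-split trainer in this sketch."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def split(data: list, parts: int) -> List[list]:
    """Round-robin split so every part receives examples from the whole data set."""
    return [data[i::parts] for i in range(parts)]


def average_hyperplanes(models: List[Hyperplane]) -> Hyperplane:
    """Component-wise mean of the weight vectors and of the biases."""
    n = len(models)
    dim = len(models[0][0])
    w = [sum(m[0][i] for m in models) / n for i in range(dim)]
    b = sum(m[1] for m in models) / n
    return w, b


def train_averaged(data: List[Example], parts: int) -> Hyperplane:
    """Train one classifier per split (independent, hence parallelisable) and average."""
    return average_hyperplanes([train_perceptron(part) for part in split(data, parts)])


# Usage: the label is the sign of the first feature; two splits are trained and averaged.
data = [([2, 1], 1), ([3, -1], 1), ([-2, 1], -1),
        ([-3, -1], -1), ([1, 0.5], 1), ([-1, -0.5], -1)]
avg_w, avg_b = train_averaged(data, parts=2)
```

Each call to train_perceptron depends only on its own split, so in a parallel setting the
calls could run as independent tasks with the averaging as a final combining step. Note that
an averaged hyperplane is not guaranteed to separate the full training set, even if each
individual hyperplane separates its own split.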

h2. Roadmap

Currently the patch only contains the code for the classifier itself. It is planned to provide
unit tests and at least one example based on the WebKB dataset by the end of November for
the serial version. After that the parallelisation will be added.
