mahout-commits mailing list archives

Subject [CONF] Apache Lucene Mahout > Algorithms
Date Thu, 31 Dec 2009 16:42:00 GMT
Space: Apache Lucene Mahout
Page: Algorithms

Edited by Grant Ingersoll:
h2. Algorithms

This section contains links to information, examples, use cases, etc. for the various algorithms
we intend to implement. Click the individual links to learn more. The initial algorithm
descriptions have been copied here from the original project proposal. The algorithms are
grouped by the application setting they can be used for. Where an algorithm has multiple
applications, the version presented in the paper was chosen; versions as implemented in our
project will be added as soon as we are working on them.

Original Paper: [Map Reduce for Machine Learning on Multicore|]

Papers related to Map Reduce:
* [Evaluating MapReduce for Multi-core and Multiprocessor Systems|]
* [Map Reduce: Distributed Computing for Machine Learning|]

For papers, videos and books related to machine learning in general, see [Machine Learning

Algorithms marked as _integrated_ have an implementation that is part of the development
version of Mahout. Algorithms that are currently being developed are annotated with a link
to the JIRA issue that deals with the specific implementation. These issues usually already
contain patches, which are more or less substantial depending on how much work has been spent
on the issue so far. Algorithms that have not been touched so far are marked as _open_.

[What, When, Where, Why (but not How or Who)] \- Community tips, tricks, etc. for when to
use which algorithm in what situations, what to watch out for in terms of errors.  That is,
practical advice on using Mahout for your problems.

h3. Classification

A general introduction to the most common text classification algorithms can be found at Google
Answers: []. For information on the algorithms implemented in Mahout (or scheduled for
implementation), please visit the following pages.

[Logistic Regression] (open)


[Support Vector Machines] (SVM) (open: [MAHOUT-14|])

[Perceptron and Winnow] (open: [MAHOUT-85|])

[Neural Network] (open)

[Random Forests] (open)

h3. Clustering

[Reference Reading]

[Canopy Clustering] (integrated)

[k-Means] (integrated)

[Fuzzy K-Means] ([MAHOUT-74|]) (integrated)

[Expectation Maximization] (EM) ([MAHOUT-28|])

[Mean Shift] (integrated)

[Hierarchical Clustering] ([MAHOUT-19|])

[Dirichlet Process Clustering] ([MAHOUT-30|] - integrated)

[Latent Dirichlet Allocation] ([MAHOUT-123|] - integrated)

h3. Regression

[Locally Weighted Linear Regression] (open)

h3. Dimension reduction

[Principal Components Analysis] (PCA) (open)

[Independent Component Analysis] (open)

[Gaussian Discriminative Analysis] (GDA) (open)

h3. Evolutionary Algorithms

see also: [MAHOUT-56 (integrated)|]

Here you will find information, examples, use cases, etc. related to evolutionary algorithms.

Introductions and Tutorials:
* [Evolutionary Algorithms Introduction|]
* [How to distribute the fitness evaluation using Mahout.GA|Mahout.GA.Tutorial]

* [Traveling Salesman]
* [Class Discovery]

h3. Non map reduce algorithms

Some algorithms and applications that have appeared on the mailing list have not been published
in MapReduce form so far. As we do not restrict ourselves to Hadoop-only versions, these
proposals are listed here.

[Hidden Markov Models] (HMM) (open)

[Recommendation Learning] (integrated)
