mahout-commits mailing list archives

Subject [CONF] Apache Lucene Mahout > Overview
Date Fri, 18 Jun 2010 10:13:01 GMT
Space: Apache Lucene Mahout (
Page: Overview (

Added by Isabel Drost:
h1. Overview of Mahout

Mahout's goal is to build scalable machine learning libraries. By scalable we mean:
* Scalable to reasonably large data sets. Our core algorithms for clustering, classification
and batch-based collaborative filtering are implemented on top of Apache Hadoop using the
map/reduce paradigm. However, we do not restrict contributions to Hadoop-based implementations:
contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The
core libraries are highly optimized to also allow good performance for non-distributed algorithms.
* Scalable to support your business case. Mahout is distributed under a commercially friendly
Apache Software License.
* Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community
to facilitate discussions not only on the project itself but also on potential use cases.
Come to the mailing lists to find out more.

Currently Mahout mainly supports four use cases:
* Recommendation mining takes users' behavior and from that tries to find items users might like.
* Clustering takes e.g. text documents and groups them into sets of topically related documents.
* Classification learns from existing categorized documents what documents of a specific
category look like and is able to assign unlabelled documents to the (hopefully) correct category.
* Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart
content) and identifies which individual items usually appear together.
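
As a rough illustration of the frequent itemset use case described above, the sketch below counts how often pairs of items appear together in a handful of shopping carts and keeps the pairs that reach a minimum count. It is plain single-node Python and does not use Mahout's API or its actual algorithm; the function name {{frequent_pairs}}, the {{min_support}} parameter and the sample baskets are all made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that co-occur in at least `min_support` baskets.

    Hypothetical helper for illustration only; not part of Mahout.
    """
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up shopping carts standing in for the "item groups" in the text.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]

print(frequent_pairs(baskets, min_support=2))
```

Counting every pair like this only works for small inputs, since the number of candidate pairs grows quadratically with basket size; larger data sets call for a more careful algorithm or a distributed implementation.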

Interested in helping? See the [Wiki|] or send us an email.
Also note, we are just getting off the ground, so please be patient as we get the various
infrastructure pieces in place.
