Date: Fri, 15 Jul 2011 13:40:00 -0400 (EDT)
From: confluence@apache.org
To: commits@mahout.apache.org
Reply-To: dev@mahout.apache.org
Message-ID: <24314540.8085.1310751600854.JavaMail.confluence@thor>
Subject: [CONF] Apache Mahout > Algorithms

Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Algorithms (https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms)

Edited by Grant Ingersoll:
---------------------------------------------------------------------
h2. Algorithms

This section contains links to information, examples, use cases, etc.
for the various algorithms we intend to implement. Click the individual links to learn more. The initial algorithm descriptions have been copied here from the original project proposal. The algorithms are grouped by the application setting they can be used for. In case of multiple applications, the version presented in the paper was chosen; versions as implemented in our project will be added as soon as we start working on them.

Original Paper: [Map Reduce for Machine Learning on Multicore|http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf]

Papers related to Map Reduce:
* [Evaluating MapReduce for Multi-core and Multiprocessor Systems|http://csl.stanford.edu/~christos/publications/2007.cmp_mapreduce.hpca.pdf]
* [Map Reduce: Distributed Computing for Machine Learning|http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf]

For papers, videos and books related to machine learning in general, see [Machine Learning Resources]

Algorithms marked as _integrated_ have an implementation in the development version of Mahout. Algorithms that are currently being developed are annotated with a link to the JIRA issue that deals with the specific implementation. Usually these issues already contain patches that are more or less extensive, depending on how much work has been spent on the issue so far. Algorithms that have not been touched so far are marked as _open_.

[What, When, Where, Why (but not How or Who)] \- Community tips, tricks, etc. for when to use which algorithm in what situations and what to watch out for in terms of errors. That is, practical advice on using Mahout for your problems.

h3. Classification

A general introduction to the most common text classification algorithms can be found at Google Answers: [http://answers.google.com/answers/main?cmd=threadview&id=225316]

For information on the algorithms implemented in Mahout (or scheduled for implementation) please visit the following pages.

[Logistic Regression] (SGD)
[Bayesian]
[Support Vector Machines] (SVM) (open: [MAHOUT-14|http://issues.apache.org/jira/browse/MAHOUT-14], [MAHOUT-232|http://issues.apache.org/jira/browse/MAHOUT-232] and [MAHOUT-334|https://issues.apache.org/jira/browse/MAHOUT-334])
[Perceptron and Winnow] (open: [MAHOUT-85|http://issues.apache.org/jira/browse/MAHOUT-85])
[Neural Network] (open, but [MAHOUT-228|http://issues.apache.org/jira/browse/MAHOUT-228] might help)
[Random Forests] (integrated - [MAHOUT-122|http://issues.apache.org/jira/browse/MAHOUT-122], [MAHOUT-140|http://issues.apache.org/jira/browse/MAHOUT-140], [MAHOUT-145|http://issues.apache.org/jira/browse/MAHOUT-145])
[Restricted Boltzmann Machines] (open, [MAHOUT-375|http://issues.apache.org/jira/browse/MAHOUT-375], GSOC2010)
[Online Passive Aggressive] (awaiting patch commit, [MAHOUT-702|http://issues.apache.org/jira/browse/MAHOUT-702])

h3. Clustering

[Reference Reading]
[MAHOUT:Canopy Clustering] ([MAHOUT-3|https://issues.apache.org/jira/browse/MAHOUT-3] - integrated)
[K-Means Clustering] ([MAHOUT-5|https://issues.apache.org/jira/browse/MAHOUT-5] - integrated)
[Fuzzy K-Means] ([MAHOUT-74|https://issues.apache.org/jira/browse/MAHOUT-74] - integrated)
[Expectation Maximization] (EM) ([MAHOUT-28|http://issues.apache.org/jira/browse/MAHOUT-28])
[Mean Shift Clustering] ([MAHOUT-15|https://issues.apache.org/jira/browse/MAHOUT-15] - integrated)
[Hierarchical Clustering] ([MAHOUT-19|http://issues.apache.org/jira/browse/MAHOUT-19])
[Dirichlet Process Clustering] ([MAHOUT-30|http://issues.apache.org/jira/browse/MAHOUT-30] - integrated)
[Latent Dirichlet Allocation] ([MAHOUT-123|http://issues.apache.org/jira/browse/MAHOUT-123] - integrated)
[Spectral Clustering] ([MAHOUT-363|https://issues.apache.org/jira/browse/MAHOUT-363] - integrated)

h3. Pattern Mining

[Parallel FP Growth Algorithm|Parallel Frequent Pattern Mining] (also known as Frequent Itemset Mining)

h3. Regression

[Locally Weighted Linear Regression] (open)

h3. Dimension reduction

[Singular Value Decomposition and other Dimension Reduction Techniques|Dimensional Reduction] (available since 0.3)
[Principal Components Analysis] (PCA) (open)
[Independent Component Analysis] (open)
[Gaussian Discriminative Analysis] (GDA) (open)

h3. Evolutionary Algorithms

see also: [MAHOUT-56 (integrated)|http://issues.apache.org/jira/browse/MAHOUT-56]

Here you will find information, examples, use cases, etc. related to Evolutionary Algorithms.

Introductions and Tutorials:
* [Evolutionary Algorithms Introduction|http://www.geatbx.com/docu/algindex.html]
* [How to distribute the fitness evaluation using Mahout.GA|Mahout.GA.Tutorial]

Examples:
* [Traveling Salesman]
* [Class Discovery]

h3. Recommenders / Collaborative Filtering

Mahout contains both simple non-distributed recommender implementations and distributed Hadoop-based recommenders.

* [Non-distributed recommenders ("Taste")|Recommender Documentation] (integrated)
* [Distributed recommenders (item-based)|Itembased Collaborative Filtering] (integrated)
* [First-timer FAQ|Recommender First-Timer FAQ]

h3. Vector Similarity

Mahout contains implementations that allow one to compare one or more vectors with another set of vectors. This can be useful if one is, for instance, trying to calculate the pairwise similarity between all documents (or a subset of docs) in a corpus.

* RowSimilarityJob -- Builds an inverted index and then computes distances between items that have co-occurrences. This is a fully distributed calculation.
* VectorDistanceJob -- Does a map-side join between a set of "seed" vectors and all of the input vectors.

h3. Other

* [Collocations]

h3. Non-MapReduce algorithms

Some algorithms and applications that have appeared on the mailing list have not been published in map reduce form so far. As we do not restrict ourselves to Hadoop-only versions, these proposals are listed here.

[Hidden Markov Models] (HMM) (open)