mahout-commits mailing list archives

From "Dmitriy Lyubimov (Confluence)" <conflue...@apache.org>
Subject [CONF] Apache Mahout > Algorithms
Date Wed, 09 Oct 2013 18:21:00 GMT
<html>
<head>
    <base href="https://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/en/2176/1/186/_/styles/combined.css?spaceKey=MAHOUT&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background: white;" bgcolor="white" class="email-body">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
    <h2><a href="https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms">Algorithms</a></h2>
    <h4>Page <b>edited</b> by             <a href="https://cwiki.apache.org/confluence/display/~dlyubimov">Dmitriy
Lyubimov</a>
    </h4>
        <br/>
                         <h4>Changes (1)</h4>
                                 
    
<div id="page-diffs">
                    <table class="diff" cellpadding="0" cellspacing="0">
    
            <tr><td class="diff-snipped" >...<br></td></tr>
            <tr><td class="diff-unchanged" >[Singular Value Decomposition and
other Dimension Reduction Techniques|Dimensional Reduction] (available since 0.3) <br>
<br></td></tr>
            <tr><td class="diff-changed-lines" >[Stochastic Singular Value Decomposition
with PCA workflow|Stochastic Singular Value Decomposition] (PCA <span class="diff-added-words"style="background-color:
#dfd;">and dimensionality reduction</span> workflow <span class="diff-added-words"style="background-color:
#dfd;">is</span> now <span class="diff-changed-words">integrated<span class="diff-added-chars"style="background-color:
#dfd;"> with SSVD</span>)</span> <br></td></tr>
            <tr><td class="diff-unchanged" > <br>[Principal Components Analysis]
(PCA) (open) <br></td></tr>
            <tr><td class="diff-snipped" >...<br></td></tr>
    
            </table>
    </div>                            <h4>Full Content</h4>
                    <div class="notificationGreySide">
        <h2><a name="Algorithms-Algorithms"></a>Algorithms</h2>

<p>This section contains links to information, examples, use cases, etc. for the various
algorithms we intend to implement. Click the individual links to learn more. The initial
algorithm descriptions have been copied here from the original project proposal. The algorithms
are grouped by the application setting they can be used for. Where an algorithm has multiple applications,
the version presented in the paper was chosen; versions as implemented in our project will
be added as soon as we are working on them.</p>

<p>Original Paper: <a href="http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf"
class="external-link" rel="nofollow">Map Reduce for Machine Learning on Multicore</a></p>

<p>Papers related to Map Reduce:</p>
<ul>
	<li><a href="http://csl.stanford.edu/~christos/publications/2007.cmp_mapreduce.hpca.pdf"
class="external-link" rel="nofollow">Evaluating MapReduce for Multi-core and Multiprocessor
Systems</a></li>
	<li><a href="http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf"
class="external-link" rel="nofollow">Map Reduce: Distributed Computing for Machine Learning</a></li>
</ul>


<p>For papers, videos, and books related to machine learning in general, see <a href="/confluence/display/MAHOUT/Machine+Learning+Resources"
title="Machine Learning Resources">Machine Learning Resources</a>.</p>

<p>Each algorithm is marked with its status. Algorithms marked as <em>integrated</em> have an
implementation in the development version of Mahout. Algorithms that are currently
being developed are annotated with a link to the JIRA issue that deals with the specific implementation.
Usually these issues already contain patches, which are more or less substantial depending on how
much work has been spent on the issue so far. Algorithms that have not been touched so far are
marked as <em>open</em>.</p>

<p><a href="/confluence/display/MAHOUT/What%2C+When%2C+Where%2C+Why+%28but+not+How+or+Who%29"
title="What, When, Where, Why (but not How or Who)">What, When, Where, Why &#40;but
not How or Who&#41;</a> &#45; Community tips and tricks on which algorithm to use in which
situation and which errors to watch out for. That is, practical
advice on using Mahout for your problems.</p>

<h3><a name="Algorithms-Classification"></a>Classification</h3>

<p>A general introduction to the most common text classification algorithms can be found
at Google Answers: <a href="http://answers.google.com/answers/main?cmd=threadview&amp;id=225316"
class="external-link" rel="nofollow">http://answers.google.com/answers/main?cmd=threadview&amp;id=225316</a>.
For information on the algorithms implemented in Mahout (or scheduled for implementation),
please visit the following pages.</p>

<p><a href="/confluence/display/MAHOUT/Logistic+Regression" title="Logistic Regression">Logistic
Regression</a> (SGD)</p>

<p><a href="/confluence/display/MAHOUT/Bayesian" title="Bayesian">Bayesian</a></p>

<p><a href="/confluence/display/MAHOUT/Support+Vector+Machines" title="Support Vector
Machines">Support Vector Machines</a> (SVM) (open: <a href="http://issues.apache.org/jira/browse/MAHOUT-14"
class="external-link" rel="nofollow">MAHOUT-14</a>, <a href="http://issues.apache.org/jira/browse/MAHOUT-232"
class="external-link" rel="nofollow">MAHOUT-232</a> and <a href="https://issues.apache.org/jira/browse/MAHOUT-334"
class="external-link" rel="nofollow">MAHOUT-334</a>) </p>

<p><a href="/confluence/display/MAHOUT/Perceptron+and+Winnow" title="Perceptron and
Winnow">Perceptron and Winnow</a> (open: <a href="http://issues.apache.org/jira/browse/MAHOUT-85"
class="external-link" rel="nofollow">MAHOUT-85</a>)</p>

<p><a href="/confluence/display/MAHOUT/Neural+Network" title="Neural Network">Neural
Network</a> (open, but <a href="http://issues.apache.org/jira/browse/MAHOUT-228"
class="external-link" rel="nofollow">MAHOUT-228</a> might help)</p>

<p><a href="/confluence/display/MAHOUT/Random+Forests" title="Random Forests">Random
Forests</a> (integrated - <a href="http://issues.apache.org/jira/browse/MAHOUT-122"
class="external-link" rel="nofollow">MAHOUT-122</a>, <a href="http://issues.apache.org/jira/browse/MAHOUT-140"
class="external-link" rel="nofollow">MAHOUT-140</a>, <a href="http://issues.apache.org/jira/browse/MAHOUT-145"
class="external-link" rel="nofollow">MAHOUT-145</a>)</p>

<p><a href="/confluence/display/MAHOUT/Restricted+Boltzmann+Machines" title="Restricted
Boltzmann Machines">Restricted Boltzmann Machines</a> (open, <a href="http://issues.apache.org/jira/browse/MAHOUT-375"
class="external-link" rel="nofollow">MAHOUT-375</a>, GSOC2010)</p>

<p><a href="/confluence/display/MAHOUT/Online+Passive+Aggressive" title="Online Passive
Aggressive">Online Passive Aggressive</a> (integrated, <a href="http://issues.apache.org/jira/browse/MAHOUT-702"
class="external-link" rel="nofollow">MAHOUT-702</a>)</p>

<p><a href="/confluence/display/MAHOUT/Boosting" title="Boosting">Boosting</a>
(awaiting patch commit, <a href="https://issues.apache.org/jira/browse/MAHOUT-716" class="external-link"
rel="nofollow">MAHOUT-716</a>)</p>

<p><a href="/confluence/display/MAHOUT/Hidden+Markov+Models" title="Hidden Markov
Models">Hidden Markov Models</a> (HMM) (MAHOUT-627, MAHOUT-396, MAHOUT-734) - Training
is done in Map-Reduce</p>

<h3><a name="Algorithms-Clustering"></a>Clustering</h3>

<p><a href="/confluence/display/MAHOUT/Reference+Reading" title="Reference Reading">Reference
Reading</a></p>

<p><a href="/confluence/display/MAHOUT/Canopy+Clustering" title="Canopy Clustering">Canopy
Clustering</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-3" class="external-link"
rel="nofollow">MAHOUT-3</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/K-Means+Clustering" title="K-Means Clustering">K&#45;Means
Clustering</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-5" class="external-link"
rel="nofollow">MAHOUT-5</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Fuzzy+K-Means" title="Fuzzy K-Means">Fuzzy
K&#45;Means</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-74" class="external-link"
rel="nofollow">MAHOUT-74</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Expectation+Maximization" title="Expectation
Maximization">Expectation Maximization</a> (EM) (<a href="http://issues.apache.org/jira/browse/MAHOUT-28"
class="external-link" rel="nofollow">MAHOUT-28</a>)</p>

<p><a href="/confluence/display/MAHOUT/Mean+Shift+Clustering" title="Mean Shift Clustering">Mean
Shift Clustering</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-15" class="external-link"
rel="nofollow">MAHOUT-15</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Hierarchical+Clustering" title="Hierarchical
Clustering">Hierarchical Clustering</a> (<a href="http://issues.apache.org/jira/browse/MAHOUT-19"
class="external-link" rel="nofollow">MAHOUT-19</a>)</p>

<p><a href="/confluence/display/MAHOUT/Dirichlet+Process+Clustering" title="Dirichlet
Process Clustering">Dirichlet Process Clustering</a> (<a href="http://issues.apache.org/jira/browse/MAHOUT-30"
class="external-link" rel="nofollow">MAHOUT-30</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Latent+Dirichlet+Allocation" title="Latent
Dirichlet Allocation">Latent Dirichlet Allocation</a> (<a href="http://issues.apache.org/jira/browse/MAHOUT-123"
class="external-link" rel="nofollow">MAHOUT-123</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Spectral+Clustering" title="Spectral Clustering">Spectral
Clustering</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-363" class="external-link"
rel="nofollow">MAHOUT-363</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Minhash+Clustering" title="Minhash Clustering">Minhash
Clustering</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-344" class="external-link"
rel="nofollow">MAHOUT-344</a> - integrated)</p>

<p><a href="/confluence/display/MAHOUT/Top+Down+Clustering" title="Top Down Clustering">Top
Down Clustering</a> (<a href="https://issues.apache.org/jira/browse/MAHOUT-843" class="external-link"
rel="nofollow">MAHOUT-843</a> - integrated)</p>

<h3><a name="Algorithms-PatternMining"></a>Pattern Mining</h3>

<p><a href="/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining" title="Parallel
Frequent Pattern Mining">Parallel FP Growth Algorithm</a> (also known as frequent
itemset mining)</p>

<h3><a name="Algorithms-Regression"></a>Regression</h3>

<p><a href="/confluence/display/MAHOUT/Locally+Weighted+Linear+Regression" title="Locally
Weighted Linear Regression">Locally Weighted Linear Regression</a> (open)</p>


<h3><a name="Algorithms-Dimensionreduction"></a>Dimension reduction</h3>

<p><a href="/confluence/display/MAHOUT/Dimensional+Reduction" title="Dimensional
Reduction">Singular Value Decomposition and other Dimension Reduction Techniques</a>
(available since 0.3)</p>

<p><a href="/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition" title="Stochastic
Singular Value Decomposition">Stochastic Singular Value Decomposition with PCA workflow</a>
(PCA and dimensionality reduction workflow is now integrated with SSVD)</p>

<p><a href="/confluence/display/MAHOUT/Principal+Components+Analysis" title="Principal
Components Analysis">Principal Components Analysis</a> (PCA) (open)</p>

<p><a href="/confluence/display/MAHOUT/Independent+Component+Analysis" title="Independent
Component Analysis">Independent Component Analysis</a> (open)</p>

<p><a href="/confluence/display/MAHOUT/Gaussian+Discriminative+Analysis" title="Gaussian
Discriminative Analysis">Gaussian Discriminative Analysis</a> (GDA) (open)</p>

<h3><a name="Algorithms-EvolutionaryAlgorithms"></a>Evolutionary Algorithms</h3>

<ul>
	<li>NOTE: Watchmaker support has been removed as of 0.7</li>
</ul>


<p>see also: <a href="http://issues.apache.org/jira/browse/MAHOUT-56" class="external-link"
rel="nofollow">MAHOUT-56 (integrated)</a></p>

<p>Here you will find information, examples, use cases, etc. related to Evolutionary
Algorithms.</p>

<p>Introductions and Tutorials:</p>
<ul>
	<li><a href="http://www.geatbx.com/docu/algindex.html" class="external-link" rel="nofollow">Evolutionary
Algorithms Introduction</a></li>
	<li><a href="/confluence/display/MAHOUT/Mahout.GA.Tutorial" title="Mahout.GA.Tutorial">How
to distribute the fitness evaluation using Mahout.GA</a></li>
</ul>


<p>Examples:</p>
<ul>
	<li><a href="/confluence/display/MAHOUT/Traveling+Salesman" title="Traveling Salesman">Traveling
Salesman</a></li>
	<li><a href="/confluence/display/MAHOUT/Class+Discovery" title="Class Discovery">Class
Discovery</a></li>
</ul>


<h3><a name="Algorithms-Recommenders%2FCollaborativeFiltering"></a>Recommenders
/ Collaborative Filtering</h3>

<p>Mahout contains both simple non-distributed recommender implementations and distributed
Hadoop-based recommenders.</p>

<ul>
	<li><a href="/confluence/display/MAHOUT/Recommender+Documentation" title="Recommender
Documentation">Non-distributed recommenders ("Taste")</a> (integrated)</li>
	<li><a href="/confluence/display/MAHOUT/Itembased+Collaborative+Filtering" title="Itembased
Collaborative Filtering">Distributed Item-Based Collaborative Filtering</a> (integrated)</li>
	<li><a href="/confluence/display/MAHOUT/Collaborative+Filtering+with+ALS-WR" title="Collaborative
Filtering with ALS-WR">Collaborative Filtering using a parallel matrix factorization</a>
(integrated)</li>
	<li><a href="/confluence/display/MAHOUT/Recommender+First-Timer+FAQ" title="Recommender
First-Timer FAQ">First-timer FAQ</a></li>
</ul>


<h3><a name="Algorithms-VectorSimilarity"></a>Vector Similarity</h3>

<p>Mahout contains implementations that allow one to compare one or more vectors with
another set of vectors.  This can be useful if one is, for instance, trying to calculate the
pairwise similarity between all documents (or a subset of docs) in a corpus.</p>

<ul>
	<li>RowSimilarityJob &#8211; Builds an inverted index and then computes distances
between items that have co-occurrences.  This is a fully distributed calculation.</li>
	<li>VectorDistanceJob &#8211; Does a map side join between a set of "seed" vectors
and all of the input vectors.</li>
</ul>
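
<p>The two approaches above can be sketched in plain Java. This is only an illustration of the
ideas, not Mahout's implementation: the class <code>VectorSimilaritySketch</code>, its method names,
the sparse-map vector representation, and the choice of cosine similarity and Euclidean distance are
all assumptions made for the example, and everything runs in a single process rather than on Hadoop.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-machine sketch; not the Mahout RowSimilarityJob/VectorDistanceJob code.
public class VectorSimilaritySketch {

    /** Euclidean norm of a sparse vector stored as dimension -> non-zero weight. */
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) sum += w * w;
        return Math.sqrt(sum);
    }

    /**
     * Cosine similarity for every pair of rows that share at least one dimension.
     * Keys of the result have the form "i,j" with i < j; rows without a
     * co-occurrence never get compared, which is the point of the inverted index.
     */
    static Map<String, Double> pairwiseCosine(List<Map<Integer, Double>> rows) {
        // Inverted index: dimension -> ids of the rows that contain it (ascending).
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (int r = 0; r < rows.size(); r++) {
            for (Integer dim : rows.get(r).keySet()) {
                index.computeIfAbsent(dim, k -> new ArrayList<>()).add(r);
            }
        }
        // Accumulate dot products only over co-occurring rows.
        Map<String, Double> dots = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : index.entrySet()) {
            List<Integer> posting = e.getValue();
            for (int a = 0; a < posting.size(); a++) {
                for (int b = a + 1; b < posting.size(); b++) {
                    int i = posting.get(a);
                    int j = posting.get(b);
                    double product = rows.get(i).get(e.getKey()) * rows.get(j).get(e.getKey());
                    dots.merge(i + "," + j, product, Double::sum);
                }
            }
        }
        // Normalize each dot product by the two row norms.
        Map<String, Double> sims = new HashMap<>();
        for (Map.Entry<String, Double> e : dots.entrySet()) {
            String[] ids = e.getKey().split(",");
            double denom = norm(rows.get(Integer.parseInt(ids[0])))
                         * norm(rows.get(Integer.parseInt(ids[1])));
            sims.put(e.getKey(), e.getValue() / denom);
        }
        return sims;
    }

    /** Euclidean distance between two sparse vectors. */
    static double euclidean(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            double d = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            sum += d * d;
        }
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) sum += e.getValue() * e.getValue();
        }
        return Math.sqrt(sum);
    }

    /** Distance from every "seed" vector to every input vector: result[seed][input]. */
    static double[][] seedDistances(List<Map<Integer, Double>> seeds,
                                    List<Map<Integer, Double>> inputs) {
        double[][] out = new double[seeds.size()][inputs.size()];
        for (int s = 0; s < seeds.size(); s++) {
            for (int i = 0; i < inputs.size(); i++) {
                out[s][i] = euclidean(seeds.get(s), inputs.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> docs = List.of(
            Map.of(0, 1.0, 1, 1.0),   // doc 0
            Map.of(1, 1.0, 2, 1.0),   // doc 1 shares dimension 1 with doc 0
            Map.of(3, 2.0));          // doc 2 shares nothing
        System.out.println(pairwiseCosine(docs));
        System.out.println(seedDistances(List.of(docs.get(0)), docs)[0][1]);
    }
}
```

<p>In this sketch the small seed set is held in memory and compared against each input vector in turn,
which loosely mirrors the idea of a map-side join; the distributed jobs themselves may differ in
the measures they support and in how they partition the work.</p>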


<h3><a name="Algorithms-Other"></a>Other</h3>

<ul>
	<li><a href="/confluence/display/MAHOUT/Collocations" title="Collocations">Collocations</a></li>
</ul>


<h3><a name="Algorithms-NonMapReducealgorithms"></a>Non-MapReduce algorithms</h3>

<p>Some algorithms and applications that have appeared on the mailing list have not been
published in MapReduce form so far. As we do not restrict ourselves to Hadoop-only versions,
these proposals are listed here.</p>



    </div>
</div>
</div>
</div>
</div>
</body>
</html>
