mahout-commits mailing list archives

From build...@apache.org
Subject svn commit: r944380 [21/24] - in /websites/staging/mahout/trunk/content: ./ developers/ general/ users/basics/ users/classification/ users/clustering/ users/dim-reduction/ users/mapreduce/ users/mapreduce/classification/ users/mapreduce/clustering/ use...
Date Thu, 19 Mar 2015 21:21:47 GMT
Added: websites/staging/mahout/trunk/content/users/mapreduce/recommender/intro-itembased-hadoop.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/mapreduce/recommender/intro-itembased-hadoop.html (added)
+++ websites/staging/mahout/trunk/content/users/mapreduce/recommender/intro-itembased-hadoop.html Thu Mar 19 21:21:45 2015
@@ -0,0 +1,311 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data framework, data integration,
+        data matching, data mining, data mining algorithms, data mining analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning methods,
+        learning techniques, lucene, machine learning, machine translation, mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data mining">
+  <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico">
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : 
+        'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+	
+	  var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org" name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a>
+                  <li><a href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a href="http://www.apache.org/licenses/">License</a></li>
+                  <li><a href="http://www.apache.org/security/">Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer resources</a></li>
+                  <li><a href="/developers/version-control.html">Version control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check list</a></li>
+                  <li><a href="/developers/github.html">Handling Github PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a>
+                  <li><a href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; Spark Bindings Overview</a></li>
+                  <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li>
+			      <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li>
+                  <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li>
+                  <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li>
+                  <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li>
+                <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li>
+                <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li>
+                <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li>
+                <li><a href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li>
+                <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li>
+		<li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li>
+                <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li>
+                <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+	<ul class="sidemenu">
+		<li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+	</ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li>
+      <li><a href="http://www.apache.org/dev/">Developer Resources</a></li>
+      <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+      <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/">Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/">Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <h1 id="introduction-to-item-based-recommendations-with-hadoop">Introduction to Item-Based Recommendations with Hadoop</h1>
+<h2 id="overview">Overview</h2>
+<p>Mahout’s item-based recommender is a flexible, easily implemented algorithm with a wide range of applications. Its primary input file has a minimal structure, and ancillary filtering controls are available, so sourcing the required data and shaping the desired output are both efficient and straightforward.</p>
+<p>Typical use cases include:</p>
+<ul>
+<li>Recommend products to customers via an eCommerce platform (think: Amazon, Netflix, Overstock)</li>
+<li>Identify organic sales opportunities</li>
+<li>Segment users/customers based on similar item preferences</li>
+</ul>
+<p>Broadly speaking, Mahout's item-based recommendation algorithm takes customer preferences by item as input and outputs recommendations of similar items, each with a score indicating how likely the customer is to "like" the recommended item.</p>
+<p>One of the strengths of the item-based recommender is its adaptability to your business conditions or research interests. For example, there are many ways to express product preference. One is to count the total orders of a given product for each customer (e.g. Acme Corp has ordered Widget-A 5,678 times), while others rely on user preferences captured via the web (e.g. Jane Doe rated a movie five stars, or gave a product two thumbs up).</p>
+<p>Additionally, a variety of methodologies can be implemented to narrow the focus of Mahout's recommendations, such as:</p>
+<ul>
+<li>Exclude low-volume or low-profitability products from consideration</li>
+<li>Group customers by segment or market rather than using user/customer-level data</li>
+<li>Exclude zero-dollar transactions, returns or other order types</li>
+<li>Map product substitutions into the Mahout input (e.g. if WidgetA is a recommended item, replace it with WidgetX)</li>
+</ul>
+<p>The item-based recommender output can be easily consumed by downstream applications (e.g. websites, ERP systems or sales force automation tools), and the number of item recommendations the algorithm generates is configurable.</p>
+<h2 id="example">Example</h2>
+<p>Testing the item-based recommender is simple and can be quite rewarding. The typical sample use case for collaborative filtering focuses on integration with eCommerce platforms; here we instead look at a use case applicable to most businesses, even those without a web presence: how a company might use Mahout’s item-based recommender to identify new sales opportunities within an existing customer base. First, you’ll need to get Mahout up and running; the instructions are in the <a href="https://mahout.apache.org/users/basics/quickstart.html">Quickstart</a>. Once you’ve confirmed that Mahout is properly installed, you’re ready to run a quick example.</p>
+<p><strong>Step 1: Gather some test data</strong></p>
+<p>Mahout’s item-based recommender relies on three key pieces of data: <em>userID</em>, <em>itemID</em> and <em>preference</em>. The “users” could be website visitors or simply customers who purchase products from your business. Similarly, items could be products, product groups or even pages on your website: anything you would want to recommend to a group of users or customers. For our example, let’s use customer orders as a proxy for preference; a simple count of distinct orders by customer and product will do. As you explore the item-based recommender, you’ll find that the preference value can be many things (page clicks, explicit ratings, order counts, etc.). Once your test data is gathered, put it in a comma-separated <em>.txt</em> file with no column headers.</p>
+<p><strong>Step 2: Pick a similarity measure</strong></p>
+<p>Choosing a similarity measure for a production environment requires careful testing, evaluation and research. For this example, we’ll simply use the Mahout similarity classname <em>SIMILARITY_LOGLIKELIHOOD</em>.</p>
+<p><strong>Step 3: Configure the Mahout command</strong></p>
+<p>Assuming your <em>JAVA_HOME</em> is set correctly and Mahout is properly installed, we’re ready to build the command. Enter the following:</p>
+<div class="codehilite"><pre>$ <span class="n">mahout</span> <span class="n">recommenditembased</span> <span class="o">-</span><span class="n">s</span> <span class="n">SIMILARITY_LOGLIKELIHOOD</span> <span class="o">-</span><span class="nb">i</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">input</span><span class="o">/</span><span class="n">file</span> <span class="o">-</span><span class="n">o</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">desired</span><span class="o">/</span><span class="n">output</span> <span class="o">--</span><span class="n">numRecommendations</span> 25
+</pre></div>
+
+
+<p>Running the command executes a series of jobs and writes the final output file to the directory specified in the command. The output file contains two columns: the <em>userID</em> and an array of <em>itemIDs</em> and scores.</p>
+<p><strong>Step 4: Make use of the output and do more with Mahout</strong></p>
+<p>The output file generated in our simple example can be transformed with your tool of choice and consumed by downstream applications. Mahout’s item-based recommender offers a variety of configuration options to accommodate custom business requirements; exploring and testing configurations to suit your needs will doubtless raise additional questions. Our user community is accessible via our <a href="https://mahout.apache.org/general/mailing-lists,-irc-and-archives.html">mailing list</a>, and the book <em>Mahout in Action</em> is a fantastic (but slightly outdated) starting point.</p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>
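
Note on intro-itembased-hadoop.html, Steps 1 and 3: the following is a minimal Python sketch of preparing the comma-separated input file and reading the recommender output back. It is an illustration under stated assumptions, not part of the committed page. The helper names (build_preferences, write_input, parse_output_line) and the sample order tuples are hypothetical. The column order userID,itemID,preference, integer IDs, and an output line layout of userID, a tab, then [itemID:score,...] are all assumptions; the page itself only says the output holds a userID and an array of itemIDs and scores, so inspect a real output file before relying on this parser.

```python
import csv
import io
from collections import Counter


def build_preferences(orders):
    """Count distinct orders per (customer, product) pair.

    `orders` is an iterable of (customer_id, product_id, order_id) tuples
    (a hypothetical shape for exported order data).
    """
    distinct = set(orders)  # drop repeated lines of the same order
    counts = Counter((customer, product) for customer, product, _ in distinct)
    return sorted((c, p, n) for (c, p), n in counts.items())


def write_input(rows, handle):
    """Write rows as comma-separated lines with no header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(rows)


def parse_output_line(line):
    """Parse one output line, ASSUMING the layout 'user<TAB>[item:score,...]'."""
    user, _, items = line.rstrip("\n").partition("\t")
    items = items.strip().strip("[]")
    recommendations = []
    if items:
        for pair in items.split(","):
            item, _, score = pair.partition(":")
            recommendations.append((int(item), float(score)))
    return int(user), recommendations


# Hypothetical sample: customer 1 ordered product 101 in two distinct orders.
sample_orders = [(1, 101, "A"), (1, 101, "B"), (1, 101, "B"), (2, 102, "C")]
preferences = build_preferences(sample_orders)

buffer = io.StringIO()
write_input(preferences, buffer)
input_text = buffer.getvalue()

parsed = parse_output_line("1\t[104:3.5,106:2.0]\n")
```

In practice write_input would be given a real file handle, and the resulting file would be passed to the -i option of the command shown in Step 3. Counting distinct orders follows the preference proxy described in Step 1; any other preference measure can be substituted in build_preferences.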

Added: websites/staging/mahout/trunk/content/users/mapreduce/recommender/matrix-factorization.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/mapreduce/recommender/matrix-factorization.html (added)
+++ websites/staging/mahout/trunk/content/users/mapreduce/recommender/matrix-factorization.html Thu Mar 19 21:21:45 2015
@@ -0,0 +1,441 @@
+<!DOCTYPE html>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <title>Apache Mahout: Scalable machine learning and data mining</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="Distribution" content="Global">
+  <meta name="Robots" content="index,follow">
+  <meta name="keywords" content="apache, apache hadoop, apache lucene,
+        business data mining, cluster analysis,
+        collaborative filtering, data extraction, data filtering, data framework, data integration,
+        data matching, data mining, data mining algorithms, data mining analysis, data mining data,
+        data mining introduction, data mining software,
+        data mining techniques, data representation, data set, datamining,
+        feature extraction, fuzzy k means, genetic algorithm, hadoop,
+        hierarchical clustering, high dimensional, introduction to data mining, kmeans,
+        knowledge discovery, learning approach, learning approaches, learning methods,
+        learning techniques, lucene, machine learning, machine translation, mahout apache,
+        mahout taste, map reduce hadoop, mining data, mining methods, naive bayes,
+        natural language processing,
+        supervised, text mining, time series data, unsupervised, web data mining">
+  <link rel="shortcut icon" type="image/x-icon" href="http://mahout.apache.org/images/favicon.ico">
+  <script type="text/javascript" src="/js/prototype.js"></script>
+  <script type="text/javascript" src="/js/effects.js"></script>
+  <script type="text/javascript" src="/js/search.js"></script>
+  <script type="text/javascript" src="/js/slides.js"></script>
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-responsive.css" rel="stylesheet">
+  <link rel="stylesheet" href="/css/global.css" type="text/css">
+
+  <!-- mathJax stuff -- use `\(...\)` for inline style math in markdown -->
+  <script type="text/x-mathjax-config">
+  MathJax.Hub.Config({
+    tex2jax: {
+      skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+    }
+  });
+  MathJax.Hub.Queue(function() {
+    var all = MathJax.Hub.getAllJax(), i;
+    for(i = 0; i < all.length; i += 1) {
+      all[i].SourceElement().parentNode.className += ' has-jax';
+    }
+  });
+  </script>
+  <script type="text/javascript">
+    var mathjax = document.createElement('script'); 
+    mathjax.type = 'text/javascript'; 
+    mathjax.async = true;
+
+    mathjax.src = ('https:' == document.location.protocol) ?
+        'https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' : 
+        'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+	
+	  var s = document.getElementsByTagName('script')[0]; 
+    s.parentNode.insertBefore(mathjax, s);
+  </script>
+</head>
+
+<body id="home" data-twttr-rendered="true">
+  <div id="wrap">
+   <div id="header">
+    <div id="logo"><a href="/overview.html"></a></div>
+  <div id="search">
+    <form id="search-form" action="http://www.google.com/search" method="get" class="navbar-search pull-right">    
+      <input value="http://mahout.apache.org" name="sitesearch" type="hidden">
+      <input class="search-query" name="q" id="query" type="text">
+      <input id="submission" type="image" src="/images/mahout-lupe.png" alt="Search" />
+    </form>
+  </div>
+
+    <div class="navbar navbar-inverse" style="position:absolute;top:133px;padding-right:0px;padding-left:0px;">
+      <div class="navbar-inner" style="border: none; background: #999; border: none; border-radius: 0px;">
+        <div class="container">
+          <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <!-- <a class="brand" href="#">Apache Community Development Project</a> -->
+          <div class="nav-collapse collapse">
+            <ul class="nav">
+              <li><a href="/">Home</a></li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">General<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/general/downloads.html">Downloads</a>
+                  <li><a href="/general/who-we-are.html">Who we are</a>
+                  <li><a href="/general/mailing-lists,-irc-and-archives.html">Mailing Lists</a>
+                  <li><a href="/general/release-notes.html">Release Notes</a> 
+                  <li><a href="/general/books-tutorials-and-talks.html">Books, Tutorials, Talks</a></li>
+                  <li><a href="/general/powered-by-mahout.html">Powered By Mahout</a>
+                  <li><a href="/general/professional-support.html">Professional Support</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Resources</li>
+                  <li><a href="/general/reference-reading.html">Reference Reading</a>
+                  <li><a href="/general/faq.html">FAQ</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Legal</li>
+                  <li><a href="http://www.apache.org/licenses/">License</a></li>
+                  <li><a href="http://www.apache.org/security/">Security</a></li>
+                  <li><a href="/general/privacy-policy.html">Privacy Policy</a>
+                </ul>
+              </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developers<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/developers/developer-resources.html">Developer resources</a></li>
+                  <li><a href="/developers/version-control.html">Version control</a></li>
+                  <li><a href="/developers/buildingmahout.html">Build from source</a></li>
+                  <li><a href="/developers/issue-tracker.html">Issue tracker</a></li>
+                  <li><a href="https://builds.apache.org/job/Mahout-Quality/" target="_blank">Code quality reports</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Contributions</li>
+                  <li><a href="/developers/how-to-contribute.html">How to contribute</a></li>
+                  <li><a href="/developers/how-to-become-a-committer.html">How to become a committer</a></li>
+                  <li><a href="/developers/gsoc.html">GSoC</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">For committers</li>
+                  <li><a href="/developers/how-to-update-the-website.html">How to update the website</a></li>
+                  <li><a href="/developers/patch-check-list.html">Patch check list</a></li>
+                  <li><a href="/developers/github.html">Handling Github PRs</a></li>
+                  <li><a href="/developers/how-to-release.html">How to release</a></li>
+                  <li><a href="/developers/thirdparty-dependencies.html">Third party dependencies</a></li>
+                </ul>
+               </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Basics<b class="caret"></b></a>
+                 <ul class="dropdown-menu">
+                  <li><a href="/users/basics/algorithms.html">List of algorithms</a>
+                  <li><a href="/users/basics/quickstart.html">Quickstart</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Working with text</li>
+                  <li><a href="/users/basics/creating-vectors-from-text.html">Creating vectors from text</a>
+                  <li><a href="/users/basics/collocations.html">Collocations</a>
+                  <li class="divider"></li>
+                  <li class="nav-header">Dimensionality reduction</li>
+                  <li><a href="/users/dim-reduction/dimensional-reduction.html">Singular Value Decomposition</a></li>
+                  <li><a href="/users/dim-reduction/ssvd.html">Stochastic SVD</a></li>
+                  <li class="divider"></li>
+                  <li class="nav-header">Topic Models</li>      
+                  <li><a href="/users/clustering/latent-dirichlet-allocation.html">Latent Dirichlet Allocation</a></li>
+                </ul>
+                 </li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Spark<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/sparkbindings/home.html">Scala &amp; Spark Bindings Overview</a></li>
+                  <li><a href="/users/sparkbindings/play-with-shell.html">Playing with Mahout's Spark Shell</a></li>
+			      <li class="divider"></li>
+                  <li><a href="/users/sparkbindings/faq.html">FAQ</a></li>
+                </ul>
+               </li>
+              <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Classification<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/users/mapreduce/classification/bayesian.html">Naive Bayes</a></li>
+                  <li><a href="/users/mapreduce/classification/hidden-markov-models.html">Hidden Markov Models</a></li>
+                  <li><a href="/users/mapreduce/classification/logistic-regression.html">Logistic Regression</a></li>
+                  <li><a href="/users/mapreduce/classification/partial-implementation.html">Random Forest</a></li>
+
+                  <li class="divider"></li>
+                  <li class="nav-header">Examples</li>
+                  <li><a href="/users/mapreduce/classification/breiman-example.html">Breiman example</a></li>
+                  <li><a href="/users/mapreduce/classification/twenty-newsgroups.html">20 newsgroups example</a></li>
+                </ul></li>
+               <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Clustering<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a href="/users/mapreduce/clustering/k-means-clustering.html">k-Means</a></li>
+                <li><a href="/users/mapreduce/clustering/canopy-clustering.html">Canopy</a></li>
+                <li><a href="/users/mapreduce/clustering/fuzzy-k-means.html">Fuzzy k-Means</a></li>
+                <li><a href="/users/mapreduce/clustering/streaming-k-means.html">Streaming KMeans</a></li>
+                <li><a href="/users/mapreduce/clustering/spectral-clustering.html">Spectral Clustering</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Commandline usage</li>
+                <li><a href="/users/mapreduce/clustering/k-means-commandline.html">Options for k-Means</a></li>
+                <li><a href="/users/mapreduce/clustering/canopy-commandline.html">Options for Canopy</a></li>
+                <li><a href="/users/mapreduce/clustering/fuzzy-k-means-commandline.html">Options for Fuzzy k-Means</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Examples</li>
+                <li><a href="/users/mapreduce/clustering/clustering-of-synthetic-control-data.html">Synthetic data</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Post processing</li>
+                <li><a href="/users/mapreduce/clustering/cluster-dumper.html">Cluster Dumper tool</a></li>
+                <li><a href="/users/mapreduce/clustering/visualizing-sample-clusters.html">Cluster visualisation</a></li>
+                </ul></li>
+                <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Recommendations<b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                <li><a href="/users/mapreduce/recommender/quickstart.html">Quickstart</a></li>
+                <li><a href="/users/mapreduce/recommender/recommender-first-timer-faq.html">First Timer FAQ</a></li>
+                <li><a href="/users/mapreduce/recommender/userbased-5-minutes.html">A user-based recommender <br/>in 5 minutes</a></li>
+		<li><a href="/users/mapreduce/recommender/matrix-factorization.html">Matrix factorization-based<br/> recommenders</a></li>
+                <li><a href="/users/mapreduce/recommender/recommender-documentation.html">Overview</a></li>
+                <li class="divider"></li>
+                <li class="nav-header">Hadoop</li>
+                <li><a href="/users/mapreduce/recommender/intro-itembased-hadoop.html">Intro to item-based recommendations<br/> with Hadoop</a></li>
+                <li><a href="/users/mapreduce/recommender/intro-als-hadoop.html">Intro to ALS recommendations<br/> with Hadoop</a></li>
+                <li class="nav-header">Spark</li>
+                <li><a href="/users/mapreduce/recommender/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li>
+              </ul>
+            </li>
+           </ul>
+          </div><!--/.nav-collapse -->
+        </div>
+      </div>
+    </div>
+
+</div>
+
+ <div id="sidebar">
+  <div id="sidebar-wrap">
+    <h2>Twitter</h2>
+	<ul class="sidemenu">
+		<li>
+<a class="twitter-timeline" href="https://twitter.com/ApacheMahout" data-widget-id="422861673444028416">Tweets by @ApacheMahout</a>
+<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
+</li>
+	</ul>
+    <h2>Apache Software Foundation</h2>
+    <ul class="sidemenu">
+      <li><a href="http://www.apache.org/foundation/how-it-works.html">How the ASF works</a></li>
+      <li><a href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a></li>
+      <li><a href="http://www.apache.org/dev/">Developer Resources</a></li>
+      <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+      <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+    </ul>
+    <h2>Related Projects</h2>
+    <ul class="sidemenu">
+      <li><a href="http://lucene.apache.org/">Lucene</a></li>
+      <li><a href="http://hadoop.apache.org/">Hadoop</a></li>
+    </ul>
+  </div>
+</div>
+
+  <div id="content-wrap" class="clearfix">
+   <div id="main">
+    <p><a name="MatrixFactorization-Intro"></a></p>
+<h1 id="introduction-to-matrix-factorization-for-recommendation-mining">Introduction to Matrix Factorization for Recommendation Mining</h1>
+<p>In the mathematical discipline of linear algebra, a matrix decomposition 
+or matrix factorization is a dimensionality reduction technique that factorizes a matrix into a product of matrices, usually two. 
+There are many different matrix decompositions; each finds use among a particular class of problems.</p>
+<p>In Mahout, the SVDRecommender provides an interface for building a recommender based on matrix factorization.
+The idea is to project the users and items onto a feature space and to optimize U and M so that U * (M^t) is as close to R as possible:</p>
+<div class="codehilite"><pre> <span class="n">U</span> <span class="n">is</span> <span class="n">n</span> <span class="o">*</span> <span class="n">p</span> <span class="n">user</span> <span class="n">feature</span> <span class="n">matrix</span><span class="p">,</span> 
+ <span class="n">M</span> <span class="n">is</span> <span class="n">m</span> <span class="o">*</span> <span class="n">p</span> <span class="n">item</span> <span class="n">feature</span> <span class="n">matrix</span><span class="p">,</span> <span class="n">M</span>^<span class="n">t</span> <span class="n">is</span> <span class="n">the</span> <span class="n">conjugate</span> <span class="n">transpose</span> <span class="n">of</span> <span class="n">M</span><span class="p">,</span>
+ <span class="n">R</span> <span class="n">is</span> <span class="n">n</span> <span class="o">*</span> <span class="n">m</span> <span class="n">rating</span> <span class="n">matrix</span><span class="p">,</span>
+ <span class="n">n</span> <span class="n">is</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">users</span><span class="p">,</span>
+ <span class="n">m</span> <span class="n">is</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">items</span><span class="p">,</span>
+ <span class="n">p</span> <span class="n">is</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">features</span>
+</pre></div>
+
+
+<p>We usually use RMSE to represent the deviations between predictions and actual ratings.
+RMSE is defined as the square root of the mean of the squared errors over all known user-item ratings.
+So our matrix factorization target can be defined mathematically as:</p>
+<div class="codehilite"><pre> <span class="nb">find</span> <span class="n">U</span> <span class="n">and</span> <span class="n">M</span><span class="p">,</span> <span class="p">(</span><span class="n">U</span><span class="p">,</span> <span class="n">M</span><span class="p">)</span> <span class="p">=</span> <span class="n">argmin</span><span class="p">(</span><span class="n">RMSE</span><span class="p">)</span> <span class="p">=</span> <span class="n">argmin</span><span class="p">(</span><span class="n">pow</span><span class="p">(</span><span class="n">SSE</span> <span class="o">/</span> <span class="n">K</span><span class="p">,</span> 0<span class="p">.</span>5<span class="p">))</span>
+
+ <span class="n">SSE</span> <span class="p">=</span> <span class="n">sum</span><span class="p">(</span><span class="n">e</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span>^2<span class="p">)</span>
+ <span class="n">e</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="p">=</span> <span class="n">r</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="nb">i</span><span class="p">)</span> <span class="o">-</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="o">*</span> <span class="p">(</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span>^<span class="n">t</span><span class="p">)</span> <span class="p">=</span> <span class="n">r</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">-</span> <span class="n">sum</span><span class="p">(</span><span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="o">*</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">]),</span> <span class="n">f</span> <span class="p">=</span> 0<span class="p">,</span> 1<span class="p">,</span> <span class="p">..</span> <span class="n">p</span> <span class="o">-</span> 1
+ <span class="n">K</span> <span class="n">is</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">known</span> <span class="n">user</span> <span class="n">item</span> <span class="n">ratings</span><span class="p">.</span>
+</pre></div>
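+<p>The RMSE definition above can be illustrated with a minimal Python sketch. This is not Mahout code: the function name <code>rmse</code>, the list-of-lists layout for U and M, and the (user, item, rating) triples are assumptions made for this illustration only.</p>

```python
import math


def rmse(ratings, U, M):
    """Root mean squared error over the known ratings only.

    ratings: list of (u, i, r) triples, one per known user-item rating
    U: list of user feature vectors (n rows, p columns)
    M: list of item feature vectors (m rows, p columns)
    """
    sse = 0.0
    for u, i, r in ratings:
        # predicted rating is the dot product U[u,] * M[i,]^t
        predicted = sum(uf * mf for uf, mf in zip(U[u], M[i]))
        sse += (r - predicted) ** 2
    # K is the number of known ratings
    return math.sqrt(sse / len(ratings))


# Hypothetical toy data: 2 users, 2 items, p = 2 features
U = [[1.0, 0.0], [0.0, 1.0]]
M = [[4.0, 1.0], [2.0, 5.0]]
ratings = [(0, 0, 5.0), (1, 1, 3.0)]
print(rmse(ratings, U, M))
```

+<p>In this toy data the two errors are 1 and -2, so SSE = 5, K = 2 and the result is the square root of 2.5.</p>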
+
+
+<p><a name="MatrixFactorization-Factorizers"></a></p>
+<p>Mahout has implemented matrix factorization based on </p>
+<div class="codehilite"><pre><span class="p">(</span>1<span class="p">)</span> <span class="n">SGD</span><span class="p">(</span><span class="n">Stochastic</span> <span class="n">Gradient</span> <span class="n">Descent</span><span class="p">)</span>
+<span class="p">(</span>2<span class="p">)</span> <span class="n">ALSWR</span><span class="p">(</span><span class="n">Alternating</span><span class="o">-</span><span class="n">Least</span><span class="o">-</span><span class="n">Squares</span> <span class="n">with</span> <span class="n">Weighted</span><span class="o">-</span>λ<span class="o">-</span><span class="n">Regularization</span><span class="p">).</span>
+</pre></div>
+
+
+<h2 id="sgd">SGD</h2>
+<p>Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.</p>
+<div class="codehilite"><pre>   <span class="n">Q</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="p">=</span> <span class="n">sum</span><span class="p">(</span><span class="n">Q_i</span><span class="p">(</span><span class="n">w</span><span class="p">)),</span>
+</pre></div>
+
+
+<p>where w is the vector of parameters to be estimated,
+      Q(w) is the objective function that can be expressed as a sum of differentiable functions,
+      Q_i(w) is associated with the i-th observation in the data set.</p>
+<p>In practice, w is estimated iteratively, with one update per single sample, until an approximate minimum is obtained,</p>
+<div class="codehilite"><pre>  <span class="n">w</span> <span class="p">=</span> <span class="n">w</span> <span class="o">-</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">d</span><span class="p">(</span><span class="n">Q_i</span><span class="p">(</span><span class="n">w</span><span class="p">))</span><span class="o">/</span><span class="n">dw</span><span class="p">),</span>
+</pre></div>
+
+
+<p>where alpha is the learning rate,
+      (d(Q_i(w))/dw) is the first derivative of Q_i(w) with respect to w.</p>
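+<p>As a hedged illustration of the update rule, the following Python sketch applies it to a deliberately simple objective chosen for this example (it is not taken from Mahout): Q_i(w) = (w - x_i)^2 / 2, whose sum is minimized at the mean of the samples. The function name <code>sgd_minimize</code> and all parameter values are assumptions.</p>

```python
import random


def sgd_minimize(samples, alpha=0.05, epochs=200, seed=0):
    """Estimate a single parameter w by stochastic gradient descent.

    Assumed objective: Q(w) = sum(Q_i(w)) with Q_i(w) = (w - x_i)^2 / 2,
    so d(Q_i(w))/dw = w - x_i.
    """
    rng = random.Random(seed)
    w = 0.0
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)           # visit the samples in random order
        for x in order:
            grad = w - x             # first derivative of Q_i at the current w
            w = w - alpha * grad     # w = w - alpha * d(Q_i(w))/dw
    return w


w = sgd_minimize([1.0, 2.0, 3.0, 4.0, 5.0])
print(w)
```

+<p>With a constant learning rate the estimate does not settle exactly on the minimum; it keeps fluctuating in a small neighbourhood of the sample mean (3.0 here), which is why the text speaks of an approximate minimum.</p>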
+<p>In matrix factorization, the RatingSGDFactorizer class implements the SGD with w = (U, M) and objective function Q(w) = sum(Q(u,i)),</p>
+<div class="codehilite"><pre>   <span class="n">Q</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="p">=</span>  <span class="n">sum</span><span class="p">(</span><span class="n">e</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">e</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">))</span> <span class="o">/</span> 2 <span class="o">+</span> <span class="n">lambda</span> <span class="o">*</span> <span class="p">[(</span><span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="o">*</span> <span class="p">(</span><span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span>^<span class="n">t</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span> <span class="o">*</span> <span class="p">(</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span>^<span class="n">t</span><span class="p">))]</span> <span class="o">/</span> 2
+</pre></div>
+
+
+<p>where Q(u, i) is the objective function for user u and item i,
+      e(u, i) is the error between predicted rating and actual rating,
+      U[u,] is the feature vector of user u,
+      M[i,] is the feature vector of item i,
+      lambda is the regularization parameter to prevent overfitting.</p>
+<p>The algorithm is sketched as follows:</p>
+<div class="codehilite"><pre>  <span class="n">init</span> <span class="n">U</span> <span class="n">and</span> <span class="n">M</span> <span class="n">with</span> <span class="n">randomized</span> <span class="n">value</span> <span class="n">between</span> 0<span class="p">.</span>0 <span class="n">and</span> 1<span class="p">.</span>0 <span class="n">with</span> <span class="n">standard</span> <span class="n">Gaussian</span> <span class="n">distribution</span>
+
+  <span class="k">for</span><span class="p">(</span><span class="n">iter</span> <span class="p">=</span> 0<span class="p">;</span> <span class="n">iter</span> <span class="o">&lt;</span> <span class="n">numIterations</span><span class="p">;</span> <span class="n">iter</span><span class="o">++</span><span class="p">)</span>
+  <span class="p">{</span>
+      <span class="k">for</span><span class="p">(</span><span class="n">user</span> <span class="n">u</span> <span class="n">and</span> <span class="n">item</span> <span class="nb">i</span> <span class="n">with</span> <span class="n">rating</span> <span class="n">R</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">])</span>
+      <span class="p">{</span>
+          <span class="n">predicted_rating</span> <span class="p">=</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="o">*</span>  <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span>^<span class="n">t</span> <span class="o">//</span><span class="nb">dot</span> <span class="n">product</span> <span class="n">of</span> <span class="n">feature</span> <span class="n">vectors</span> <span class="n">between</span> <span class="n">user</span> <span class="n">u</span> <span class="n">and</span> <span class="n">item</span> <span class="nb">i</span>
+          <span class="n">err</span> <span class="p">=</span> <span class="n">R</span><span class="p">[</span><span class="n">u</span><span class="p">,</span> <span class="nb">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">predicted_rating</span>
+          <span class="o">//</span><span class="n">adjust</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="n">and</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span>
+          <span class="o">//</span> <span class="n">p</span> <span class="n">is</span> <span class="n">the</span> <span class="n">number</span> <span class="n">of</span> <span class="n">features</span>
+          <span class="k">for</span><span class="p">(</span><span class="n">f</span> <span class="p">=</span> 0<span class="p">;</span> <span class="n">f</span> <span class="o">&lt;</span> <span class="n">p</span><span class="p">;</span> <span class="n">f</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
+             <span class="n">NU</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="p">=</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="o">-</span> <span class="n">alpha</span> <span class="o">*</span> <span class="n">d</span><span class="p">(</span><span class="n">Q</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">))</span><span class="o">/</span><span class="n">d</span><span class="p">(</span><span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">])</span> <span class="o">//</span><span class="n">optimize</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">]</span>
+                     <span class="p">=</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span> <span class="n">f</span><span class="p">]</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">e</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="o">-</span> <span class="n">lambda</span> <span class="o">*</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">])</span> 
+          <span class="p">}</span>
+          <span class="k">for</span><span class="p">(</span><span class="n">f</span> <span class="p">=</span> 0<span class="p">;</span> <span class="n">f</span> <span class="o">&lt;</span> <span class="n">p</span><span class="p">;</span> <span class="n">f</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
+             <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="p">=</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="o">-</span> <span class="n">alpha</span> <span class="o">*</span> <span class="n">d</span><span class="p">(</span><span class="n">Q</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">))</span><span class="o">/</span><span class="n">d</span><span class="p">(</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">])</span>  <span class="o">//</span><span class="n">optimize</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">]</span>
+                    <span class="p">=</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">e</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">]</span> <span class="o">-</span> <span class="n">lambda</span> <span class="o">*</span> <span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">])</span> 
+          <span class="p">}</span>
+          <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="p">=</span> <span class="n">NU</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span>
+      <span class="p">}</span>
+  <span class="p">}</span>
+</pre></div>
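+<p>The sketch above can be turned into a small runnable Python example. This is an illustrative assumption-laden version, not the RatingSGDFactorizer implementation: the function names, the uniform initialization in [0, 1), and the values of alpha, lambda and the iteration count are all choices made for this example.</p>

```python
import math
import random


def sgd_factorize(ratings, n_users, n_items, p=2, alpha=0.02, lam=0.01,
                  num_iterations=1000, seed=0):
    """Factorize known ratings into U (n_users x p) and M (n_items x p)."""
    rng = random.Random(seed)
    # assumption: uniform random initial values in [0, 1)
    U = [[rng.random() for _ in range(p)] for _ in range(n_users)]
    M = [[rng.random() for _ in range(p)] for _ in range(n_items)]
    for _ in range(num_iterations):
        for u, i, r in ratings:
            predicted = sum(U[u][f] * M[i][f] for f in range(p))
            err = r - predicted
            # new user vector is computed first but applied last, so the
            # item update below still sees the old U[u,] (as in the sketch)
            new_u = [U[u][f] + alpha * (err * M[i][f] - lam * U[u][f])
                     for f in range(p)]
            for f in range(p):
                M[i][f] += alpha * (err * U[u][f] - lam * M[i][f])
            U[u] = new_u
    return U, M


def training_rmse(ratings, U, M):
    sse = sum((r - sum(a * b for a, b in zip(U[u], M[i]))) ** 2
              for u, i, r in ratings)
    return math.sqrt(sse / len(ratings))


# Hypothetical toy data: (user, item, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 1, 1.0)]
U, M = sgd_factorize(ratings, n_users=3, n_items=2)
print(training_rmse(ratings, U, M))
```

+<p>The example only reports the error on the training ratings; a real evaluation would hold out some ratings and measure RMSE on those.</p>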
+
+
+<h2 id="svd">SVD++</h2>
+<p>SVD++ is an enhancement of the SGD matrix factorization. </p>
+<p>It can be considered an integration of the latent factor model and the neighborhood-based model, taking into account not only how users rate, but also who has rated what. </p>
+<p>The complete model is the sum of three sub-models, with the full prediction formula as follows: </p>
+<div class="codehilite"><pre><span class="n">pr</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="p">=</span> <span class="n">b</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">fm</span> <span class="o">+</span> <span class="n">nm</span>   <span class="o">//</span><span class="n">user</span> <span class="n">u</span> <span class="n">and</span> <span class="n">item</span> <span class="nb">i</span>
+
+<span class="n">pr</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="n">is</span> <span class="n">the</span> <span class="n">predicted</span> <span class="n">rating</span> <span class="n">of</span> <span class="n">user</span> <span class="n">u</span> <span class="n">on</span> <span class="n">item</span> <span class="nb">i</span><span class="p">,</span>
+<span class="n">b</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">U</span> <span class="o">+</span> <span class="n">b</span><span class="p">(</span><span class="n">u</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span><span class="p">(</span><span class="nb">i</span><span class="p">)</span>
+<span class="n">fm</span> <span class="p">=</span> <span class="p">(</span><span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,])</span> <span class="o">*</span> <span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="o">+</span> <span class="n">pow</span><span class="p">(</span><span class="o">|</span><span class="n">N</span><span class="p">(</span><span class="n">u</span><span class="p">)</span><span class="o">|</span><span class="p">,</span> <span class="o">-</span>0<span class="p">.</span>5<span class="p">)</span> <span class="o">*</span> <span class="n">sum</span><span class="p">(</span><span class="n">y</span><span class="p">[</span><span class="nb">j</span><span class="p">,])),</span>  <span class="nb">j</span> <span class="n">is</span> <span class="n">an</span> <span class="n">item</span> <span class="n">in</span> <span class="n">N</span><span class="p">(</span><span class="n">u</span><span class="p">)</span>
+<span class="n">nm</span> <span class="p">=</span> <span class="n">pow</span><span class="p">(</span><span class="o">|</span><span class="n">R</span><span class="p">(</span><span class="nb">i</span><span class="p">;</span><span class="n">u</span><span class="p">;</span><span class="n">k</span><span class="p">)</span><span class="o">|</span><span class="p">,</span> <span class="o">-</span>0<span class="p">.</span>5<span class="p">)</span> <span class="o">*</span> <span class="n">sum</span><span class="p">((</span><span class="n">r</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">j0</span><span class="p">]</span> <span class="o">-</span> <span class="n">b</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">j0</span><span class="p">])</span> <span class="o">*</span> <span class="n">w</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">j0</span><span class="p">])</span> <span class="o">+</span> <span class="n">pow</span><span class="p">(</span><span class="o">|</span><span class="n">N</span><span class="p">(</span><span class="nb">i</span><span class="p">;</span><span class="n">u</span><span class="p">;</span><span class="n">k</span><span class="p">)</span><span class="o">|</span><span class="p">,</span> <span class="o">-</span>0<span class="p">.</span>5<span class="p">)</span> <span class="o">*</span> <span class="n">sum</span><span class="p">(</span><span class="n">c</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">j1</span><span class="p">]),</span> <span class="n">j0</span> <span class="n">is</span> <span class="n">an</span> <span class="n">item</span> <span class="n">in</span> <span class="n">R</span><span class="p">(</span><span class="nb">i</span><span class="p">;</span><span class="n">u</span><span class="p">;</span><span class="n">k</span><span class="p">),</span> <span class="n">j1</span> <span class="n">is</span> <span class="n">an</span> <span class="n">item</span> <span class="n">in</span> <span class="n">N</span><span class="p">(</span><span class="nb">i</span><span class="p">;</span><span class="n">u</span><span class="p">;</span><span class="n">k</span><span class="p">)</span>
+</pre></div>
+
+
+<p>The associated regularized squared error function to be minimized is:</p>
+<div class="codehilite"><pre><span class="p">{</span><span class="n">sum</span><span class="p">((</span><span class="n">r</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">pr</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">])</span> <span class="o">*</span> <span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">pr</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]))</span>  <span class="o">+</span> <span class="n">lambda</span> <span class="o">*</span> <span class="p">(</span><span class="n">b</span><span class="p">(</span><span class="n">u</span><span class="p">)</span> <span class="o">*</span> <span class="n">b</span><span class="p">(</span><span class="n">u</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span><span class="p">(</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">b</span><span class="p">(</span><span class="nb">i</span><span class="p">)</span> <span class="o">+</span> <span class="o">||</span><span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span><span class="o">||</span>^2 <span class="o">+</span> <span class="o">||</span><span class="n">p</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span><span class="o">||</span>^2 <span class="o">+</span> <span class="n">sum</span><span class="p">(</span><span class="o">||</span><span class="n">y</span><span class="p">[</span><span class="nb">j</span><span class="p">,]</span><span class="o">||</span>^2<span class="p">)</span> <span class="o">+</span> <span class="n">sum</span><span class="p">(</span><span class="n">w</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">j0</span><span class="p">]</span> <span class="o">*</span> <span class="n">w</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">j0</span><span class="p">])</span> <span class="o">+</span> <span class="n">sum</span><span class="p">(</span><span class="n">c</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">j1</span><span class="p">]</span> <span class="o">*</span> <span class="n">c</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">j1</span><span class="p">]))}</span>
+</pre></div>
+
+
+<p>b[u,i] is the baseline estimate of user u's rating of item i. U is the overall average rating, and b(u) and b(i) are the observed deviations of user u's and item i's ratings from that average. </p>
+<p>The baseline estimate adjusts for user and item effects, i.e., the systematic tendency of some users to give higher ratings than others and of
+some items to receive higher ratings than other items.</p>
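+<p>The following Python sketch illustrates the baseline estimate, assuming it has the additive form b[u,i] = U + b(u) + b(i). The function names are hypothetical, and computing b(u) and b(i) as plain averages of the observed deviations is a simplifying assumption, not necessarily what Mahout does.</p>

```python
from collections import defaultdict

def baseline_model(ratings):
    """Return (U, b_user, b_item) from a list of (user, item, rating) triples."""
    overall = sum(r for _, _, r in ratings) / len(ratings)  # U in the text
    user_dev, item_dev = defaultdict(list), defaultdict(list)
    for user, item, r in ratings:
        user_dev[user].append(r - overall)
        item_dev[item].append(r - overall)
    # Assumption: b(u) and b(i) are plain averages of the observed deviations.
    b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}
    b_item = {i: sum(d) / len(d) for i, d in item_dev.items()}
    return overall, b_user, b_item

def baseline_estimate(model, user, item):
    """b[u,i] = U + b(u) + b(i); unseen users or items get a deviation of 0."""
    overall, b_user, b_item = model
    return overall + b_user.get(user, 0.0) + b_item.get(item, 0.0)

ratings = [("alice", "x", 5.0), ("alice", "y", 3.0),
           ("bob", "x", 4.0), ("bob", "y", 2.0)]
model = baseline_model(ratings)
estimate = baseline_estimate(model, "alice", "x")
```

+<p>In this toy data set alice rates 0.5 above the overall average of 3.5 and item x is rated 1.0 above it, so the baseline estimate for alice on x is 3.5 + 0.5 + 1.0.</p>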
+<p>fm is the latent factor model, which captures the interactions between users and items through a feature layer. q[i,] is the feature vector of item i; the rest of the formula represents user u as a user feature vector plus a sum of the features of the items in N(u).
+N(u) is the set of items for which user u has expressed a preference, and y[j,] is the feature vector of an item in N(u).</p>
+<p>nm is an extension of the classic item-based neighborhood model. 
+It captures not only the user's explicit ratings but also the user's implicit preferences. R(i;u;k) is the set of items that user u has rated explicitly, restricted to the k items most similar to item i. r[u,j0] is the actual rating of user u on item j0, and
+b[u,j0] is the corresponding baseline estimate.</p>
+<p>The difference between r[u,j0] and b[u,j0] is weighted by a parameter w[i,j0], which can be thought of as the similarity between items i and j0. </p>
+<p>N[i;u;k] is the set of the k items most similar to item i for which the user has expressed a preference.
+c[i,j1] is the parameter to be estimated. </p>
+<p>The values of w[i,j0] and c[i,j1] can be interpreted as the significance of the 
+user's explicit rating and implicit preference, respectively.</p>
+<p>The parameters b, p, y, q, w and c are determined by minimizing the associated regularized squared error function through gradient descent. We loop over all known ratings and, for each training case r[u,i], modify the parameters by moving in the opposite direction of the gradient of the error function.</p>
+<p>For a complete analysis of the SVD++ algorithm,
+please refer to the paper <a href="http://research.yahoo.com/files/kdd08koren.pdf">Yehuda Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD 2008</a>.</p>
+<p>In Mahout, the SVDPlusPlusFactorizer class is a simplified implementation of the SVD++ algorithm. It uses mainly the latent factor model, built from the item feature vectors, the user feature vectors and the users' preferences, so that pr[u,i] = fm = (q[i,]) * (p[u,] + pow(|N(u)|, -0.5) * sum(y[j,])), and the parameters to be determined are q, p and y. </p>
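+<p>As an illustration, the simplified prediction formula can be written as a small Python function. This is a sketch with hypothetical names (<code>predict</code>, <code>y_rows</code>), not the code of SVDPlusPlusFactorizer; feature vectors are plain lists, and <code>y_rows</code> holds y[j,] for every item j in N(u).</p>

```python
def predict(q_i, p_u, y_rows):
    """pr[u,i] = q[i,] * (p[u,] + pow(|N(u)|, -0.5) * sum(y[j,]))."""
    k = len(q_i)
    # Scale factor |N(u)|^-0.5; a user without any preference contributes no y term.
    scale = len(y_rows) ** -0.5 if y_rows else 0.0
    user_vec = [p_u[f] + scale * sum(y[f] for y in y_rows) for f in range(k)]
    return sum(q_i[f] * user_vec[f] for f in range(k))

q_i = [1.0, 2.0]                       # item feature vector q[i,]
p_u = [0.5, 0.5]                       # user feature vector p[u,]
y_rows = [[1.0, 0.0], [1.0, 2.0],      # y[j,] for the four items in N(u)
          [0.0, 0.0], [2.0, 2.0]]
prediction = predict(q_i, p_u, y_rows)
```

+<p>Here |N(u)| = 4, so the summed y vectors (4, 4) are scaled by 0.5 and added to p[u,], giving the user representation (2.5, 2.5) and a predicted rating of 7.5.</p>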
+<p>The update to q, p, y in each gradient descent step is:</p>
+<div class="codehilite"><pre>  <span class="n">err</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="p">=</span> <span class="n">r</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">pr</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span>
+  <span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span> <span class="p">=</span> <span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">err</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="o">+</span> <span class="n">pow</span><span class="p">(</span><span class="o">|</span><span class="n">N</span><span class="p">(</span><span class="n">u</span><span class="p">)</span><span class="o">|</span><span class="p">,</span> <span class="o">-</span>0<span class="p">.</span>5<span class="p">)</span> <span class="o">*</span> <span class="n">sum</span><span class="p">(</span><span class="n">y</span><span class="p">[</span><span class="nb">j</span><span class="p">,]))</span> <span class="o">-</span> <span class="n">lambda</span> <span class="o">*</span> <span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,])</span> 
+  <span class="n">p</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="p">=</span> <span class="n">p</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">err</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span> <span class="o">-</span> <span class="n">lambda</span> <span class="o">*</span> <span class="n">p</span><span class="p">[</span><span class="n">u</span><span class="p">,])</span>
+  <span class="k">for</span> <span class="nb">j</span> <span class="n">that</span> <span class="n">is</span> <span class="n">an</span> <span class="n">item</span> <span class="n">in</span> <span class="n">N</span><span class="p">(</span><span class="n">u</span><span class="p">):</span>
+     <span class="n">y</span><span class="p">[</span><span class="nb">j</span><span class="p">,]</span> <span class="p">=</span> <span class="n">y</span><span class="p">[</span><span class="nb">j</span><span class="p">,]</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">err</span><span class="p">(</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">pow</span><span class="p">(</span><span class="o">|</span><span class="n">N</span><span class="p">(</span><span class="n">u</span><span class="p">)</span><span class="o">|</span><span class="p">,</span> <span class="o">-</span>0<span class="p">.</span>5<span class="p">)</span> <span class="o">*</span> <span class="n">q</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span> <span class="o">-</span> <span class="n">lambda</span> <span class="o">*</span> <span class="n">y</span><span class="p">[</span><span class="nb">j</span><span class="p">,])</span>
+</pre></div>
+
+
+<p>where alpha is the learning rate of gradient descent, lambda is the regularization weight, and N(u) is the set of items for which user u has expressed a preference.</p>
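+<p>The update rules above can be sketched as a single Python function that processes one training case r[u,i]. The name <code>sgd_step</code> and the list-based data layout are hypothetical; the sketch also assumes that all three updates use the values of q[i,] and p[u,] from before the step, which may differ from the order of operations in Mahout's implementation.</p>

```python
def sgd_step(r_ui, q_i, p_u, y_rows, alpha, lam):
    """Apply one gradient step in place and return err(u,i) from before the step."""
    k = len(q_i)
    scale = len(y_rows) ** -0.5 if y_rows else 0.0   # pow(|N(u)|, -0.5)
    user_vec = [p_u[f] + scale * sum(y[f] for y in y_rows) for f in range(k)]
    err = r_ui - sum(q_i[f] * user_vec[f] for f in range(k))
    q_old = list(q_i)  # assumption: p and y are updated with the pre-step q[i,]
    for f in range(k):
        q_i[f] += alpha * (err * user_vec[f] - lam * q_old[f])
        p_u[f] += alpha * (err * q_old[f] - lam * p_u[f])
        for y in y_rows:  # every item j in N(u)
            y[f] += alpha * (err * scale * q_old[f] - lam * y[f])
    return err

# Repeatedly fit one rating of 4.0 starting from small feature values.
q_i, p_u = [0.1, 0.1], [0.1, 0.1]
y_rows = [[0.1, 0.1], [0.1, 0.1]]
errors = [sgd_step(4.0, q_i, p_u, y_rows, alpha=0.05, lam=0.01)
          for _ in range(200)]
```

+<p>In this toy run the absolute error shrinks from step to step until the pull of the regularization term balances it; a real factorizer would loop over all known ratings rather than a single one.</p>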
+<h2 id="parallel-sgd">Parallel SGD</h2>
+<p>Mahout has a parallel SGD implementation in the ParallelSGDFactorizer class. It shuffles the user ratings in every iteration and 
+generates splits of the shuffled ratings. Each split is handled by one thread, which updates the user features and item features using 
+vanilla SGD. </p>
+<p>The implementation can be traced back to the lock-free version of SGD described in the paper 
+<a href="http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf">Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent</a>.</p>
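+<p>The shuffle, split and lock-free update scheme can be sketched in Python as follows. This is an illustration of the control flow only, not the code of ParallelSGDFactorizer: the function names, the default parameters and the plain factor model pr[u,i] = U[u,] * M[i,]^t are assumptions, and in CPython the threads mostly interleave rather than run truly in parallel.</p>

```python
import random
import threading

def parallel_sgd(ratings, n_users, n_items, k=2, alpha=0.05, lam=0.01,
                 iterations=200, n_threads=3, seed=42):
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    M = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]

    def run_split(split):
        # Vanilla SGD on one split; U and M are shared and updated without locks.
        for u, i, r in split:
            err = r - sum(a * b for a, b in zip(U[u], M[i]))
            for f in range(k):
                uf, mf = U[u][f], M[i][f]
                U[u][f] = uf + alpha * (err * mf - lam * uf)
                M[i][f] = mf + alpha * (err * uf - lam * mf)

    data = list(ratings)
    for _ in range(iterations):
        rng.shuffle(data)                                  # reshuffle every iteration
        splits = [data[t::n_threads] for t in range(n_threads)]
        workers = [threading.Thread(target=run_split, args=(s,)) for s in splits]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
    return U, M

def rmse(ratings, U, M):
    total = sum((r - sum(a * b for a, b in zip(U[u], M[i]))) ** 2
                for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Toy data: a rank-1 rating matrix for 3 users and 3 items.
user_scale, item_scale = [1.0, 1.5, 2.0], [1.0, 2.0, 2.5]
ratings = [(u, i, user_scale[u] * item_scale[i])
           for u in range(3) for i in range(3)]
rmse_before = rmse(ratings, *parallel_sgd(ratings, 3, 3, iterations=0))
rmse_after = rmse(ratings, *parallel_sgd(ratings, 3, 3))
```

+<p>Because the threads write to shared feature rows without synchronization, individual updates may occasionally be based on stale values; the Hogwild! paper argues that this is tolerable when the updates are sparse.</p>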
+<h2 id="alswr">ALSWR</h2>
+<p>ALSWR (alternating least squares with weighted-lambda regularization) is an iterative algorithm that computes a low-rank factorization of the rating matrix R into a user feature matrix U and an item feature matrix M.<br />
+The loss function to be minimized is formulated as the sum of squared errors plus <a href="http://en.wikipedia.org/wiki/Tikhonov_regularization">Tikhonov regularization</a>:</p>
+<div class="codehilite"><pre> <span class="n">L</span><span class="p">(</span><span class="n">R</span><span class="p">,</span> <span class="n">U</span><span class="p">,</span> <span class="n">M</span><span class="p">)</span> <span class="p">=</span> <span class="n">sum</span><span class="p">(</span><span class="n">pow</span><span class="p">((</span><span class="n">R</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="nb">i</span><span class="p">]</span> <span class="o">-</span> <span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span><span class="o">*</span> <span class="p">(</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span>^<span class="n">t</span><span class="p">)),</span> 2<span class="p">))</span> <span class="o">+</span> <span class="n">lambda</span> <span class="o">*</span> <span class="p">(</span><span class="n">sum</span><span class="p">(</span><span class="n">n</span><span class="p">(</span><span class="n">u</span><span class="p">)</span> <span class="o">*</span> <span class="o">||</span><span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,]</span><span class="o">||</span>^2<span class="p">)</span> <span class="o">+</span> <span class="n">sum</span><span class="p">(</span><span class="n">n</span><span class="p">(</span><span class="nb">i</span><span class="p">)</span> <span class="o">*</span> <span class="o">||</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,]</span><span class="o">||</span>^2<span class="p">))</span>
+</pre></div>
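+<p>A direct Python transcription of this loss function might look as follows. It is a sketch with hypothetical names (<code>alswr_loss</code>, <code>dot</code>) and assumes that n(u) is the number of ratings given by user u and n(i) the number of ratings received by item i, which is how the weights are defined in the ALSWR paper.</p>

```python
from collections import Counter

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def alswr_loss(ratings, U, M, lam):
    """ratings maps (u, i) -> R[u,i]; U and M are lists of feature rows."""
    squared_error = sum((r - dot(U[u], M[i])) ** 2
                        for (u, i), r in ratings.items())
    n_u = Counter(u for u, _ in ratings)   # n(u): ratings given by user u
    n_i = Counter(i for _, i in ratings)   # n(i): ratings received by item i
    penalty = (sum(n * dot(U[u], U[u]) for u, n in n_u.items())
               + sum(n * dot(M[i], M[i]) for i, n in n_i.items()))
    return squared_error + lam * penalty

ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0}
U = [[1.0, 2.0], [1.0, 1.0]]
M = [[1.0, 2.0], [1.0, 1.0]]
loss = alswr_loss(ratings, U, M, lam=0.1)
```

+<p>In this example only the rating R[1,0] is mispredicted (3 instead of 4), so the squared error is 1, the weighted penalty is 24, and the loss is 1 + 0.1 * 24.</p>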
+
+
+<p>At the beginning of the algorithm, M is initialized so that the first feature of each item vector M[i,] is the item's average rating and the remaining features are random numbers.  </p>
+<p>In every iteration, we fix M and solve for U by minimizing the cost function L(R, U, M); then we fix U and solve for M by minimizing 
+the cost function in the same way. The iterations continue until a stopping criterion is met.</p>
+<p>To solve for the matrix U when M is given, each user's feature vector is computed by solving a regularized linear least-squares problem 
+over the items the user has rated and their feature vectors:</p>
+<div class="codehilite"><pre>  1<span class="o">/</span>2 <span class="o">*</span> <span class="n">d</span><span class="p">(</span><span class="n">L</span><span class="p">(</span><span class="n">R</span><span class="p">,</span><span class="n">U</span><span class="p">,</span><span class="n">M</span><span class="p">))</span> <span class="o">/</span> <span class="n">d</span><span class="p">(</span><span class="n">U</span><span class="p">[</span><span class="n">u</span><span class="p">,</span><span class="n">f</span><span class="p">])</span> <span class="p">=</span> 0
+</pre></div>
+
+
+<p>Similarly, when M is updated, each item's feature vector is computed by solving a regularized linear least-squares problem over the users who have rated the 
+item and their feature vectors:</p>
+<div class="codehilite"><pre>  1<span class="o">/</span>2 <span class="o">*</span> <span class="n">d</span><span class="p">(</span><span class="n">L</span><span class="p">(</span><span class="n">R</span><span class="p">,</span><span class="n">U</span><span class="p">,</span><span class="n">M</span><span class="p">))</span> <span class="o">/</span> <span class="n">d</span><span class="p">(</span><span class="n">M</span><span class="p">[</span><span class="nb">i</span><span class="p">,</span><span class="n">f</span><span class="p">])</span> <span class="p">=</span> 0
+</pre></div>
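+<p>Setting the derivatives with respect to U[u,] to zero gives one small linear system per user: (sum over the rated items i of M[i,]^t * M[i,] + lambda * n(u) * I) * U[u,]^t = sum over the rated items i of R[u,i] * M[i,]^t, where I is the identity matrix. The Python sketch below solves this system with Gaussian elimination; the function names are hypothetical, and the item update is obtained by swapping the roles of U and M.</p>

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= factor * aug[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def solve_user(item_vecs, user_ratings, lam):
    """Feature vector of one user, given M[i,] and R[u,i] for the items rated."""
    k = len(item_vecs[0])
    n_u = len(user_ratings)  # n(u): number of ratings of this user
    A = [[sum(m[a] * m[b] for m in item_vecs) + (lam * n_u if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    rhs = [sum(m[a] * r for m, r in zip(item_vecs, user_ratings))
           for a in range(k)]
    return solve(A, rhs)

# Two rated items with orthogonal feature vectors, lambda = 0.5.
user_vec = solve_user([[1.0, 0.0], [0.0, 1.0]], [4.0, 2.0], lam=0.5)
```

+<p>With lambda * n(u) = 1 the system matrix in this example is twice the identity, so the solution is half of the right-hand side (4, 2). For lambda &gt; 0 the system matrix is positive definite, so the elimination never meets a zero pivot.</p>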
+
+
+<p>The ALSWRFactorizer class is a non-distributed implementation of ALSWR that uses multi-threading to spread the computation across several threads.
+Mahout also offers a <a href="https://mahout.apache.org/users/recommender/intro-als-hadoop.html">parallel map-reduce implementation</a>.</p>
+<p><a name="MatrixFactorization-Reference"></a></p>
+<h1 id="reference">Reference:</h1>
+<p><a href="http://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic gradient descent</a></p>
+<p><a href="http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08%28submitted%29.pdf">ALSWR</a></p>
+   </div>
+  </div>     
+</div> 
+  <footer class="footer" align="center">
+    <div class="container">
+      <p>
+        Copyright &copy; 2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </footer>
+  
+  <script src="/js/jquery-1.9.1.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script>
+    (function() {
+      var cx = '012254517474945470291:vhsfv7eokdc';
+      var gcse = document.createElement('script');
+      gcse.type = 'text/javascript';
+      gcse.async = true;
+      gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
+          '//www.google.com/cse/cse.js?cx=' + cx;
+      var s = document.getElementsByTagName('script')[0];
+      s.parentNode.insertBefore(gcse, s);
+    })();
+  </script>
+</body>
+</html>


