mahout-commits mailing list archives

From build...@apache.org
Subject svn commit: r985117 [5/6] - in /websites/staging/mahout/trunk/content: ./ developers/ general/ images/ users/algorithms/ users/basics/ users/classification/ users/clustering/ users/dim-reduction/ users/environment/ users/flinkbindings/ users/misc/ user...
Date Fri, 08 Apr 2016 18:41:09 GMT
Modified: websites/staging/mahout/trunk/content/users/clustering/streaming-k-means.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/streaming-k-means.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/streaming-k-means.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="streamingkmeans-algorithm"><em>StreamingKMeans</em> algorithm</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="streamingkmeans-algorithm"><em>StreamingKMeans</em> algorithm<a class="headerlink" href="#streamingkmeans-algorithm" title="Permanent link">&para;</a></h1>
 <p>The <em>StreamingKMeans</em> algorithm is a variant of Algorithm 1 from <a href="http://nips.cc/Conferences/2011/Program/event.php?ID=2989" title="M. Shindler, A. Wong, A. Meyerson: Fast and Accurate k-means For Large Datasets">Shindler et al</a> and consists of two steps:</p>
 <ol>
 <li>Streaming step </li>
@@ -276,9 +288,9 @@ expected number of clusters is <em>k</em
 clusters that will be passed on to the BallKMeans step which will further reduce the 
 number of clusters down to <em>k</em>. BallKMeans is a randomized Lloyd-type algorithm that
 has been studied in detail, see <a href="http://www.math.uwaterloo.ca/~cswamy/papers/kmeansfnl.pdf" title="R. Ostrovsky, Y. Rabani, L. Schulman, Ch. Swamy: The Effectiveness of Lloyd-Type Methods for the k-means Problem">Ostrovsky et al</a>.</p>
-<h2 id="streaming-step">Streaming step</h2>
+<h2 id="streaming-step">Streaming step<a class="headerlink" href="#streaming-step" title="Permanent link">&para;</a></h2>
 <hr />
-<h3 id="overview">Overview</h3>
+<h3 id="overview">Overview<a class="headerlink" href="#overview" title="Permanent link">&para;</a></h3>
 <p>The streaming step is a derivative of the streaming 
 portion of Algorithm 1 in <a href="http://nips.cc/Conferences/2011/Program/event.php?ID=2989" title="M. Shindler, A. Wong, A. Meyerson: Fast and Accurate k-means For Large Datasets">Shindler et al</a>. The main difference between the two is that 
 Algorithm 1 of <a href="http://nips.cc/Conferences/2011/Program/event.php?ID=2989" title="M. Shindler, A. Wong, A. Meyerson: Fast and Accurate k-means For Large Datasets">Shindler et al</a> assumes 
@@ -290,7 +302,7 @@ In contrast, Mahout implementation does
 data stream. Instead, it dynamically re-evaluates the parameters that depend on the size 
 of the data stream at runtime as more and more data is processed. In particular, 
 the parameter <em>numClusters</em> (defined below) changes its value as the data is processed.   </p>
-<h3 id="parameters">Parameters</h3>
+<h3 id="parameters">Parameters<a class="headerlink" href="#parameters" title="Permanent link">&para;</a></h3>
 <ul>
 <li><strong>numClusters</strong> (int): Conceptually, <em>numClusters</em> represents the algorithm's guess at the optimal 
 number of clusters it is shooting for. In particular, <em>numClusters</em> will increase at run 
@@ -305,7 +317,7 @@ common ratio <em>beta</em> (see below).
 <li><strong>clusterLogFactor</strong> (double): a constant parameter such that <em>clusterLogFactor</em> * <em>log(numProcessedPoints)</em> is the runtime estimate of the number of clusters to be produced by the streaming step. If the final number of clusters (that we expect <em>StreamingKMeans</em> to output) is <em>k</em>, <em>clusterLogFactor</em> can be set to <em>k</em>.  </li>
 <li><strong>clusterOvershoot</strong> (double): a constant multiplicative slack factor that slows down the collapsing of clusters. The default value is 2. </li>
 </ul>
-<h3 id="algorithm">Algorithm</h3>
+<h3 id="algorithm">Algorithm<a class="headerlink" href="#algorithm" title="Permanent link">&para;</a></h3>
 <p>The algorithm processes the data points one at a time and makes only one pass through the data.
 The first point from the data stream will form the centroid of the first cluster (this designation may change as more points are processed). Suppose there are <em>r</em> clusters at one point and a new point <em>p</em> is being processed. The new point can either be added to one of the existing <em>r</em> clusters or become a new cluster. To decide:</p>
 <ul>
@@ -317,16 +329,16 @@ The first point from the data stream wil
 <p>There will be either <em>r</em> or <em>r+1</em> clusters after processing a new point.</p>
 <p>As the number of clusters increases, it will go over the  <em>clusterOvershoot * numClusters</em> limit (<em>numClusters</em> represents a recommendation for the number of clusters that the streaming step should aim for and <em>clusterOvershoot</em> is the slack). To decrease the number of clusters the existing clusters
 are treated as data points and are re-clustered (collapsed). This tends to make the number of clusters go down. If the number of clusters is still too high, <em>distanceCutoff</em> is increased.</p>
-<h2 id="ballkmeans-step">BallKMeans step</h2>
+<h2 id="ballkmeans-step">BallKMeans step<a class="headerlink" href="#ballkmeans-step" title="Permanent link">&para;</a></h2>
 <hr />
-<h3 id="overview_1">Overview</h3>
+<h3 id="overview_1">Overview<a class="headerlink" href="#overview_1" title="Permanent link">&para;</a></h3>
 <p>The algorithm is a Lloyd-type algorithm that takes a set of weighted vectors and returns k centroids, see <a href="http://www.math.uwaterloo.ca/~cswamy/papers/kmeansfnl.pdf" title="R. Ostrovsky, Y. Rabani, L. Schulman, Ch. Swamy: The Effectiveness of Lloyd-Type Methods for the k-means Problem">Ostrovsky et al</a> for details. The algorithm has two stages: </p>
 <ol>
 <li>Seeding </li>
 <li>Ball k-means </li>
 </ol>
 <p>The seeding stage is an initial guess of where the centroids should be. The initial guess is improved using the ball k-means stage. </p>
-<h3 id="parameters_1">Parameters</h3>
+<h3 id="parameters_1">Parameters<a class="headerlink" href="#parameters_1" title="Permanent link">&para;</a></h3>
 <ul>
 <li>
 <p><strong>numClusters</strong> (int): the number k of centroids to return.  The algorithm will return exactly this number of centroids.</p>
@@ -350,7 +362,7 @@ are treated as data points and are re-cl
 <p><strong>numRuns</strong> (int): This is the number of runs to perform. The solution of lowest cost is returned.  The default is 1 run.</p>
 </li>
 </ul>
-<h3 id="algorithm_1">Algorithm</h3>
+<h3 id="algorithm_1">Algorithm<a class="headerlink" href="#algorithm_1" title="Permanent link">&para;</a></h3>
 <p>The algorithm can be instructed to take multiple independent runs (using the <em>numRuns</em> parameter) and the algorithm will select the best solution (i.e., the one with the lowest cost). In practice, one run is sufficient to find a good solution.  </p>
 <p>Each run operates as follows: a seeding procedure is used to select k centroids, and then ball k-means is run iteratively to refine the solution.</p>
 <p>The seeding procedure can be set to either 'uniformly at random' or 'k-means++' using <em>kMeansPlusPlusInit</em> boolean variable. Seeding with k-means++ involves more computation but offers better results in practice. </p>
@@ -360,7 +372,7 @@ are treated as data points and are re-cl
 <li>The centers of mass of the trimmed clusters (see <em>trimFraction</em> parameter above) become the new centroids </li>
 </ol>
 <p>The data may be partitioned into a test set and a training set (see <em>testProbability</em>). The seeding procedure and ball k-means run on the training set. The cost is computed on the test set.</p>
-<h2 id="usage-of-streamingkmeans">Usage of <em>StreamingKMeans</em></h2>
+<h2 id="usage-of-streamingkmeans">Usage of <em>StreamingKMeans</em><a class="headerlink" href="#usage-of-streamingkmeans" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre> <span class="n">bin</span><span class="o">/</span><span class="n">mahout</span> <span class="n">streamingkmeans</span>  
    <span class="o">-</span><span class="nb">i</span> <span class="o">&lt;</span><span class="n">input</span><span class="o">&gt;</span>  
    <span class="o">-</span><span class="n">o</span> <span class="o">&lt;</span><span class="n">output</span><span class="o">&gt;</span> 
@@ -387,7 +399,7 @@ are treated as data points and are re-cl
 </pre></div>
 
 
-<h3 id="details-on-job-specific-options">Details on Job-Specific Options:</h3>
+<h3 id="details-on-job-specific-options">Details on Job-Specific Options:<a class="headerlink" href="#details-on-job-specific-options" title="Permanent link">&para;</a></h3>
 <ul>
 <li><code>--input (-i) &lt;input&gt;</code>: Path to job input directory.         </li>
 <li><code>--output (-o) &lt;output&gt;</code>: The directory pathname for output.            </li>
@@ -412,7 +424,7 @@ are treated as data points and are re-cl
 <li><code>--startPhase &lt;startPhase&gt;</code> First phase to run.  </li>
 <li><code>--endPhase &lt;endPhase&gt;</code> Last phase to run.   </li>
 </ul>
-<h2 id="references">References</h2>
+<h2 id="references">References<a class="headerlink" href="#references" title="Permanent link">&para;</a></h2>
 <ol>
 <li><a href="http://nips.cc/Conferences/2011/Program/event.php?ID=2989" title="M. Shindler, A. Wong, A. Meyerson: Fast and Accurate k-means For Large Datasets">M. Shindler, A. Wong, A. Meyerson: Fast and Accurate k-means For Large Datasets</a></li>
 <li><a href="http://www.math.uwaterloo.ca/~cswamy/papers/kmeansfnl.pdf" title="R. Ostrovsky, Y. Rabani, L. Schulman, Ch. Swamy: The Effectiveness of Lloyd-Type Methods for the k-means Problem">R. Ostrovsky, Y. Rabani, L. Schulman, Ch. Swamy: The Effectiveness of Lloyd-Type Methods for the k-means Problem</a></li>
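
Illustrative note on the streaming step described in the page above. This is a minimal sketch in Python, not Mahout's implementation. The hunks in this message elide the per-point decision rule, so the rule used here (open a new cluster when the distance d to the nearest centroid exceeds distanceCutoff, otherwise open one with probability d / distanceCutoff) is an assumption. The 1-D points, the helper names, and the default values for distance_cutoff and beta are likewise hypothetical. What it is meant to show is the collapse behaviour: once the cluster count exceeds clusterOvershoot * numClusters, the clusters are re-clustered as weighted points, and distanceCutoff grows if that is not enough.

```python
import math
import random


def insert_point(clusters, value, weight, cutoff, rng):
    """Add a weighted 1-D point: merge into the nearest cluster or open a new one."""
    if not clusters:
        clusters.append([value, weight])
        return
    nearest = min(clusters, key=lambda c: abs(c[0] - value))
    d = abs(nearest[0] - value)
    # ASSUMPTION: decision rule not shown in the diff above.
    if d > cutoff or rng.random() < d / cutoff:
        clusters.append([value, weight])
    else:
        total = nearest[1] + weight
        # Weighted mean keeps the centroid at the cluster's center of mass.
        nearest[0] = (nearest[0] * nearest[1] + value * weight) / total
        nearest[1] = total


def streaming_step(points, cluster_log_factor, cluster_overshoot=2.0,
                   distance_cutoff=1e-3, beta=1.3, seed=0):
    """One pass over the data; returns a list of [centroid, weight] pairs."""
    rng = random.Random(seed)
    clusters = []
    num_clusters = 1
    for n, p in enumerate(points, start=1):
        insert_point(clusters, p, 1.0, distance_cutoff, rng)
        # Runtime estimate: clusterLogFactor * log(numProcessedPoints).
        num_clusters = max(num_clusters, int(cluster_log_factor * math.log(n)))
        limit = cluster_overshoot * num_clusters
        if len(clusters) > limit:
            # Collapse: treat the existing clusters as weighted data points.
            old = clusters
            rng.shuffle(old)
            clusters = []
            for centroid, weight in old:
                insert_point(clusters, centroid, weight, distance_cutoff, rng)
            if len(clusters) > limit:
                # Still too many clusters: loosen the cutoff (hypothetical beta).
                distance_cutoff *= beta
    return clusters
```

Calling streaming_step(data, cluster_log_factor=k) returns a sketch whose weights sum to the number of points processed. In the two-step scheme the page describes, a sketch of this kind is what would be handed to the BallKMeans step to be reduced to k centroids.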

Modified: websites/staging/mahout/trunk/content/users/clustering/viewing-result.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/viewing-result.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/viewing-result.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,14 +264,25 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <ul>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<ul>
 <li><a href="#ViewingResult-AlgorithmViewingpages">Algorithm Viewing pages</a></li>
 </ul>
 <p>There are various technologies available to view the output of Mahout
 algorithms.
 * Clusters</p>
 <p><a name="ViewingResult-AlgorithmViewingpages"></a></p>
-<h1 id="algorithm-viewing-pages">Algorithm Viewing pages</h1>
+<h1 id="algorithm-viewing-pages">Algorithm Viewing pages<a class="headerlink" href="#algorithm-viewing-pages" title="Permanent link">&para;</a></h1>
 <p>{pagetree:root=@self|excerpt=true|expandCollapseAll=true}</p>
    </div>
   </div>     

Modified: websites/staging/mahout/trunk/content/users/clustering/viewing-results.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/viewing-results.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/viewing-results.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,27 +264,38 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="ViewingResults-Intro"></a></p>
-<h1 id="intro">Intro</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="ViewingResults-Intro"></a></p>
+<h1 id="intro">Intro<a class="headerlink" href="#intro" title="Permanent link">&para;</a></h1>
 <p>Many of the Mahout libraries run as batch jobs, dumping results into Hadoop
 sequence files or other data structures.  This page is intended to
 demonstrate the various ways one might inspect the outcome of various jobs.
  The page is organized by algorithms.</p>
 <p><a name="ViewingResults-GeneralUtilities"></a></p>
-<h1 id="general-utilities">General Utilities</h1>
+<h1 id="general-utilities">General Utilities<a class="headerlink" href="#general-utilities" title="Permanent link">&para;</a></h1>
 <p><a name="ViewingResults-SequenceFileDumper"></a></p>
-<h2 id="sequence-file-dumper">Sequence File Dumper</h2>
+<h2 id="sequence-file-dumper">Sequence File Dumper<a class="headerlink" href="#sequence-file-dumper" title="Permanent link">&para;</a></h2>
 <p><a name="ViewingResults-Clustering"></a></p>
-<h1 id="clustering">Clustering</h1>
+<h1 id="clustering">Clustering<a class="headerlink" href="#clustering" title="Permanent link">&para;</a></h1>
 <p><a name="ViewingResults-ClusterDumper"></a></p>
-<h2 id="cluster-dumper">Cluster Dumper</h2>
+<h2 id="cluster-dumper">Cluster Dumper<a class="headerlink" href="#cluster-dumper" title="Permanent link">&para;</a></h2>
 <p>Run the following to print out all options:</p>
 <div class="codehilite"><pre><span class="n">java</span>  <span class="o">-</span><span class="n">cp</span> &quot;<span class="o">*</span>&quot; <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">clustering</span><span class="p">.</span><span class="n">ClusterDumper</span> <span class="o">--</span><span class="n">help</span>
 </pre></div>
 
 
 <p><a name="ViewingResults-Example"></a></p>
-<h3 id="example">Example</h3>
+<h3 id="example">Example<a class="headerlink" href="#example" title="Permanent link">&para;</a></h3>
 <div class="codehilite"><pre><span class="n">java</span>  <span class="o">-</span><span class="n">cp</span> &quot;<span class="o">*</span>&quot; <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">clustering</span><span class="p">.</span><span class="n">ClusterDumper</span> <span class="o">--</span><span class="n">seqFileDir</span>
 </pre></div>
 
@@ -292,9 +304,9 @@ demonstrate the various ways one might i
           --dictionary ./solr-clust-n2/dictionary.txt
           --substring 100 --pointsDir ./solr-clust-n2/out/points/</p>
 <p><a name="ViewingResults-ClusterLabels(MAHOUT-163)"></a></p>
-<h2 id="cluster-labels-mahout-163">Cluster Labels (MAHOUT-163)</h2>
+<h2 id="cluster-labels-mahout-163">Cluster Labels (MAHOUT-163)<a class="headerlink" href="#cluster-labels-mahout-163" title="Permanent link">&para;</a></h2>
 <p><a name="ViewingResults-Classification"></a></p>
-<h1 id="classification">Classification</h1>
+<h1 id="classification">Classification<a class="headerlink" href="#classification" title="Permanent link">&para;</a></h1>
    </div>
   </div>     
 </div> 

Modified: websites/staging/mahout/trunk/content/users/clustering/visualizing-sample-clusters.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/visualizing-sample-clusters.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/visualizing-sample-clusters.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p><a name="VisualizingSampleClusters-Introduction"></a></p>
-<h1 id="introduction">Introduction</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<p><a name="VisualizingSampleClusters-Introduction"></a></p>
+<h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">&para;</a></h1>
 <p>Mahout provides examples to visualize the sample clusters created by
 our clustering algorithms. Note that the visualization is done by Swing programs. You have to be in a window system on the same
 machine you run these, or logged in via a remote desktop.</p>
@@ -272,7 +284,7 @@ machine you run these, or logged in via
 classes under <em>org.apache.mahout.clustering.display</em> package in
 mahout-examples module. The easiest way to achieve this is to <a href="users/basics/quickstart.html">set up Mahout</a> in your IDE.</p>
 <p><a name="VisualizingSampleClusters-Visualizingclusters"></a></p>
-<h1 id="visualizing-clusters">Visualizing clusters</h1>
+<h1 id="visualizing-clusters">Visualizing clusters<a class="headerlink" href="#visualizing-clusters" title="Permanent link">&para;</a></h1>
 <p>The following classes in <em>org.apache.mahout.clustering.display</em> can be run
 without parameters to generate a sample data set and run the reference
 clustering implementations over them:</p>

Modified: websites/staging/mahout/trunk/content/users/dim-reduction/dimensional-reduction.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/dim-reduction/dimensional-reduction.html (original)
+++ websites/staging/mahout/trunk/content/users/dim-reduction/dimensional-reduction.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,7 +264,18 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="support-for-dimensional-reduction">Support for dimensional reduction</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="support-for-dimensional-reduction">Support for dimensional reduction<a class="headerlink" href="#support-for-dimensional-reduction" title="Permanent link">&para;</a></h1>
 <p>Matrix algebra underpins the way many Big Data algorithms and data
 structures are composed: full-text search can be viewed as doing matrix
 multiplication of the term-document matrix by the query vector (giving a
@@ -307,16 +319,16 @@ course, sparse matrices which don't fit
 far as decomposition is concerned. Parallelizable and/or stream-oriented
 algorithms are needed.</p>
 <p><a name="DimensionalReduction-SingularValueDecomposition"></a></p>
-<h1 id="singular-value-decomposition">Singular Value Decomposition</h1>
+<h1 id="singular-value-decomposition">Singular Value Decomposition<a class="headerlink" href="#singular-value-decomposition" title="Permanent link">&para;</a></h1>
 <p>Mahout currently provides (as of 0.3, the first release with MAHOUT-180 applied) two scalable implementations of SVD: a stream-oriented implementation using the Asymmetric Generalized Hebbian Algorithm outlined in Genevieve Gorrell &amp; Brandyn Webb's paper (<a href="-http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf.html">Gorrell and Webb 2005</a>),
 and a <a href="http://en.wikipedia.org/wiki/Lanczos_algorithm">Lanczos</a>
 implementation, available both single-threaded in the
 o.a.m.math.decomposer.lanczos package (math module) and as a (series of) Hadoop map-reduce
 job(s) in the o.a.m.math.hadoop.decomposer package (core module).
 Coming soon: stochastic decomposition.</p>
-<p>See also: <a href="Wikipedia%20-%20SVD">https://cwiki.apache.org/confluence/display/MAHOUT/SVD+-+Singular+Value+Decomposition</a></p>
+<p>See also: <a href="Wikipedia - SVD">https://cwiki.apache.org/confluence/display/MAHOUT/SVD+-+Singular+Value+Decomposition</a></p>
 <p><a name="DimensionalReduction-Lanczos"></a></p>
-<h2 id="lanczos">Lanczos</h2>
+<h2 id="lanczos">Lanczos<a class="headerlink" href="#lanczos" title="Permanent link">&para;</a></h2>
 <p>The Lanczos algorithm is designed for eigen-decomposition, but like any
 such algorithm, getting singular vectors out of it is immediate (singular
 vectors of matrix A are just the eigenvectors of A^t * A or A * A^t). 
@@ -344,7 +356,7 @@ via Lanczos, and then discard the bottom
 the largest singular values (which is the case for using Lanczos for
 dimensional reduction).</p>
 <p><a name="DimensionalReduction-ParallelizationStragegy"></a></p>
-<h3 id="parallelization-stragegy">Parallelization Strategy</h3>
+<h3 id="parallelization-stragegy">Parallelization Strategy<a class="headerlink" href="#parallelization-stragegy" title="Permanent link">&para;</a></h3>
 <p>Lanczos is "embarrassingly parallelizable": matrix multiplication of a
 matrix by a vector may be carried out row-at-a-time without communication
 until at the end, the results of the intermediate matrix-by-vector outputs
@@ -359,7 +371,7 @@ delaying writing to disk until Mapper cl
 a Combiner be the same as the Reducer, the bottleneck in accumulation is
 nowhere near a single point.</p>
 <p><a name="DimensionalReduction-Mahoutusage"></a></p>
-<h3 id="mahout-usage">Mahout usage</h3>
+<h3 id="mahout-usage">Mahout usage<a class="headerlink" href="#mahout-usage" title="Permanent link">&para;</a></h3>
 <p>The Mahout DistributedLanczosSolver is invoked by the
 <MAHOUT_HOME>/bin/mahout svd command. This command takes the following
 arguments (which can be reproduced by just entering the command with no
@@ -456,7 +468,7 @@ the long form svd invocation:</p>
 <p>TODO: also allow exclusion based on improper orthogonality (currently
 computed, but not checked against constraints).</p>
 <p><a name="DimensionalReduction-Example:SVDofASFMailArchivesonAmazonElasticMapReduce"></a></p>
-<h4 id="example-svd-of-asf-mail-archives-on-amazon-elastic-mapreduce">Example: SVD of ASF Mail Archives on Amazon Elastic MapReduce</h4>
+<h4 id="example-svd-of-asf-mail-archives-on-amazon-elastic-mapreduce">Example: SVD of ASF Mail Archives on Amazon Elastic MapReduce<a class="headerlink" href="#example-svd-of-asf-mail-archives-on-amazon-elastic-mapreduce" title="Permanent link">&para;</a></h4>
 <p>This section walks you through a complete example of running the Mahout SVD
 job on Amazon Elastic MapReduce cluster and then preparing the output to be
 used for clustering. This example was developed as part of the effort to
@@ -479,7 +491,7 @@ mailing list, see: <a href="http://searc
 <p>Note: Some of this work is due in part to credits donated by the Amazon
 Elastic MapReduce team.</p>
 <p><a name="DimensionalReduction-1.LaunchEMRCluster"></a></p>
-<h5 id="1-launch-emr-cluster">1. Launch EMR Cluster</h5>
+<h5 id="1-launch-emr-cluster">1. Launch EMR Cluster<a class="headerlink" href="#1-launch-emr-cluster" title="Permanent link">&para;</a></h5>
 <p>For a detailed explanation of the steps involved in launching an Amazon
 Elastic MapReduce cluster for running Mahout jobs, please read the
 "Building Vectors for Large Document Sets" section of <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+on+Elastic+MapReduce">Mahout on Elastic MapReduce</a>
@@ -487,11 +499,11 @@ Elastic MapReduce cluster for running Ma
 <p>In the remaining steps below, remember to replace JOB_ID with the Job ID of
 your EMR cluster.</p>
 <p><a name="DimensionalReduction-2.LoadMahout0.5+JARintoS3"></a></p>
-<h5 id="2-load-mahout-05-jar-into-s3">2. Load Mahout 0.5+ JAR into S3</h5>
+<h5 id="2-load-mahout-05-jar-into-s3">2. Load Mahout 0.5+ JAR into S3<a class="headerlink" href="#2-load-mahout-05-jar-into-s3" title="Permanent link">&para;</a></h5>
 <p>These steps were created with the mahout-0.5-SNAPSHOT because they rely on
 the patch for <a href="https://issues.apache.org/jira/browse/MAHOUT-639">MAHOUT-639</a></p>
 <p><a name="DimensionalReduction-3.CopyTFIDFVectorsintoHDFS"></a></p>
-<h5 id="3-copy-tfidf-vectors-into-hdfs">3. Copy TFIDF Vectors into HDFS</h5>
+<h5 id="3-copy-tfidf-vectors-into-hdfs">3. Copy TFIDF Vectors into HDFS<a class="headerlink" href="#3-copy-tfidf-vectors-into-hdfs" title="Permanent link">&para;</a></h5>
 <p>Before running your SVD job on the vectors, you need to copy them from S3
 to your EMR cluster's HDFS.</p>
 <div class="codehilite"><pre><span class="n">elastic</span><span class="o">-</span><span class="n">mapreduce</span> <span class="o">--</span><span class="n">jar</span> <span class="n">s3</span><span class="p">:</span><span class="o">//</span><span class="n">elasticmapreduce</span><span class="o">/</span><span class="n">samples</span><span class="o">/</span><span class="n">distcp</span><span class="o">/</span><span class="n">distcp</span><span class="p">.</span><span class="n">jar</span> <span class="o">\</span>
@@ -502,7 +514,7 @@ to your EMR cluster's HDFS.</p>
 
 
 <p><a name="DimensionalReduction-4.RuntheSVDJob"></a></p>
-<h5 id="4-run-the-svd-job">4. Run the SVD Job</h5>
+<h5 id="4-run-the-svd-job">4. Run the SVD Job<a class="headerlink" href="#4-run-the-svd-job" title="Permanent link">&para;</a></h5>
 <p>Now you're ready to run the SVD job on the vectors stored in HDFS:</p>
 <div class="codehilite"><pre><span class="n">elastic</span><span class="o">-</span><span class="n">mapreduce</span> <span class="o">--</span><span class="n">jar</span> <span class="n">s3</span><span class="p">:</span><span class="o">//</span><span class="n">BUCKET</span><span class="o">/</span><span class="n">mahout</span><span class="o">-</span><span class="n">examples</span><span class="o">-</span>0<span class="p">.</span>5<span class="o">-</span><span class="n">SNAPSHOT</span><span class="o">-</span><span class="n">job</span><span class="p">.</span><span class="n">jar</span> <span class="o">\</span>
   <span class="o">--</span><span class="n">main</span><span class="o">-</span><span class="n">class</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">driver</span><span class="p">.</span><span class="n">MahoutDriver</span> <span class="o">\</span>
@@ -528,7 +540,7 @@ removes any duplicate eigenvectors cause
 overflow and any that don't appear to be "eigen" enough (ie, they don't
 satisfy the eigenvector criterion with high enough fidelity). - Jake Mannix</p>
 <p><a name="DimensionalReduction-5.TransformyourTFIDFVectorsintoMahoutMatrix"></a></p>
-<h5 id="5-transform-your-tfidf-vectors-into-mahout-matrix">5. Transform your TFIDF Vectors into Mahout Matrix</h5>
+<h5 id="5-transform-your-tfidf-vectors-into-mahout-matrix">5. Transform your TFIDF Vectors into Mahout Matrix<a class="headerlink" href="#5-transform-your-tfidf-vectors-into-mahout-matrix" title="Permanent link">&para;</a></h5>
 <p>The tfidf vectors created by the seq2sparse job are
 SequenceFile<Text,VectorWritable>. The Mahout RowId job transforms these
 vectors into a matrix form that is a
@@ -558,7 +570,7 @@ your EMR cluster. The job produces the f
 <p>where docIndex is the SequenceFile<IntWritable,Text> and matrix is
 SequenceFile<IntWritable,VectorWritable>.</p>
 <p><a name="DimensionalReduction-6.TransposetheMatrix"></a></p>
-<h5 id="6-transpose-the-matrix">6. Transpose the Matrix</h5>
+<h5 id="6-transpose-the-matrix">6. Transpose the Matrix<a class="headerlink" href="#6-transpose-the-matrix" title="Permanent link">&para;</a></h5>
 <p>Our ultimate goal is to multiply the TFIDF vector matrix times our SVD
 eigenvectors. For the mathematically inclined, from the rowid job, we now
 have an m x n matrix T (m=6076937, n=20444). The SVD eigenvector matrix E
@@ -598,7 +610,7 @@ numColsZ == numColsX). - Jake Mannix</p>
 
 
 <p><a name="DimensionalReduction-7.TransposeEigenvectors"></a></p>
-<h5 id="7-transpose-eigenvectors">7. Transpose Eigenvectors</h5>
+<h5 id="7-transpose-eigenvectors">7. Transpose Eigenvectors<a class="headerlink" href="#7-transpose-eigenvectors" title="Permanent link">&para;</a></h5>
 <p>If you followed Jake's explanation in step 6 above, then you know that we
 also need to transpose the eigenvectors:</p>
 <div class="codehilite"><pre><span class="n">elastic</span><span class="o">-</span><span class="n">mapreduce</span> <span class="o">--</span><span class="n">jar</span> <span class="n">s3</span><span class="p">:</span><span class="o">//</span><span class="n">BUCKET</span><span class="o">/</span><span class="n">mahout</span><span class="o">-</span><span class="n">examples</span><span class="o">-</span>0<span class="p">.</span>5<span class="o">-</span><span class="n">SNAPSHOT</span><span class="o">-</span><span class="n">job</span><span class="p">.</span><span class="n">jar</span> <span class="o">\</span>
@@ -620,7 +632,7 @@ transposing the matrix you are multiplyi
 
 
 <p><a name="DimensionalReduction-8.MatrixMultiplication"></a></p>
-<h5 id="8-matrix-multiplication">8. Matrix Multiplication</h5>
+<h5 id="8-matrix-multiplication">8. Matrix Multiplication<a class="headerlink" href="#8-matrix-multiplication" title="Permanent link">&para;</a></h5>
 <p>Lastly, we need to multiply the transposed vectors using Mahout's
 matrixmult job:</p>
 <div class="codehilite"><pre><span class="n">elastic</span><span class="o">-</span><span class="n">mapreduce</span> <span class="o">--</span><span class="n">jar</span> <span class="n">s3</span><span class="p">:</span><span class="o">//</span><span class="n">BUCKET</span><span class="o">/</span><span class="n">mahout</span><span class="o">-</span><span class="n">examples</span><span class="o">-</span>0<span class="p">.</span>5<span class="o">-</span><span class="n">SNAPSHOT</span><span class="o">-</span><span class="n">job</span><span class="p">.</span><span class="n">jar</span> <span class="o">\</span>
@@ -643,7 +655,7 @@ matrixmult job:</p>
 
 
 <p><a name="DimensionalReduction-Resources"></a></p>
-<h1 id="resources">Resources</h1>
+<h1 id="resources">Resources<a class="headerlink" href="#resources" title="Permanent link">&para;</a></h1>
 <ul>
 <li><a href="http://www.dcs.shef.ac.uk/~genevieve/lsa_tutorial.htm">LSA tutorial</a></li>
 <li><a href="http://www.puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html">SVD tutorial</a></li>

Modified: websites/staging/mahout/trunk/content/users/dim-reduction/ssvd.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/dim-reduction/ssvd.html (original)
+++ websites/staging/mahout/trunk/content/users/dim-reduction/ssvd.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,10 +264,21 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="stochastic-singular-value-decomposition">Stochastic Singular Value Decomposition</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="stochastic-singular-value-decomposition">Stochastic Singular Value Decomposition<a class="headerlink" href="#stochastic-singular-value-decomposition" title="Permanent link">&para;</a></h1>
 <p>Stochastic SVD method in Mahout produces reduced rank Singular Value Decomposition output in its 
 strict mathematical definition: <code>\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)</code>.</p>
-<h2 id="the-benefits-over-other-methods-are">The benefits over other methods are:</h2>
+<h2 id="the-benefits-over-other-methods-are">The benefits over other methods are:<a class="headerlink" href="#the-benefits-over-other-methods-are" title="Permanent link">&para;</a></h2>
 <ul>
 <li>
 <p>reduced flops required compared to Krylov subspace methods</p>
@@ -284,14 +296,14 @@ strict mathematical definition: <code>\(
 <p>As of 0.7 trunk, includes PCA and dimensionality reduction workflow (EXPERIMENTAL! Feedback on performance/other PCA related issues/ blogs is greatly appreciated.)</p>
 </li>
 </ul>
-<h3 id="map-reduce-characteristics">Map-Reduce characteristics:</h3>
+<h3 id="map-reduce-characteristics">Map-Reduce characteristics:<a class="headerlink" href="#map-reduce-characteristics" title="Permanent link">&para;</a></h3>
 <p>SSVD uses at most 3 MR sequential steps (map-only + map-reduce + 2 optional parallel map-reduce jobs) to produce reduced rank approximation of U, V and S matrices. Additionally, two more map-reduce steps are added for each power iteration step if requested.</p>
-<h2 id="potential-drawbacks">Potential drawbacks:</h2>
+<h2 id="potential-drawbacks">Potential drawbacks:<a class="headerlink" href="#potential-drawbacks" title="Permanent link">&para;</a></h2>
 <p>potentially less precise (but adding even one power iteration seems to fix that quite a bit).</p>
-<h2 id="documentation">Documentation</h2>
+<h2 id="documentation">Documentation<a class="headerlink" href="#documentation" title="Permanent link">&para;</a></h2>
 <p><a href="ssvd.page/SSVD-CLI.pdf">Overview and Usage</a></p>
 <p>Note: Please use 0.6 or later! For the PCA workflow, please use 0.7 or later.</p>
-<h2 id="publications">Publications</h2>
+<h2 id="publications">Publications<a class="headerlink" href="#publications" title="Permanent link">&para;</a></h2>
 <p><a href="http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf">Nathan Halko's dissertation</a> "Randomized methods for computing low-rank
 approximations of matrices" contains comprehensive definition of parallelization strategy taken in Mahout SSVD implementation and also some precision/scalability benchmarks, esp. w.r.t. Mahout Lanczos implementation on a typical corpus data set.</p>
 <p><a href="http://arxiv.org/abs/0909.4061">Halko, Martinsson, Tropp</a> paper discusses family of random projection-based algorithms and contains theoretical error estimates.</p>
@@ -318,7 +330,7 @@ x<span class="o">&lt;-</span> usim <span
 
 <p>and try to compare ssvd.svd(x) and stock svd(x) performance for the same rank k, notice the difference in the running time. Also play with power iterations (qIter) and compare accuracies of standard svd and SSVD.</p>
 <p>Note: numerical stability of R algorithms may differ from that of Mahout's distributed version. We haven't studied accuracy of the R simulation. For study of accuracy of Mahout's version, please refer to Nathan's dissertation as referenced above.</p>
-<h4 id="modified-ssvd-algorithm">Modified SSVD Algorithm.</h4>
+<h4 id="modified-ssvd-algorithm">Modified SSVD Algorithm.<a class="headerlink" href="#modified-ssvd-algorithm" title="Permanent link">&para;</a></h4>
 <p>Given an <code>\(m\times n\)</code>
 matrix <code>\(\mathbf{A}\)</code>, a target rank <code>\(k\in\mathbb{N}_{1}\)</code>
 , an oversampling parameter <code>\(p\in\mathbb{N}_{1}\)</code>, 

Modified: websites/staging/mahout/trunk/content/users/environment/classify-a-doc-from-the-shell.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/classify-a-doc-from-the-shell.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/classify-a-doc-from-the-shell.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,11 +264,22 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="building-a-text-classifier-in-mahouts-spark-shell">Building a text classifier in Mahout's Spark Shell</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="building-a-text-classifier-in-mahouts-spark-shell">Building a text classifier in Mahout's Spark Shell<a class="headerlink" href="#building-a-text-classifier-in-mahouts-spark-shell" title="Permanent link">&para;</a></h1>
 <p>This tutorial will take you through the steps used to train a Multinomial Naive Bayes model and create a text classifier based on that model using the <code>mahout spark-shell</code>. </p>
-<h2 id="prerequisites">Prerequisites</h2>
+<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link">&para;</a></h2>
 <p>This tutorial assumes that you have your Spark environment variables set for the <code>mahout spark-shell</code> (see <a href="http://mahout.apache.org/users/sparkbindings/play-with-shell.html">Playing with Mahout's Shell</a>).  We also assume that Mahout is running in cluster mode (i.e., with the <code>MAHOUT_LOCAL</code> environment variable <strong>unset</strong>), as we'll be reading from and writing to HDFS.</p>
-<h2 id="downloading-and-vectorizing-the-wikipedia-dataset">Downloading and Vectorizing the Wikipedia dataset</h2>
+<h2 id="downloading-and-vectorizing-the-wikipedia-dataset">Downloading and Vectorizing the Wikipedia dataset<a class="headerlink" href="#downloading-and-vectorizing-the-wikipedia-dataset" title="Permanent link">&para;</a></h2>
 <p><em>As of Mahout v. 0.10.0, we are still reliant on the MapReduce versions of <code>mahout seqwiki</code> and <code>mahout seq2sparse</code> to extract and vectorize our text.  A</em> <a href="https://issues.apache.org/jira/browse/MAHOUT-1663"><em>Spark implementation of seq2sparse</em></a> <em>is in the works for Mahout v. 0.11.</em> However, to download the Wikipedia dataset, extract the bodies of the documents, label each document and vectorize the text into TF-IDF vectors, we can simply run the <a href="https://github.com/apache/mahout/blob/master/examples/bin/classify-wikipedia.sh">classify-wikipedia.sh</a> example.  </p>
 <div class="codehilite"><pre><span class="n">Please</span> <span class="n">select</span> <span class="n">a</span> <span class="n">number</span> <span class="n">to</span> <span class="n">choose</span> <span class="n">the</span> <span class="n">corresponding</span> <span class="n">task</span> <span class="n">to</span> <span class="n">run</span>
 1<span class="p">.</span> <span class="n">CBayes</span> <span class="p">(</span><span class="n">may</span> <span class="n">require</span> <span class="n">increased</span> <span class="n">heap</span> <span class="n">space</span> <span class="n">on</span> <span class="n">yarn</span><span class="p">)</span>
@@ -278,14 +290,14 @@
 
 
 <p>Enter (2). This will download a large recent XML dump of the Wikipedia database, into a <code>/tmp/mahout-work-wiki</code> directory, unzip it and  place it into HDFS.  It will run a <a href="http://mahout.apache.org/users/classification/wikipedia-classifier-example.html">MapReduce job to parse the wikipedia set</a>, extracting and labeling only pages with category tags for [United States] and [United Kingdom] (~11600 documents). It will then run <code>mahout seq2sparse</code> to convert the documents into TF-IDF vectors.  The script will also build and test a <a href="http://mahout.apache.org/users/classification/bayesian.html">Naive Bayes model using MapReduce</a>.  When it is completed, you should see a confusion matrix on your screen.  For this tutorial, we will ignore the MapReduce model, and build a new model using Spark based on the vectorized text output by <code>seq2sparse</code>.</p>
-<h2 id="getting-started">Getting Started</h2>
+<h2 id="getting-started">Getting Started<a class="headerlink" href="#getting-started" title="Permanent link">&para;</a></h2>
 <p>Launch the <code>mahout spark-shell</code>.  There is an example script: <code>spark-document-classifier.mscala</code> (.mscala denotes a Mahout-Scala script which can be run similarly to an R script).   We will be walking through this script for this tutorial but if you wanted to simply run the script, you could just issue the command: </p>
 <div class="codehilite"><pre><span class="n">mahout</span><span class="o">&gt;</span> <span class="p">:</span><span class="n">load</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">mahout</span><span class="o">/</span><span class="n">examples</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">spark</span><span class="o">-</span><span class="n">document</span><span class="o">-</span><span class="n">classifier</span><span class="p">.</span><span class="n">mscala</span>
 </pre></div>
 
 
 <p>For now, let's take the script apart piece by piece.  You can cut and paste the following code blocks into the <code>mahout spark-shell</code>.</p>
-<h2 id="imports">Imports</h2>
+<h2 id="imports">Imports<a class="headerlink" href="#imports" title="Permanent link">&para;</a></h2>
 <p>Our Mahout Naive Bayes imports:</p>
 <div class="codehilite"><pre><span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">classifier</span><span class="p">.</span><span class="n">naivebayes</span><span class="p">.</span><span class="n">_</span>
 <span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">classifier</span><span class="p">.</span><span class="n">stats</span><span class="p">.</span><span class="n">_</span>
@@ -300,19 +312,19 @@
 </pre></div>
 
 
-<h2 id="read-in-our-full-set-from-hdfs-as-vectorized-by-seq2sparse-in-classify-wikipediash">Read in our full set from HDFS as vectorized by seq2sparse in classify-wikipedia.sh</h2>
+<h2 id="read-in-our-full-set-from-hdfs-as-vectorized-by-seq2sparse-in-classify-wikipediash">Read in our full set from HDFS as vectorized by seq2sparse in classify-wikipedia.sh<a class="headerlink" href="#read-in-our-full-set-from-hdfs-as-vectorized-by-seq2sparse-in-classify-wikipediash" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">pathToData</span> <span class="p">=</span> &quot;<span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">mahout</span><span class="o">-</span><span class="n">work</span><span class="o">-</span><span class="n">wiki</span><span class="o">/</span>&quot;
 <span class="n">val</span> <span class="n">fullData</span> <span class="p">=</span> <span class="n">drmDfsRead</span><span class="p">(</span><span class="n">pathToData</span> <span class="o">+</span> &quot;<span class="n">wikipediaVecs</span><span class="o">/</span><span class="n">tfidf</span><span class="o">-</span><span class="n">vectors</span>&quot;<span class="p">)</span>
 </pre></div>
 
 
-<h2 id="extract-the-category-of-each-observation-and-aggregate-those-observations-by-category">Extract the category of each observation and aggregate those observations by category</h2>
+<h2 id="extract-the-category-of-each-observation-and-aggregate-those-observations-by-category">Extract the category of each observation and aggregate those observations by category<a class="headerlink" href="#extract-the-category-of-each-observation-and-aggregate-those-observations-by-category" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">val</span> <span class="p">(</span><span class="n">labelIndex</span><span class="p">,</span> <span class="n">aggregatedObservations</span><span class="p">)</span> <span class="p">=</span> <span class="n">SparkNaiveBayes</span><span class="p">.</span><span class="n">extractLabelsAndAggregateObservations</span><span class="p">(</span>
                                                              <span class="n">fullData</span><span class="p">)</span>
 </pre></div>
 
 
-<h2 id="build-a-muitinomial-naive-bayes-model-and-self-test-on-the-training-set">Build a Multinomial Naive Bayes model and self test on the training set</h2>
+<h2 id="build-a-muitinomial-naive-bayes-model-and-self-test-on-the-training-set">Build a Multinomial Naive Bayes model and self test on the training set<a class="headerlink" href="#build-a-muitinomial-naive-bayes-model-and-self-test-on-the-training-set" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">model</span> <span class="p">=</span> <span class="n">SparkNaiveBayes</span><span class="p">.</span><span class="n">train</span><span class="p">(</span><span class="n">aggregatedObservations</span><span class="p">,</span> <span class="n">labelIndex</span><span class="p">,</span> <span class="n">false</span><span class="p">)</span>
 <span class="n">val</span> <span class="n">resAnalyzer</span> <span class="p">=</span> <span class="n">SparkNaiveBayes</span><span class="p">.</span><span class="n">test</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">fullData</span><span class="p">,</span> <span class="n">false</span><span class="p">)</span>
 <span class="n">println</span><span class="p">(</span><span class="n">resAnalyzer</span><span class="p">)</span>
@@ -320,7 +332,7 @@
 
 
 <p>Printing the <code>ResultAnalyzer</code> will display the confusion matrix.</p>
-<h2 id="read-in-the-dictionary-and-document-frequency-count-from-hdfs">Read in the dictionary and document frequency count from HDFS</h2>
+<h2 id="read-in-the-dictionary-and-document-frequency-count-from-hdfs">Read in the dictionary and document frequency count from HDFS<a class="headerlink" href="#read-in-the-dictionary-and-document-frequency-count-from-hdfs" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">dictionary</span> <span class="p">=</span> <span class="n">sdc</span><span class="p">.</span><span class="n">sequenceFile</span><span class="p">(</span><span class="n">pathToData</span> <span class="o">+</span> &quot;<span class="n">wikipediaVecs</span><span class="o">/</span><span class="n">dictionary</span><span class="p">.</span><span class="n">file</span><span class="o">-</span>0&quot;<span class="p">,</span>
                                   <span class="n">classOf</span><span class="p">[</span><span class="n">Text</span><span class="p">],</span>
                                   <span class="n">classOf</span><span class="p">[</span><span class="n">IntWritable</span><span class="p">])</span>
@@ -344,7 +356,7 @@
 </pre></div>
 
 
-<h2 id="define-a-function-to-tokenize-and-vectorize-new-text-using-our-current-dictionary">Define a function to tokenize and vectorize new text using our current dictionary</h2>
+<h2 id="define-a-function-to-tokenize-and-vectorize-new-text-using-our-current-dictionary">Define a function to tokenize and vectorize new text using our current dictionary<a class="headerlink" href="#define-a-function-to-tokenize-and-vectorize-new-text-using-our-current-dictionary" title="Permanent link">&para;</a></h2>
 <p>For this simple example, our function <code>vectorizeDocument(...)</code> will tokenize a new document into unigrams using native Java String methods and vectorize using our dictionary and document frequencies. You could also use a <a href="https://lucene.apache.org/core/">Lucene</a> analyzer for bigrams, trigrams, etc., and integrate Apache <a href="https://tika.apache.org/">Tika</a> to extract text from different document types (PDF, PPT, XLS, etc.).  Here, however, we will keep it simple, stripping and tokenizing our text using regexes and native String methods.</p>
 <div class="codehilite"><pre>def vectorizeDocument<span class="p">(</span>document: String<span class="p">,</span>
                         dictionaryMap: Map<span class="p">[</span>String<span class="p">,</span>Int<span class="p">],</span>
@@ -376,7 +388,7 @@
 </pre></div>
 
 
-<h2 id="setup-our-classifier">Setup our classifier</h2>
+<h2 id="setup-our-classifier">Setup our classifier<a class="headerlink" href="#setup-our-classifier" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">labelMap</span> <span class="p">=</span> <span class="n">model</span><span class="p">.</span><span class="n">labelIndex</span>
 <span class="n">val</span> <span class="n">numLabels</span> <span class="p">=</span> <span class="n">model</span><span class="p">.</span><span class="n">numLabels</span>
 <span class="n">val</span> <span class="n">reverseLabelMap</span> <span class="p">=</span> <span class="n">labelMap</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">x</span> <span class="p">=</span><span class="o">&gt;</span> <span class="n">x</span><span class="p">.</span><span class="n">_2</span> <span class="o">-&gt;</span> <span class="n">x</span><span class="p">.</span><span class="n">_1</span><span class="p">)</span>
@@ -389,7 +401,7 @@
 </pre></div>
 
 
-<h2 id="define-an-argmax-function">Define an argmax function</h2>
+<h2 id="define-an-argmax-function">Define an argmax function<a class="headerlink" href="#define-an-argmax-function" title="Permanent link">&para;</a></h2>
 <p>The label with the highest score wins the classification for a given document.</p>
 <div class="codehilite"><pre>def argmax<span class="p">(</span>v: Vector<span class="p">)</span>: <span class="p">(</span>Int<span class="p">,</span> Double<span class="p">)</span> <span class="o">=</span> <span class="p">{</span>
     var bestIdx: Int <span class="o">=</span> Integer.MIN_VALUE
@@ -405,7 +417,7 @@
 </pre></div>
 
 
-<h2 id="define-our-tf-idf-vector-classifier">Define our TF(-IDF) vector classifier</h2>
+<h2 id="define-our-tf-idf-vector-classifier">Define our TF(-IDF) vector classifier<a class="headerlink" href="#define-our-tf-idf-vector-classifier" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">def</span> <span class="n">classifyDocument</span><span class="p">(</span><span class="n">clvec</span><span class="p">:</span> <span class="n">Vector</span><span class="p">)</span> <span class="p">:</span> <span class="n">String</span> <span class="p">=</span> <span class="p">{</span>
     <span class="n">val</span> <span class="n">cvec</span> <span class="p">=</span> <span class="n">classifier</span><span class="p">.</span><span class="n">classifyFull</span><span class="p">(</span><span class="n">clvec</span><span class="p">)</span>
     <span class="n">val</span> <span class="p">(</span><span class="n">bestIdx</span><span class="p">,</span> <span class="n">bestScore</span><span class="p">)</span> <span class="p">=</span> <span class="n">argmax</span><span class="p">(</span><span class="n">cvec</span><span class="p">)</span>
@@ -414,7 +426,7 @@
 </pre></div>
 
 
-<h2 id="two-sample-news-articles-united-states-football-and-united-kingdom-football">Two sample news articles: United States Football and United Kingdom Football</h2>
+<h2 id="two-sample-news-articles-united-states-football-and-united-kingdom-football">Two sample news articles: United States Football and United Kingdom Football<a class="headerlink" href="#two-sample-news-articles-united-states-football-and-united-kingdom-football" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="c1">// A random United States football article</span>
 <span class="c1">// http://www.reuters.com/article/2015/01/28/us-nfl-superbowl-security-idUSKBN0L12JR20150128</span>
 <span class="n">val</span> <span class="n">UStextToClassify</span> <span class="o">=</span> <span class="k">new</span> <span class="n">String</span><span class="p">(</span><span class="s">&quot;(Reuters) - Super Bowl security officials acknowledge&quot;</span> <span class="o">+</span>
@@ -483,7 +495,7 @@
 </pre></div>
 
 
-<h2 id="vectorize-and-classify-our-documents">Vectorize and classify our documents</h2>
+<h2 id="vectorize-and-classify-our-documents">Vectorize and classify our documents<a class="headerlink" href="#vectorize-and-classify-our-documents" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">usVec</span> <span class="p">=</span> <span class="n">vectorizeDocument</span><span class="p">(</span><span class="n">UStextToClassify</span><span class="p">,</span> <span class="n">dictionaryMap</span><span class="p">,</span> <span class="n">dfCountMap</span><span class="p">)</span>
 <span class="n">val</span> <span class="n">ukVec</span> <span class="p">=</span> <span class="n">vectorizeDocument</span><span class="p">(</span><span class="n">UKtextToClassify</span><span class="p">,</span> <span class="n">dictionaryMap</span><span class="p">,</span> <span class="n">dfCountMap</span><span class="p">)</span>
 
@@ -495,7 +507,7 @@
 </pre></div>
 
 
-<h2 id="tie-everything-together-in-a-new-method-to-classify-text">Tie everything together in a new method to classify text</h2>
+<h2 id="tie-everything-together-in-a-new-method-to-classify-text">Tie everything together in a new method to classify text<a class="headerlink" href="#tie-everything-together-in-a-new-method-to-classify-text" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">def</span> <span class="n">classifyText</span><span class="p">(</span><span class="n">txt</span><span class="p">:</span> <span class="n">String</span><span class="p">):</span> <span class="n">String</span> <span class="p">=</span> <span class="p">{</span>
     <span class="n">val</span> <span class="n">v</span> <span class="p">=</span> <span class="n">vectorizeDocument</span><span class="p">(</span><span class="n">txt</span><span class="p">,</span> <span class="n">dictionaryMap</span><span class="p">,</span> <span class="n">dfCountMap</span><span class="p">)</span>
     <span class="n">classifyDocument</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
@@ -503,13 +515,13 @@
 </pre></div>
 
 
-<h2 id="now-we-can-simply-call-our-classifytext-method-on-any-string">Now we can simply call our classifyText(...) method on any String</h2>
+<h2 id="now-we-can-simply-call-our-classifytext-method-on-any-string">Now we can simply call our classifyText(...) method on any String<a class="headerlink" href="#now-we-can-simply-call-our-classifytext-method-on-any-string" title="Permanent link">&para;</a></h2>
 <div class="codehilite"><pre><span class="n">classifyText</span><span class="p">(</span>&quot;<span class="n">Hello</span> <span class="n">world</span> <span class="n">from</span> <span class="n">Queens</span>&quot;<span class="p">)</span>
 <span class="n">classifyText</span><span class="p">(</span>&quot;<span class="n">Hello</span> <span class="n">world</span> <span class="n">from</span> <span class="n">London</span>&quot;<span class="p">)</span>
 </pre></div>
 
 
-<h2 id="model-persistance">Model persistence</h2>
+<h2 id="model-persistance">Model persistence<a class="headerlink" href="#model-persistance" title="Permanent link">&para;</a></h2>
 <p>You can save the model to HDFS:</p>
 <div class="codehilite"><pre><span class="n">model</span><span class="p">.</span><span class="n">dfsWrite</span><span class="p">(</span>&quot;<span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">model</span>&quot;<span class="p">)</span>
 </pre></div>

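Editorial sketch for the classify-a-doc-from-the-shell.html tutorial in the diff above: the page states that the label with the highest score wins the classification, and it defines an argmax helper over a Mahout Vector. The stand-alone Scala below illustrates the same idea on a plain Array[Double] so that it runs without Mahout. The object name, the label map, and the scores are made up for illustration and are not part of the commit.

```scala
// Hypothetical stand-alone sketch: argmax over a plain Array[Double]
// instead of Mahout's Vector, so it runs in any Scala REPL.
object ArgmaxSketch {

  /** Returns (index, score) of the largest entry; ties keep the first index. */
  def argmax(scores: Array[Double]): (Int, Double) = {
    require(scores.nonEmpty, "scores must not be empty")
    var bestIdx = 0
    var bestScore = scores(0)
    for (i <- 1 until scores.length) {
      if (scores(i) > bestScore) {
        bestScore = scores(i)
        bestIdx = i
      }
    }
    (bestIdx, bestScore)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical index-to-label map, mirroring the reverseLabelMap idea.
    val reverseLabelMap = Map(0 -> "united kingdom", 1 -> "united states")
    // Made-up per-label scores standing in for a classifier's output.
    val scores = Array(-1250.4, -1198.7)
    val (idx, score) = argmax(scores)
    println(s"${reverseLabelMap(idx)} ($score)")
  }
}
```

Starting from the first element rather than a sentinel index keeps the helper well-defined for any non-empty input, and the explicit require makes the empty case fail loudly.
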
Modified: websites/staging/mahout/trunk/content/users/environment/h2o-internals.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/h2o-internals.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/h2o-internals.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,18 +264,29 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="introduction">Introduction</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">&para;</a></h1>
 <p>This document provides an overview of how the Mahout Samsara environment is implemented over the H2O backend engine. It is aimed at Mahout developers and gives a high-level description of the design so that one can explore the code inside <code>h2o/</code> with some context.</p>
-<h2 id="h2o-overview">H2O Overview</h2>
+<h2 id="h2o-overview">H2O Overview<a class="headerlink" href="#h2o-overview" title="Permanent link">&para;</a></h2>
 <p>H2O is a distributed, scalable machine learning system. Its internal architecture consists of a distributed math engine (h2o-core) and a separate layer on top for algorithms and UI. The Mahout integration requires only the math engine (h2o-core).</p>
-<h2 id="h2o-data-model">H2O Data Model</h2>
+<h2 id="h2o-data-model">H2O Data Model<a class="headerlink" href="#h2o-data-model" title="Permanent link">&para;</a></h2>
 <p>The data model of the H2O math engine is a distributed columnar store (of primarily numbers, but also strings). A column of numbers is called a Vector, which is broken into Chunks (of a few thousand elements). Chunks are distributed across the cluster based on a deterministic hash. Therefore, any member of the cluster knows where a particular Chunk of a Vector is homed. Each Chunk is separately compressed in memory, and elements are individually decompressed on the fly upon access with purely register operations (thereby achieving high memory throughput). An ordered set of similarly partitioned Vectors is composed into a Frame. A Frame is therefore a large two-dimensional table of numbers. All elements of a logical row in the Frame are guaranteed to be homed on the same server of the cluster. Generally speaking, H2O works well on "tall skinny" data, i.e., lots of rows (100s of millions) and a modest number of columns (10s of thousands).</p>
-<h2 id="mahout-drm">Mahout DRM</h2>
+<h2 id="mahout-drm">Mahout DRM<a class="headerlink" href="#mahout-drm" title="Permanent link">&para;</a></h2>
 <p>The Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a large matrix of numbers in memory in a cluster by distributing logical rows among servers. Mahout's Scala DSL defines an abstract API on DRMs, which backend engines implement. Examples are the Spark and H2O backend engines. Each engine has its own design for mapping the abstract API onto its data model and provides implementations of the algebraic operators over that mapping.</p>
-<h2 id="h2o-environment-engine">H2O Environment Engine</h2>
+<h2 id="h2o-environment-engine">H2O Environment Engine<a class="headerlink" href="#h2o-environment-engine" title="Permanent link">&para;</a></h2>
 <p>The H2O backend implements the abstract DRM as an H2O Frame. Each logical column in the DRM is an H2O Vector. All elements of a logical DRM row are guaranteed to be homed on the same server. The set of rows stored on a server is presented as a read-only virtual in-core Matrix (i.e., BlockMatrix) to the closure method in the <code>mapBlock(...)</code> API.</p>
 <p>H2O provides a flexible execution framework called <code>MRTask</code>. The <code>MRTask</code> framework typically executes over a Frame (or even a Vector), supports various types of map() methods, can optionally modify the Frame or Vector (though this never happens in the Mahout integration), and can optionally create a new Vector or set of Vectors (to combine them into a new Frame, and consequently a new DRM).</p>
-<h2 id="source-layout">Source Layout</h2>
+<h2 id="source-layout">Source Layout<a class="headerlink" href="#source-layout" title="Permanent link">&para;</a></h2>
 <p>Within mahout.git, the top-level directory <code>h2o/</code> holds all the source code related to the H2O backend engine. Part of the code (the part that interfaces with the rest of the Mahout components) is in Scala, and part of the code (the part that interfaces with h2o-core and implements algebraic operators) is in Java. Here is a brief overview of what functionality can be found where within <code>h2o/</code>.</p>
 <p>h2o/ - top-level directory containing all H2O-related code</p>
 <p>h2o/src/main/java/org/apache/mahout/h2obindings/ops/*.java - Physical operator code for the various DSL algebraic operators</p>
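 <p>As a rough illustration of the data model described under "H2O Data Model", the sketch below shows how a deterministic hash lets every node compute where a Chunk is homed, and why all elements of a logical row land on the same server. It is a minimal sketch, not H2O or Mahout code: the object name, <code>chunkSize</code>, <code>numNodes</code>, and the hash function are all assumptions chosen for illustration.</p>

```scala
// Hypothetical sketch only: these names are illustrative, not H2O or Mahout APIs.
object ChunkHomingSketch {
  val chunkSize = 4 // assumed tiny value; real Chunks hold a few thousand elements
  val numNodes = 3  // assumed cluster size

  // Index of the Chunk that holds a given row of a Vector.
  def chunkOf(row: Long): Long = row / chunkSize

  // Home node of a Chunk: a deterministic function of the chunk index alone,
  // so every node can compute it locally without asking the others.
  def homeOf(chunkIdx: Long): Int =
    (chunkIdx.hashCode & Int.MaxValue) % numNodes

  // Similarly partitioned Vectors share chunk boundaries, so the column
  // does not influence where an element of a Frame is homed.
  def homeOfElement(column: Int, row: Long): Int = homeOf(chunkOf(row))
}
```

 <p>With these assumptions, <code>ChunkHomingSketch.homeOfElement(0, 10L)</code> and <code>ChunkHomingSketch.homeOfElement(7, 10L)</code> yield the same node, which mirrors the guarantee that a logical row is homed on a single server.</p>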

Modified: websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/how-to-build-an-app.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,10 +264,21 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h1 id="how-to-create-and-app-using-mahout">How to create and App using Mahout</h1>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h1 id="how-to-create-and-app-using-mahout">How to create and App using Mahout<a class="headerlink" href="#how-to-create-and-app-using-mahout" title="Permanent link">&para;</a></h1>
 <p>This is an example of how to create a simple app using Mahout as a library. The source is available on GitHub in the <a href="https://github.com/pferrel/3-input-cooc">3-input-cooc project</a> with more explanation about what it does (it has to do with collaborative filtering). For this tutorial we'll concentrate on the app rather than the data science.</p>
 <p>The app reads in three user-item interaction types and creates indicators for them using cooccurrence and cross-cooccurrence. The indicators will be written to text files in a format ready for search engine indexing in a search-engine-based recommender.</p>
-<h2 id="setup">Setup</h2>
+<h2 id="setup">Setup<a class="headerlink" href="#setup" title="Permanent link">&para;</a></h2>
 <p>In order to build and run the CooccurrenceDriver you need to install the following:</p>
 <ul>
 <li>Install the Java 7 JDK from Oracle. Mac users look here: <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html">Java SE Development Kit 7u72</a>.</li>
@@ -276,7 +288,7 @@
 </ul>
 <p>Why install if you are only using them as a library? Certain binaries and scripts are required by the libraries to get information about the environment like discovering where jars are located.</p>
 <p>Spark requires a set of jars on the classpath for the client-side part of an app, and another set of jars must be passed to the Spark Context for running distributed code. The example should discover all the necessary classes automatically.</p>
-<h2 id="application">Application</h2>
+<h2 id="application">Application<a class="headerlink" href="#application" title="Permanent link">&para;</a></h2>
 <p>Using Mahout as a library in an application will require a little Scala code. Scala has an <code>App</code> trait, so we'll create an object that inherits from <code>App</code>:</p>
 <div class="codehilite"><pre><span class="n">object</span> <span class="n">CooccurrenceDriver</span> <span class="n">extends</span> <span class="n">App</span> <span class="p">{</span>
 <span class="p">}</span>
@@ -407,7 +419,7 @@ def writeIndicators<span class="p">(</sp
 </pre></div>
 
 
-<h2 id="build">Build</h2>
+<h2 id="build">Build<a class="headerlink" href="#build" title="Permanent link">&para;</a></h2>
 <p>Build the examples from the project's root folder:</p>
 <div class="codehilite"><pre>$ <span class="n">sbt</span> <span class="n">pack</span>
 </pre></div>
@@ -419,7 +431,7 @@ def writeIndicators<span class="p">(</sp
 
 
 <p>The driver will execute in Spark standalone mode and put the data in /path/to/3-input-cooc/data/indicators/<em>indicator-type</em></p>
-<h2 id="using-a-debugger">Using a Debugger</h2>
+<h2 id="using-a-debugger">Using a Debugger<a class="headerlink" href="#using-a-debugger" title="Permanent link">&para;</a></h2>
 <p>To build and run this example in a debugger like IntelliJ IDEA, install IDEA from the IntelliJ site and add the Scala plugin.</p>
 <p>Open IDEA and go to the menu File-&gt;New-&gt;Project from existing sources-&gt;SBT-&gt;/path/to/3-input-cooc. This will create an IDEA project from <code>build.sbt</code> in the root directory.</p>
 <p>At this point you may create a "Debug Configuration" to run. In the menu choose Run-&gt;Edit Configurations. Under "Default" choose "Application". In the dialog hit the ellipsis button "..." to the right of "Environment Variables" and fill in your versions of JAVA_HOME, SPARK_HOME, and MAHOUT_HOME. In the configuration editor under "Use classpath from" choose the root-3-input-cooc module.</p>
@@ -427,7 +439,7 @@ def writeIndicators<span class="p">(</sp
 <p>Now choose "Application" in the left pane and hit the plus sign "+". Give the config a name and hit the ellipsis button to the right of the "Main class" field as shown.</p>
 <p><img alt="image" src="http://mahout.apache.org/images/debug-config-2.png" /></p>
 <p>After setting breakpoints you are now ready to debug the configuration. Go to the Run-&gt;Debug... menu and pick your configuration. This will execute using a local standalone instance of Spark.</p>
-<h2 id="the-mahout-shell">The Mahout Shell</h2>
+<h2 id="the-mahout-shell">The Mahout Shell<a class="headerlink" href="#the-mahout-shell" title="Permanent link">&para;</a></h2>
 <p>For small script-like apps you may wish to use the Mahout shell. It is a Scala REPL-type interactive shell built on the Spark shell with Mahout-Samsara extensions.</p>
 <p>To turn CooccurrenceDriver.scala into a script, make the following changes:</p>
 <ul>

Modified: websites/staging/mahout/trunk/content/users/environment/in-core-reference.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/in-core-reference.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/in-core-reference.html Fri Apr  8 18:41:08 2016
@@ -146,6 +146,7 @@
                   <li class="nav-header">Engines</li>
                   <li><a href="/users/sparkbindings/home.html">Spark</a></li>
                   <li><a href="/users/environment/h2o-internals.html">H2O</a></li>
+                  <li><a href="/users/flinkbindings/home.html">Flink</a></li>
                   <li class="nav-header">References</li>
                   <li><a href="/users/environment/in-core-reference.html">In-Core Algebraic DSL Reference</a></li>
                   <li><a href="/users/environment/out-of-core-reference.html">Distributed Algebraic DSL Reference</a></li>
@@ -263,8 +264,19 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <h2 id="mahout-samsaras-in-core-linear-algebra-dsl-reference">Mahout-Samsara's In-Core Linear Algebra DSL Reference</h2>
-<h4 id="imports">Imports</h4>
+    <style type="text/css">
+/* The following code is added by mdx_elementid.py
+   It was originally lifted from http://subversion.apache.org/style/site.css */
+/*
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+}
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover > .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink { visibility: visible }</style>
+<h2 id="mahout-samsaras-in-core-linear-algebra-dsl-reference">Mahout-Samsara's In-Core Linear Algebra DSL Reference<a class="headerlink" href="#mahout-samsaras-in-core-linear-algebra-dsl-reference" title="Permanent link">&para;</a></h2>
+<h4 id="imports">Imports<a class="headerlink" href="#imports" title="Permanent link">&para;</a></h4>
 <p>The following imports are used to enable Mahout-Samsara's Scala DSL bindings for in-core Linear Algebra:</p>
 <div class="codehilite"><pre><span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">_</span>
 <span class="n">import</span> <span class="n">scalabindings</span><span class="p">.</span><span class="n">_</span>
@@ -272,7 +284,7 @@
 </pre></div>
 
 
-<h4 id="inline-initalization">Inline initalization</h4>
+<h4 id="inline-initalization">Inline initalization<a class="headerlink" href="#inline-initalization" title="Permanent link">&para;</a></h4>
 <p>Dense vectors:</p>
 <div class="codehilite"><pre>val denseVec1: Vector = (1.0, 1.1, 1.2)
 val denseVec2 = dvec(1, 0, 1,1 ,1,2)
@@ -314,7 +326,7 @@ val sparseVec1 = svec((5 -&gt; 1.0) :: (
 </pre></div>
 
 
-<h4 id="slicing-and-assigning">Slicing and Assigning</h4>
+<h4 id="slicing-and-assigning">Slicing and Assigning<a class="headerlink" href="#slicing-and-assigning" title="Permanent link">&para;</a></h4>
 <p>Getting a vector element:</p>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">d</span> <span class="p">=</span> <span class="n">vec</span><span class="p">(</span>5<span class="p">)</span>
 </pre></div>
@@ -388,7 +400,7 @@ val sparseVec1 = svec((5 -&gt; 1.0) :: (
 </pre></div>
 
 
-<h4 id="blas-like-operations">BLAS-like operations</h4>
+<h4 id="blas-like-operations">BLAS-like operations<a class="headerlink" href="#blas-like-operations" title="Permanent link">&para;</a></h4>
 <p>Plus/minus with either a vector or a numeric operand, with or without assignment:</p>
 <div class="codehilite"><pre><span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
 <span class="n">a</span> <span class="o">-</span> <span class="n">b</span>
@@ -472,7 +484,7 @@ val sparseVec1 = svec((5 -&gt; 1.0) :: (
 
 
 <p>will not therefore incur any additional data copying.</p>
-<h4 id="decompositions">Decompositions</h4>
+<h4 id="decompositions">Decompositions<a class="headerlink" href="#decompositions" title="Permanent link">&para;</a></h4>
 <p>Matrix decompositions require an additional import:</p>
 <div class="codehilite"><pre><span class="n">import</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">mahout</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">decompositions</span><span class="p">.</span><span class="n">_</span>
 </pre></div>
@@ -525,7 +537,7 @@ val sparseVec1 = svec((5 -&gt; 1.0) :: (
 </pre></div>
 
 
-<h4 id="misc">Misc</h4>
+<h4 id="misc">Misc<a class="headerlink" href="#misc" title="Permanent link">&para;</a></h4>
 <p>Vector cardinality:</p>
 <div class="codehilite"><pre><span class="n">a</span><span class="p">.</span><span class="nb">length</span>
 </pre></div>
@@ -550,7 +562,7 @@ val sparseVec1 = svec((5 -&gt; 1.0) :: (
 </pre></div>
 
 
-<h4 id="random-matrices">Random Matrices</h4>
+<h4 id="random-matrices">Random Matrices<a class="headerlink" href="#random-matrices" title="Permanent link">&para;</a></h4>
 <p><code>\(\mathcal{U}\)</code>(0,1) random matrix view:</p>
 <div class="codehilite"><pre><span class="n">val</span> <span class="n">incCoreA</span> <span class="p">=</span> <span class="n">Matrices</span><span class="p">.</span><span class="n">uniformView</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span>
 </pre></div>
@@ -566,7 +578,7 @@ val sparseVec1 = svec((5 -&gt; 1.0) :: (
 </pre></div>
 
 
-<h4 id="iterators">Iterators</h4>
+<h4 id="iterators">Iterators<a class="headerlink" href="#iterators" title="Permanent link">&para;</a></h4>
 <p>Mahout-Math already exposes a number of iterators. Scala code just needs the following imports to enable implicit conversions to Scala iterators.</p>
 <div class="codehilite"><pre><span class="n">import</span> <span class="n">collection</span><span class="p">.</span><span class="n">_</span>
 <span class="n">import</span> <span class="n">JavaConversions</span><span class="p">.</span><span class="n">_</span>


