mahout-commits mailing list archives

From rawkintr...@apache.org
Subject [11/29] mahout git commit: remove permalinks where not appropriate
Date Wed, 26 Apr 2017 03:04:53 GMT
http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/docs/0.13.1-SNAPSHOT/tutorials/play-with-shell.md
----------------------------------------------------------------------
diff --git a/website/docs/0.13.1-SNAPSHOT/tutorials/play-with-shell.md b/website/docs/0.13.1-SNAPSHOT/tutorials/play-with-shell.md
deleted file mode 100644
index d193160..0000000
--- a/website/docs/0.13.1-SNAPSHOT/tutorials/play-with-shell.md
+++ /dev/null
@@ -1,199 +0,0 @@
----
-layout: page
-title: Mahout Samsara In Core
-theme:
-    name: mahout2
----
-# Playing with Mahout's Spark Shell 
-
-This tutorial will show you how to play with Mahout's Scala DSL for linear algebra and its Spark shell. **Please keep in mind that this code is still in a very early experimental stage**.
-
-_(Edited for 0.10.2)_
-
-## Intro
-
-We'll use an excerpt of a publicly available [dataset about cereals](http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html). The dataset gives the protein, fat, carbohydrate, and sugar content (in milligrams) of a set of cereals, as well as a customer rating for each cereal. Our aim in this example is to fit a linear model that infers the customer rating from the ingredients.
-
-
-Name                    | protein | fat | carbo | sugars | rating
-:-----------------------|:--------|:----|:------|:-------|:---------
-Apple Cinnamon Cheerios | 2       | 2   | 10.5  | 10     | 29.509541
-Cap'n'Crunch            | 1       | 2   | 12    | 12     | 18.042851
-Cocoa Puffs             | 1       | 1   | 12    | 13     | 22.736446
-Froot Loops             | 2       | 1   | 11    | 13     | 32.207582
-Honey Graham Ohs        | 1       | 2   | 12    | 11     | 21.871292
-Wheaties Honey Gold     | 2       | 1   | 16    |  8     | 36.187559
-Cheerios                | 6       | 2   | 17    |  1     | 50.764999
-Clusters                | 3       | 2   | 13    |  7     | 40.400208
-Great Grains Pecan      | 3       | 3   | 13    |  4     | 45.811716
-
-
-## Installing Mahout & Spark on your local machine
-
-We describe how to do a quick toy setup of Spark & Mahout on your local machine, so that you can run this example and play with the shell. 
-
- 1. Download [Apache Spark 1.6.2](http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz) (pre-built for Hadoop 2.6, so no compilation is needed) and unpack the archive file
- 1. Create a directory for Mahout somewhere on your machine, change to it, and check out the master branch of Apache Mahout from GitHub: ```git clone https://github.com/apache/mahout mahout```
- 1. Change to the ```mahout``` directory and build Mahout using ```mvn -DskipTests clean install```
- 
-## Starting Mahout's Spark shell
-
- 1. Go to the directory where you unpacked Spark and type ```sbin/start-all.sh``` to start Spark locally
- 1. Open a browser and point it to [http://localhost:8080/](http://localhost:8080/) to check whether Spark started successfully. Copy the URL of the Spark master at the top of the page (it starts with **spark://**)
- 1. Define the following environment variables: <pre class="codehilite">export MAHOUT_HOME=[directory into which you checked out Mahout]
-export SPARK_HOME=[directory where you unpacked Spark]
-export MASTER=[url of the Spark master]
-</pre>
- 1. Finally, change to the directory where you unpacked Mahout and type ```bin/mahout spark-shell```; 
-you should see the shell starting and get the prompt ```mahout> ```. Check the 
-[FAQ](http://mahout.apache.org/users/sparkbindings/faq.html) for further troubleshooting.
-
-## Implementation
-
-We'll use the shell to interactively play with the data and incrementally implement a simple [linear regression](https://en.wikipedia.org/wiki/Linear_regression) algorithm. Let's first load the dataset. Usually, we wouldn't need Mahout unless we were processing a large dataset stored in a distributed filesystem. But for the sake of this example, we'll use our tiny toy dataset and "pretend" it is too big to fit onto a single machine.
-
-*Note: You can incrementally follow the example by copy-and-pasting the code into your running Mahout shell.*
-
-Mahout's linear algebra DSL has an abstraction called *DistributedRowMatrix (DRM)* which models a matrix that is partitioned by rows and stored in the memory of a cluster of machines. We use ```dense()``` to create a dense in-memory matrix from our toy dataset and use ```drmParallelize``` to load it into the cluster, "mimicking" a large, partitioned dataset.
-
-<div class="codehilite"><pre>
-val drmData = drmParallelize(dense(
-  (2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
-  (1, 2, 12,   12, 18.042851),  // Cap'n'Crunch
-  (1, 1, 12,   13, 22.736446),  // Cocoa Puffs
-  (2, 1, 11,   13, 32.207582),  // Froot Loops
-  (1, 2, 12,   11, 21.871292),  // Honey Graham Ohs
-  (2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold
-  (6, 2, 17,   1,  50.764999),  // Cheerios
-  (3, 2, 13,   7,  40.400208),  // Clusters
-  (3, 3, 13,   4,  45.811716)), // Great Grains Pecan
-  numPartitions = 2);
-</pre></div>
-
-Have a look at this matrix. The first four columns represent the ingredients 
-(our features) and the last column (the rating) is the target variable for 
-our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) 
-assumes that the **target variable** `\(\mathbf{y}\)` is generated by the 
-linear combination of **the feature matrix** `\(\mathbf{X}\)` with the 
-**parameter vector** `\(\boldsymbol{\beta}\)` plus the
- **noise** `\(\boldsymbol{\varepsilon}\)`, summarized in the formula 
-`\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. 
-Our goal is to find an estimate of the parameter vector 
-`\(\boldsymbol{\beta}\)` that best explains the data.
-
-As a first step, we extract `\(\mathbf{X}\)` and `\(\mathbf{y}\)` from our data matrix. We get *X* by slicing: we take all rows (denoted by ```::```) and the first four columns, which contain the ingredient amounts. Note that the result is again a DRM. The shell will not execute this code yet; it records the operations and defers execution until we actually access a result. **Mahout's DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.**
-
-<div class="codehilite"><pre>
-val drmX = drmData(::, 0 until 4)
-</pre></div>
-
-Next, we extract the target variable vector *y*, the fifth column of the data matrix. We assume it fits into the memory of our driver machine, so we fetch it using ```collect```:
-
-<div class="codehilite"><pre>
-val y = drmData.collect(::, 4)
-</pre></div>
-
-Now we are ready to think about a mathematical way to estimate the parameter vector *β*. A simple textbook approach is [ordinary least squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares), which minimizes the sum of squared residuals between the true target variable and the prediction of the target variable. In OLS, there is even a closed-form expression for estimating `\(\boldsymbol{\beta}\)` as 
-`\(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}\)`.
-
-The first thing we compute for this is `\(\mathbf{X}^{\top}\mathbf{X}\)`. The code for doing this in Mahout's Scala DSL maps directly to the mathematical formula: the operation ```t``` transposes a matrix and, analogous to R, ```%*%``` denotes matrix multiplication.
-
-<div class="codehilite"><pre>
-val drmXtX = drmX.t %*% drmX
-</pre></div>
-
-The same is true for computing `\(\mathbf{X}^{\top}\mathbf{y}\)`. We can simply type the math as Scala expressions into the shell. Here, *X* lives in the cluster while *y* is in the memory of the driver, and the result is again a DRM.
-<div class="codehilite"><pre>
-val drmXty = drmX.t %*% y
-</pre></div>
-
-We're nearly done. The next step we take is to fetch `\(\mathbf{X}^{\top}\mathbf{X}\)` and 
-`\(\mathbf{X}^{\top}\mathbf{y}\)` into the memory of our driver machine (we are targeting 
-feature matrices that are tall and skinny, 
-so we can assume that `\(\mathbf{X}^{\top}\mathbf{X}\)` is small enough 
-to fit in). Then, we provide them to an in-memory solver (Mahout provides 
-an analog to R's ```solve()``` for that) which computes ```beta```, our 
-OLS estimate of the parameter vector `\(\boldsymbol{\beta}\)`.
-
-<div class="codehilite"><pre>
-val XtX = drmXtX.collect
-val Xty = drmXty.collect(::, 0)
-
-val beta = solve(XtX, Xty)
-</pre></div>
-
-That's it! We have implemented a distributed linear regression algorithm 
-on Apache Spark. I hope you agree that we didn't have to worry much about 
-parallelization and distributed systems. The goal of Mahout's linear algebra 
-DSL is to abstract away the ugliness of programming a distributed system 
-as much as possible, while still retaining decent performance and 
-scalability.
-
-We can now check how well our model fits its training data. 
-First, we multiply the feature matrix `\(\mathbf{X}\)` by our estimate of 
-`\(\boldsymbol{\beta}\)`. Then, we look at the difference (via the L2-norm) between 
-the target variable `\(\mathbf{y}\)` and the fitted target variable:
-
-<div class="codehilite"><pre>
-val yFitted = (drmX %*% beta).collect(::, 0)
-(y - yFitted).norm(2)
-</pre></div>
-
-We hope this shows how Mahout's shell allows you to interactively and incrementally write algorithms. We have entered a lot of individual commands, one by one, until we got the desired results. We can now refactor a little by wrapping our statements into easy-to-use functions. Function definitions follow standard Scala syntax. 
-
-We put all the commands for ordinary least squares into a function ```ols```. 
-
-<div class="codehilite"><pre>
-def ols(drmX: DrmLike[Int], y: Vector) = 
-  solve(drmX.t %*% drmX, drmX.t %*% y)(::, 0)
-
-</pre></div>
-
-Note that the DSL performs an implicit `collect` whenever coercion rules require an in-core argument. Hence, we can simply
-skip the explicit `collect`s. 
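-
-For instance, as a spelled-out sketch, ```ols``` above is equivalent to collecting both products by hand:
-
-<div class="codehilite"><pre>
-val beta = solve((drmX.t %*% drmX).collect, (drmX.t %*% y).collect)(::, 0)
-</pre></div>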
-
-Next, we define a function ```goodnessOfFit``` that tells how well a model fits the target variable:
-
-<div class="codehilite"><pre>
-def goodnessOfFit(drmX: DrmLike[Int], beta: Vector, y: Vector) = {
-  val fittedY = (drmX %*% beta).collect(::, 0)
-  (y - fittedY).norm(2)
-}
-</pre></div>
-
-So far we have left out an important aspect of a standard linear regression 
-model: usually there is a constant bias term added to the model. Without 
-it, our model is forced through the origin and we learn only the 
-slope. An easy way to add such a bias term to our model is to add a 
-column of ones to the feature matrix `\(\mathbf{X}\)`. 
-The corresponding weight in the parameter vector will then be the bias term.
-
-Here is how we add a bias column:
-
-<div class="codehilite"><pre>
-val drmXwithBiasColumn = drmX cbind 1
-</pre></div>
-
-Now we can give the newly created DRM ```drmXwithBiasColumn``` to our model fitting method ```ols``` and see how well the resulting model fits the training data with ```goodnessOfFit```. You should see a large improvement in the result.
-
-<div class="codehilite"><pre>
-val betaWithBiasTerm = ols(drmXwithBiasColumn, y)
-goodnessOfFit(drmXwithBiasColumn, betaWithBiasTerm, y)
-</pre></div>
-
-As a further optimization, we can make use of the DSL's caching functionality. We use ```drmXwithBiasColumn``` repeatedly as input to computations, so it might be beneficial to cache it in memory. This is achieved by calling ```checkpoint()```. In the end, we remove it from the cache with ```uncache()```:
-
-<div class="codehilite"><pre>
-val cachedDrmX = drmXwithBiasColumn.checkpoint()
-
-val betaWithBiasTerm = ols(cachedDrmX, y)
-val goodness = goodnessOfFit(cachedDrmX, betaWithBiasTerm, y)
-
-cachedDrmX.uncache()
-
-goodness
-</pre></div>
-
-
-Liked what you saw? Check out Mahout's overview of the [Scala and Spark bindings](https://mahout.apache.org/users/sparkbindings/home.html).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/front_page.md
----------------------------------------------------------------------
diff --git a/website/front_page.md b/website/front_page.md
new file mode 100644
index 0000000..8b109d9
--- /dev/null
+++ b/website/front_page.md
@@ -0,0 +1,163 @@
+---
+layout: page
+theme: 
+    name: mahout2
+---
+
+<div class="jumbotron">
+  <div class="container">
+    <h1>Apache Mahout</h1>
+    <p>A distributed linear algebra framework that runs on Spark, Flink, GPUs, and more!<br/>
+      Use Mahout's library of machine learning algorithms or roll your own!  Use Mahout-Samsara to write matrix
+      algebra using R-like syntax.  Check out our tutorials and quick start guide to get rolling.
+    </p>
+    <div class="border row">
+      <div class="col-md-12 col-sm-12 col-xs-12 text-center newBtn">
+        <a href="http://youtube.com" target="_zeppelinVideo" class="btn btn-primary btn-lg bigFingerButton" role="button">Tutorial Video</a>
+        <a href="https://github.com/apache/mahout" class="btn btn-primary btn-lg bigFingerButton" role="button">GET LATEST MAHOUT</a>
+      </div>
+    </div>
+  </div>
+</div>  
+
+<!-- 3 wide column -->
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+<div class="new">
+  <div class="container">
+    <h2>Latest Release</h2>
+    <span class="newMahout center-block">Apache Mahout 0.13.0</span>
+    <div class="border row">
+      <div class="border col-md-4 col-sm-4">
+        <h4>Simple and <br/>Extensible</h4>
+        <div class="viz">
+          <p>
+            Build your own algorithms using Mahout's R-like interface.  See an example in this 
+            <a href="" target="_blank">demo</a>
+          </p>
+        </div>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>Support for Multiple <br/>Distributed Backends</h4>
+        <div class="multi">
+        <p>
+           Custom bindings for Spark, Flink, and H2O enable a write-once, run-anywhere machine learning platform
+          <a class="thumbnail text-center" href="#thumb">
+            
+            <a href="https://github.com/apache/mahout" class="btn btn-primary btn-lg bigFingerButton" role="button">See more in this DEMO. (not working)</a>
+          </a> 
+        </p>
+        </div>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>Introducing Samsara, an R-like<br/> DSL for writing ML algorithms</h4>
+        <div class="personal">
+        <p>
+          Use this capability to write algorithms at scale that will run on any backend 
+        </p>
+        </div>
+      </div>
+    </div>
+    <div class="border row">
+      <div class="border col-md-4 col-sm-4">
+        <h4>Support for GPUs</h4>
+        <p>
+          Distributed GPU Matrix-Matrix and Matrix-Vector multiplication on Spark along with sparse and dense matrix GPU-backed support.
+        </p>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>Extensible Algorithms Framework</h4>
+        <p>
+           A new scikit-learn-like framework for algorithms, with the goal of
+           creating a consistent API for various machine-learning algorithms
+        </p>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>0.13.1 - Future Plans</h4>
+        <p>
+          Further native integration:
+        </p>
+        <ul>
+          <li>JCuda backing for in-core matrices and CUDA solvers</li>
+          <li>GPU/OpenMP acceleration for linear solvers</li>
+          <li>Scala 2.11 support</li>
+          <li>Spark 2.x support</li>
+        </ul>
+      </div>
+    </div>
+    <div class="col-md-12 col-sm-12 col-xs-12 text-center">
+      <p style="text-align:center; margin-top: 32px; font-size: 14px; color: gray; font-weight: 200; font-style: italic; padding-bottom: 0;">See more details in 
+        <a href="tbd">0.13.0 Release Note</a>
+      </p>
+    </div>
+  </div>
+</div>
+
+      <!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+
+        <div class="container">
+            <div class="row">
+                <div class="col-md-12">
+                
+
+                </div>
+            </div>
+            <div class="row">
+                <div class="col-md-12">
+                    {% for post in paginator.posts %}
+                        {% include tile.html %}
+                    {% endfor %}
+
+
+                    
+                </div>
+            </div>
+        </div>
+
+
+
+<div class="new">
+  <div class="container">
+    <h2>Mahout on Twitter</h2>
+    <br/>
+    <div class="row">
+      <div class="col-md-12 col-sm-12 col-xs-12 text-center">
+        <div class='jekyll-twitter-plugin'><a class="twitter-timeline" data-width="500" data-tweet-limit="4" data-chrome="nofooter" href="https://twitter.com/ApacheMahout">Tweets by ApacheMahout</a>
+<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></div>
+      </div>
+      <div class="col-md-12 col-sm-12 col-xs-12 text-center twitterBtn">
+        <p style="text-align:center; margin-top: 32px; font-size: 12px; color: gray; font-weight: 200; font-style: italic; padding-bottom: 0;">See more tweets or</p>
+        <a href="https://twitter.com/ApacheMahout" target="_blank" class="btn btn-primary btn-lg round" role="button">
+          Follow Mahout on &nbsp;
+          <i class="fa fa-twitter fa-lg" aria-hidden="true"></i>
+        </a>
+      </div>
+    </div>
+  </div>
+  <hr>
+</div>

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/index.md
----------------------------------------------------------------------
diff --git a/website/index.md b/website/index.md
index 40ca90b..b49b37d 100644
--- a/website/index.md
+++ b/website/index.md
@@ -1,162 +1,23 @@
 ---
 layout: page
-theme: mahout2
+theme: 
+    name: mahout2
 ---
 
-<div class="jumbotron">
-  <div class="container">
-    <h1>Apache Mahout</h1>
-    <p>A distributed linear algebra framework that runs on Spark, Flink, GPUs, and more!<br/>
-      Use Mahout's library of machine learning algorithms or roll your own!  Use Mahout-Samsara to write matrix
-      algebra using R-like syntax.  Check out our tutorials and quick start guide to get rolling.
-    </p>
-    <div class="border row">
-      <div class="col-md-12 col-sm-12 col-xs-12 text-center newBtn">
-        <a href="http://youtube.com" target="_zeppelinVideo" class="btn btn-primary btn-lg bigFingerButton" role="button">Tutorial Video</a>
-        <a href="https://github.com/apache/mahout" class="btn btn-primary btn-lg bigFingerButton" role="button">GET LATEST MAHOUT</a>
-      </div>
-    </div>
-  </div>
-</div>  
 
-<!-- 3 wide column -->
 
-<!--
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
+## Mathematically Expressive Scala DSL
 
-http://www.apache.org/licenses/LICENSE-2.0
+Talk about it a bit
 
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
+## Multiple Distributed Backends
 
-<div class="new">
-  <div class="container">
-    <h2>Latest Release</h2>
-    <span class="newMahout center-block">Apache Mahout 0.13.0</span>
-    <div class="border row">
-      <div class="border col-md-4 col-sm-4">
-        <h4>Simple and <br/>Extensible</h4>
-        <div class="viz">
-          <p>
-            Build your own algorithms using Mahout's R-like interface.  See an example in this 
-            <a href="" target="_blank">demo</a>
-          </p>
-        </div>
-      </div>
-      <div class="border col-md-4 col-sm-4">
-        <h4>Support for Multiple <br/>Distributed Backends</h4>
-        <div class="multi">
-        <p>
-           Custom bindings for Spark, Flink, and H2O enable a write-once, run-anywhere machine learning platform
-          <a class="thumbnail text-center" href="#thumb">
-            
-            <a href="https://github.com/apache/mahout" class="btn btn-primary btn-lg bigFingerButton" role="button">See more in this DEMO. (not working)</a>
-          </a> 
-        </p>
-        </div>
-      </div>
-      <div class="border col-md-4 col-sm-4">
-        <h4>Introducing Samsara, an R-like<br/> DSL for writing ML algorithms</h4>
-        <div class="personal">
-        <p>
-          Use this capability to write algorithms at scale that will run on any backend 
-        </p>
-        </div>
-      </div>
-    </div>
-    <div class="border row">
-      <div class="border col-md-4 col-sm-4">
-        <h4>Support for GPUs</h4>
-        <p>
-          Distributed GPU Matrix-Matrix and Matrix-Vector multiplication on Spark along with sparse and dense matrix GPU-backed support.
-        </p>
-      </div>
-      <div class="border col-md-4 col-sm-4">
-        <h4>Extensible Algorithms Framework</h4>
-        <p>
-           A new scikit-learn-like framework for algorithms, with the goal of
-           creating a consistent API for various machine-learning algorithms
-        </p>
-      </div>
-      <div class="border col-md-4 col-sm-4">
-        <h4>0.13.1 - Future Plans</h4>
-        <p>
-          Further native integration:
-        </p>
-        <ul>
-          <li>JCuda backing for in-core matrices and CUDA solvers</li>
-          <li>GPU/OpenMP acceleration for linear solvers</li>
-          <li>Scala 2.11 support</li>
-          <li>Spark 2.x support</li>
-        </ul>
-      </div>
-    </div>
-    <div class="col-md-12 col-sm-12 col-xs-12 text-center">
-      <p style="text-align:center; margin-top: 32px; font-size: 14px; color: gray; font-weight: 200; font-style: italic; padding-bottom: 0;">See more details in 
-        <a href="tbd">0.13.0 Release Note</a>
-      </p>
-    </div>
-  </div>
-</div>
+I feel like this has all been written before and we can copy paste from somewhere...
 
-      <!--
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
+## Native Solver Acceleration
 
-http://www.apache.org/licenses/LICENSE-2.0
+## Precanned Algorithms or Roll Your Own
 
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
+## Visualization With Apache Zeppelin
 
-
-
-        <div class="container">
-            <div class="row">
-                <div class="col-md-12">
-                
-
-                </div>
-            </div>
-            <div class="row">
-                <div class="col-md-12">
-                    {% for post in paginator.posts %}
-                        {% include tile.html %}
-                    {% endfor %}
-
-
-                    
-                </div>
-            </div>
-        </div>
-
-
-
-<div class="new">
-  <div class="container">
-    <h2>Mahout on Twitter</h2>
-    <br/>
-    <div class="row">
-      <div class="col-md-12 col-sm-12 col-xs-12 text-center">
-        <div class='jekyll-twitter-plugin'><a class="twitter-timeline" data-width="500" data-tweet-limit="4" data-chrome="nofooter" href="https://twitter.com/ApacheMahout">Tweets by ApacheMahout</a>
-<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></div>
-      </div>
-      <div class="col-md-12 col-sm-12 col-xs-12 text-center twitterBtn">
-        <p style="text-align:center; margin-top: 32px; font-size: 12px; color: gray; font-weight: 200; font-style: italic; padding-bottom: 0;">See more tweets or</p>
-        <a href="https://twitter.com/ApacheMahout" target="_blank" class="btn btn-primary btn-lg round" role="button">
-          Follow Mahout on &nbsp;
-          <i class="fa fa-twitter fa-lg" aria-hidden="true"></i>
-        </a>
-      </div>
-    </div>
-  </div>
-  <hr>
-</div>
+## Other Highlight, though 6 is probably the max (and 5 is better)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/algorithms/d-als.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/algorithms/d-als.md b/website/programming_guide/algorithms/d-als.md
new file mode 100644
index 0000000..ba06ceb
--- /dev/null
+++ b/website/programming_guide/algorithms/d-als.md
@@ -0,0 +1,58 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara Distributed ALS
+theme:
+    name: mahout2
+---
+# Distributed Cholesky QR
+
+
+## Intro
+
+Mahout has a distributed implementation of QR decomposition for tall, thin matrices [1].
+
+## Algorithm 
+
+For the classic QR decomposition of the form `\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)`, a distributed version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin, such that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e., *m* is large but *n* is less than ~5000. Under such circumstances, only `\(\mathbf{A}\)` and `\(\mathbf{Q}\)` are distributed matrices, while `\(\mathbf{A^{\top}A}\)` and `\(\mathbf{R}\)` are in-core products. We just compute the in-core Cholesky decomposition in the form `\(\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}\)`.  After that we take `\(\mathbf{R}= \mathbf{L}^{\top}\)` and `\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`; this is indeed a valid QR since `\(\mathbf{Q}^{\top}\mathbf{Q}=\mathbf{L}^{-1}\left(\mathbf{A}^{\top}\mathbf{A}\right)\mathbf{L}^{-\top}=\mathbf{L}^{-1}\mathbf{L}\mathbf{L}^{\top}\mathbf{L}^{-\top}=\mathbf{I}\)`.  The product `\(\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)` is easily computed by multiplying each vertical block of `\(\mathbf{A}\)` by `\(\left(\mathbf{L}^{\top}\right)^{-1}\)`.  (There is no actual matrix inversion happening.) 
+
+
+
+## Implementation
+
+Mahout's `dqrThin(...)` is implemented in the Mahout `math-scala` algebraic optimizer, which translates Mahout's R-like linear algebra operators into a physical plan for both the Spark and H2O distributed engines.
+
+    def dqrThin[K: ClassTag](drmA: DrmLike[K], checkRankDeficiency: Boolean = true): (DrmLike[K], Matrix) = {
+        if (drmA.ncol > 5000)
+            log.warn("A is too fat. A'A must fit in memory and easily broadcasted.")
+        implicit val ctx = drmA.context
+        // A'A is a small n x n product; compute it distributed, then collect in-core.
+        val AtA = (drmA.t %*% drmA).checkpoint()
+        val inCoreAtA = AtA.collect
+        // Cholesky: A'A = LL', hence R = L'.
+        val ch = chol(inCoreAtA)
+        val inCoreR = (ch.getL cloned) t
+        if (checkRankDeficiency && !ch.isPositiveDefinite)
+            throw new IllegalArgumentException("R is rank-deficient.")
+        val bcastAtA = drmBroadcast(inCoreAtA)
+        // Q = A * inv(L'), computed blockwise against the broadcast A'A.
+        val Q = drmA.mapBlock() {
+            case (keys, block) => keys -> chol(bcastAtA).solveRight(block)
+        }
+        Q -> inCoreR
+    }
+
+
+## Usage
+
+The Scala `dqrThin(...)` method can easily be called in any Spark or H2O application built with the `math-scala` library and the corresponding `Spark` or `H2O` engine module, as follows:
+
+    import org.apache.mahout.math._
+    import decompositions._
+    import drm._
+    
+    val (drmQ, inCoreR) = dqrThin(drmA)
+
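+As a quick sanity check (a sketch reusing only operators shown above), the column-orthonormality of `\(\mathbf{Q}\)` can be verified by materializing `\(\mathbf{Q}^{\top}\mathbf{Q}\)`, which should be close to the identity matrix:
+
+    // Q'Q ~= I for a well-conditioned A
+    val inCoreQtQ = (drmQ.t %*% drmQ).collect
+    println(inCoreQtQ)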
+ 
+## References
+
+[1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
+
+[2]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/algorithms/d-qr.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/algorithms/d-qr.md b/website/programming_guide/algorithms/d-qr.md
new file mode 100644
index 0000000..73aa2f3
--- /dev/null
+++ b/website/programming_guide/algorithms/d-qr.md
@@ -0,0 +1,58 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara DQR
+theme:
+    name: mahout2
+---
+# Distributed Cholesky QR
+
+
+## Intro
+
+Mahout has a distributed implementation of QR decomposition for tall, thin matrices [1].
+
+## Algorithm 
+
+For the classic QR decomposition of the form `\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)`, a distributed version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin, such that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e., *m* is large but *n* is less than ~5000. Under such circumstances, only `\(\mathbf{A}\)` and `\(\mathbf{Q}\)` are distributed matrices, while `\(\mathbf{A^{\top}A}\)` and `\(\mathbf{R}\)` are in-core products. We just compute the in-core Cholesky decomposition in the form `\(\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}\)`.  After that we take `\(\mathbf{R}= \mathbf{L}^{\top}\)` and `\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`; this is indeed a valid QR since `\(\mathbf{Q}^{\top}\mathbf{Q}=\mathbf{L}^{-1}\left(\mathbf{A}^{\top}\mathbf{A}\right)\mathbf{L}^{-\top}=\mathbf{L}^{-1}\mathbf{L}\mathbf{L}^{\top}\mathbf{L}^{-\top}=\mathbf{I}\)`.  The product `\(\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)` is easily computed by multiplying each vertical block of `\(\mathbf{A}\)` by `\(\left(\mathbf{L}^{\top}\right)^{-1}\)`.  (There is no actual matrix inversion happening.) 
+
+
+
+## Implementation
+
+Mahout's `dqrThin(...)` is implemented in the Mahout `math-scala` algebraic optimizer, which translates Mahout's R-like linear algebra operators into a physical plan for both the Spark and H2O distributed engines.
+
+    def dqrThin[K: ClassTag](drmA: DrmLike[K], checkRankDeficiency: Boolean = true): (DrmLike[K], Matrix) = {
+        if (drmA.ncol > 5000)
+            log.warn("A is too fat. A'A must fit in memory and easily broadcasted.")
+        implicit val ctx = drmA.context
+        // A'A is a small n x n product; compute it distributed, then collect in-core.
+        val AtA = (drmA.t %*% drmA).checkpoint()
+        val inCoreAtA = AtA.collect
+        // Cholesky: A'A = LL', hence R = L'.
+        val ch = chol(inCoreAtA)
+        val inCoreR = (ch.getL cloned) t
+        if (checkRankDeficiency && !ch.isPositiveDefinite)
+            throw new IllegalArgumentException("R is rank-deficient.")
+        val bcastAtA = drmBroadcast(inCoreAtA)
+        // Q = A * inv(L'), computed blockwise against the broadcast A'A.
+        val Q = drmA.mapBlock() {
+            case (keys, block) => keys -> chol(bcastAtA).solveRight(block)
+        }
+        Q -> inCoreR
+    }
+
+
+## Usage
+
+The Scala `dqrThin(...)` method can easily be called in any Spark or H2O application built with the `math-scala` library and the corresponding `Spark` or `H2O` engine module, as follows:
+
+    import org.apache.mahout.math._
+    import decompositions._
+    import drm._
+    
+    val (drmQ, inCoreR) = dqrThin(drmA)
+
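+As a quick sanity check (a sketch reusing only operators shown above), the column-orthonormality of `\(\mathbf{Q}\)` can be verified by materializing `\(\mathbf{Q}^{\top}\mathbf{Q}\)`, which should be close to the identity matrix:
+
+    // Q'Q ~= I for a well-conditioned A
+    val inCoreQtQ = (drmQ.t %*% drmQ).collect
+    println(inCoreQtQ)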
+ 
+## References
+
+[1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
+
+[2]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/algorithms/d-spca.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/algorithms/d-spca.md b/website/programming_guide/algorithms/d-spca.md
new file mode 100644
index 0000000..c0026b4
--- /dev/null
+++ b/website/programming_guide/algorithms/d-spca.md
@@ -0,0 +1,175 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara Dist Sto PCA
+theme:
+    name: mahout2
+---
+# Distributed Stochastic PCA
+
+
+## Intro
+
+Mahout has a distributed implementation of Stochastic PCA [1]. This algorithm computes the exact equivalent of Mahout's dssvd(`\(\mathbf{A-1\mu^\top}\)`) by modifying the `dssvd` algorithm to avoid forming `\(\mathbf{A-1\mu^\top}\)`, which would densify a sparse input. Thus, it is suitable for both dense and sparse inputs.
+
+## Algorithm
+
+Given an *m* `\(\times\)` *n* matrix `\(\mathbf{A}\)`, a target rank *k*, and an oversampling parameter *p*, this procedure computes a *k*-rank PCA by finding the unknowns in `\(\mathbf{A−1\mu^\top \approx U\Sigma V^\top}\)`:
+
+1. Create seed for random *n* `\(\times\)` *(k+p)* matrix `\(\Omega\)`.
+2. `\(\mathbf{s_\Omega \leftarrow \Omega^\top \mu}\)`.
+3. `\(\mathbf{Y_0 \leftarrow A\Omega − 1 {s_\Omega}^\top, Y \in \mathbb{R}^{m\times(k+p)}}\)`.
+4. Column-orthonormalize `\(\mathbf{Y_0} \rightarrow \mathbf{Q}\)` by computing thin decomposition `\(\mathbf{Y_0} = \mathbf{QR}\)`. Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times(k+p)}, \mathbf{R}\in\mathbb{R}^{(k+p)\times(k+p)}\)`.
+5. `\(\mathbf{s_Q \leftarrow Q^\top 1}\)`.
+6. `\(\mathbf{B_0 \leftarrow Q^\top A: B \in \mathbb{R}^{(k+p)\times n}}\)`.
+7. `\(\mathbf{s_B \leftarrow {B_0}^\top \mu}\)`.
+8. For *i* in 1..*q* repeat (power iterations):
+    - For *j* in 1..*n* apply `\(\mathbf{(B_{i−1})_{∗j} \leftarrow (B_{i−1})_{∗j}−\mu_j s_Q}\)`.
+    - `\(\mathbf{Y_i \leftarrow A{B_{i−1}}^\top−1(s_B−\mu^\top \mu s_Q)^\top}\)`.
+    - Column-orthonormalize `\(\mathbf{Y_i} \rightarrow \mathbf{Q}\)` by computing thin decomposition `\(\mathbf{Y_i = QR}\)`.
+    - `\(\mathbf{s_Q \leftarrow Q^\top 1}\)`.
+    - `\(\mathbf{B_i \leftarrow Q^\top A}\)`.
+    - `\(\mathbf{s_B \leftarrow {B_i}^\top \mu}\)`.
+9. Let `\(\mathbf{C \triangleq s_Q {s_B}^\top}\)`. `\(\mathbf{M \leftarrow B_q {B_q}^\top − C − C^\top + \mu^\top \mu s_Q {s_Q}^\top}\)`.
+10. Compute an eigensolution of the small symmetric `\(\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in \mathbb{R}^{(k+p)\times(k+p)}}\)`.
+11. The singular values `\(\Sigma = \Lambda^{\circ 0.5}\)`, or, in other words, `\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)`.
+12. If needed, compute `\(\mathbf{U = Q\hat{U}}\)`.
+13. If needed, compute `\(\mathbf{V = B^\top \hat{U} \Sigma^{−1}}\)`.
+14. If needed, items converted to the PCA space can be computed as `\(\mathbf{U\Sigma}\)`.
+
+## Implementation
+
+Mahout's `dspca(...)` is implemented in the Mahout `math-scala` algebraic optimizer, which translates Mahout's R-like linear algebra operators into a physical plan for both the Spark and H2O distributed engines.
+
+    def dspca[K](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0): 
+    (DrmLike[K], DrmLike[Int], Vector) = {
+
+        // Some mapBlock() calls need it
+        implicit val ktag =  drmA.keyClassTag
+
+        val drmAcp = drmA.checkpoint()
+        implicit val ctx = drmAcp.context
+
+        val m = drmAcp.nrow
+        val n = drmAcp.ncol
+        assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
+        val pfxed = safeToNonNegInt((m min n) - k min p)
+
+        // Actual decomposition rank
+        val r = k + pfxed
+
+        // Dataset mean
+        val mu = drmAcp.colMeans
+
+        val mtm = mu dot mu
+
+        // We represent Omega by its seed.
+        val omegaSeed = RandomUtils.getRandom().nextInt()
+        val omega = Matrices.symmetricUniformView(n, r, omegaSeed)
+
+        // This is done up front in a single-threaded fashion for now. Even though it doesn't
+        // require any memory beyond what is needed to keep xi around, it might still be
+        // parallelized across the backend for significantly big n and r. TODO
+        val s_o = omega.t %*% mu
+
+        val bcastS_o = drmBroadcast(s_o)
+        val bcastMu = drmBroadcast(mu)
+
+        var drmY = drmAcp.mapBlock(ncol = r) {
+            case (keys, blockA) ⇒
+                val s_o:Vector = bcastS_o
+                val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
+                for (row ← 0 until blockY.nrow) blockY(row, ::) -= s_o
+                keys → blockY
+        }
+                // Checkpoint Y
+                .checkpoint()
+
+        var drmQ = dqrThin(drmY, checkRankDeficiency = false)._1.checkpoint()
+
+        var s_q = drmQ.colSums()
+        var bcastVarS_q = drmBroadcast(s_q)
+
+        // This actually should be optimized as identically partitioned map-side A'B since A and Q should
+        // still be identically partitioned.
+        var drmBt = (drmAcp.t %*% drmQ).checkpoint()
+
+        var s_b = (drmBt.t %*% mu).collect(::, 0)
+        var bcastVarS_b = drmBroadcast(s_b)
+
+        for (i ← 0 until q) {
+
+            // These closures don't seem to play well with outside-scope vars, since closure
+            // attributes aren't recorded correctly. So we create an additional set of vals for
+            // broadcast vars to properly create read-only closure attributes in this very scope.
+            val bcastS_q = bcastVarS_q
+            val bcastMuInner = bcastMu
+
+            // Fix Bt as B' -= xi cross s_q
+            drmBt = drmBt.mapBlock() {
+                case (keys, block) ⇒
+                    val s_q: Vector = bcastS_q
+                    val mu: Vector = bcastMuInner
+                    keys.zipWithIndex.foreach {
+                        case (key, idx) ⇒ block(idx, ::) -= s_q * mu(key)
+                    }
+                    keys → block
+            }
+
+            drmY.uncache()
+            drmQ.uncache()
+
+            val bCastSt_b = drmBroadcast(s_b -=: mtm * s_q)
+
+            drmY = (drmAcp %*% drmBt)
+                // Fix Y by subtracting st_b from each row of the AB'
+                .mapBlock() {
+                case (keys, block) ⇒
+                    val st_b: Vector = bCastSt_b
+                    block := { (_, c, v) ⇒ v - st_b(c) }
+                    keys → block
+            }
+            // Checkpoint Y
+            .checkpoint()
+
+            drmQ = dqrThin(drmY, checkRankDeficiency = false)._1.checkpoint()
+
+            s_q = drmQ.colSums()
+            bcastVarS_q = drmBroadcast(s_q)
+
+            // This on the other hand should be inner-join-and-map A'B optimization since A and Q_i are not
+            // identically partitioned anymore.
+            drmBt = (drmAcp.t %*% drmQ).checkpoint()
+
+            s_b = (drmBt.t %*% mu).collect(::, 0)
+            bcastVarS_b = drmBroadcast(s_b)
+        }
+
+        val c = s_q cross s_b
+        val inCoreBBt = (drmBt.t %*% drmBt).checkpoint(CacheHint.NONE).collect -=:
+            c -=: c.t +=: mtm *=: (s_q cross s_q)
+        val (inCoreUHat, d) = eigen(inCoreBBt)
+        val s = d.sqrt
+
+        // Since neither drmU nor drmV are actually computed until actually used, we don't need the flags
+        // instructing compute (or not compute) either of the U,V outputs anymore. Neat, isn't it?
+        val drmU = drmQ %*% inCoreUHat
+        val drmV = drmBt %*% (inCoreUHat %*% diagv(1 / s))
+
+        (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
+    }
+
+## Usage
+
+The Scala `dspca(...)` method can easily be called in any Spark, Flink, or H2O application built with the `math-scala` library and the corresponding `Spark`, `Flink`, or `H2O` engine module, as follows:
+
+    import org.apache.mahout.math._
+    import decompositions._
+    import drm._
+    
+    val (drmU, drmV, s) = dspca(drmA, k=200, q=1)
+
+Note that the power-iteration parameter `q` is optional; its default value is zero.
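+
+Per step 14 of the algorithm above, items can be mapped into the k-dimensional PCA space directly from the returned factors (a sketch reusing only operators shown in this document):
+
+    // PCA-space representation of the input rows: U * Sigma
+    val drmPca = drmU %*% diagv(s)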
+ 
+## References
+
+[1]: Lyubimov and Palumbo, ["Apache Mahout: Beyond MapReduce; Distributed Algorithm Design"](https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785)

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/algorithms/d-ssvd.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/algorithms/d-ssvd.md b/website/programming_guide/algorithms/d-ssvd.md
new file mode 100644
index 0000000..71cd977
--- /dev/null
+++ b/website/programming_guide/algorithms/d-ssvd.md
@@ -0,0 +1,142 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara DSSVD
+theme:
+    name: mahout2
+---
+# Distributed Stochastic Singular Value Decomposition
+
+
+## Intro
+
+Mahout has a distributed implementation of Stochastic Singular Value Decomposition [1] using the parallelization strategy comprehensively defined in Nathan Halko's dissertation ["Randomized methods for computing low-rank approximations of matrices"](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf) [2].
+
+## Modified SSVD Algorithm
+
+Given an `\(m\times n\)`
+matrix `\(\mathbf{A}\)`, a target rank `\(k\in\mathbb{N}_{1}\)`
+, an oversampling parameter `\(p\in\mathbb{N}_{1}\)`, 
+and the number of additional power iterations `\(q\in\mathbb{N}_{0}\)`, 
+this procedure computes an `\(m\times\left(k+p\right)\)`
+SVD `\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`:
+
+  1. Create seed for random `\(n\times\left(k+p\right)\)`
+  matrix `\(\boldsymbol{\Omega}\)`. The seed defines matrix `\(\mathbf{\Omega}\)`
+  using Gaussian unit vectors, per one of the suggestions in Halko, Martinsson, and Tropp [3].
+
+  2. `\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)`
+ 
+  3. Column-orthonormalize `\(\mathbf{Y}\rightarrow\mathbf{Q}\)`
+  by computing thin decomposition `\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)`.
+  Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`; denoted as `\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)`
+
+  4. `\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)`.
+ 
+  5. If `\(q>0\)`
+  repeat: for `\(i=1..q\)`: 
+  `\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)`
+  (power iterations step).
+
+  6. Compute Eigensolution of a small Hermitian `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)`,
+  `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`.
+ 
+  7. Singular values `\(\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}\)`,
+  or, in other words, `\(\sigma_{i}=\sqrt{\lambda_{i}}\)`.
+ 
+  8. If needed, compute `\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)`.
+
+  9. If needed, compute `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`.
+Another way is `\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)`.
+
+
+
+
+## Implementation
+
+Mahout's `dssvd(...)` is implemented in the Mahout `math-scala` algebraic optimizer, which translates Mahout's R-like linear algebra operators into a physical plan for both the Spark and H2O distributed engines.
+
+    def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0):
+        (DrmLike[K], DrmLike[Int], Vector) = {
+
+        val drmAcp = drmA.checkpoint()
+
+        val m = drmAcp.nrow
+        val n = drmAcp.ncol
+        assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
+        val pfxed = safeToNonNegInt((m min n) - k min p)
+
+        // Actual decomposition rank
+        val r = k + pfxed
+
+        // We represent Omega by its seed.
+        val omegaSeed = RandomUtils.getRandom().nextInt()
+
+        // Compute Y = A*Omega.  
+        var drmY = drmAcp.mapBlock(ncol = r) {
+            case (keys, blockA) =>
+                val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
+            keys -> blockY
+        }
+
+        var drmQ = dqrThin(drmY.checkpoint())._1
+
+        // Checkpoint Q if last iteration
+        if (q == 0) drmQ = drmQ.checkpoint()
+
+        var drmBt = drmAcp.t %*% drmQ
+        
+        // Checkpoint B' if last iteration
+        if (q == 0) drmBt = drmBt.checkpoint()
+
+        for (i <- 0  until q) {
+            drmY = drmAcp %*% drmBt
+            drmQ = dqrThin(drmY.checkpoint())._1            
+            
+            // Checkpoint Q if last iteration
+            if (i == q - 1) drmQ = drmQ.checkpoint()
+            
+            drmBt = drmAcp.t %*% drmQ
+            
+            // Checkpoint B' if last iteration
+            if (i == q - 1) drmBt = drmBt.checkpoint()
+        }
+
+        val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt)
+        val s = d.sqrt
+
+        // Since neither drmU nor drmV are actually computed until actually used
+        // we don't need the flags instructing compute (or not compute) either of the U,V outputs 
+        val drmU = drmQ %*% inCoreUHat
+        val drmV = drmBt %*% (inCoreUHat %*% diagv(1 / s))
+
+        (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
+    }
+
+Note: As a side effect of checkpointing, U and V values are returned as logical operators (i.e. they are neither checkpointed nor computed).  Therefore there is no physical work actually done to compute `\(\mathbf{U}\)` or `\(\mathbf{V}\)` until they are used in a subsequent expression.
+
+
+## Usage
+
+The Scala `dssvd(...)` method can easily be called in any Spark or H2O application built with the `math-scala` library and the corresponding `Spark` or `H2O` engine module, as follows:
+
+    import org.apache.mahout.math._
+    import decompositions._
+    import drm._
+    
+    
+    val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
+
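+If needed, a rank-k approximation of the input can be assembled from the returned factors (a sketch; note that this materializes a potentially dense product):
+
+    // Rank-k reconstruction: A_k = U * Sigma * V'
+    val drmAk = drmU %*% diagv(s) %*% drmV.t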
+ 
+## References
+
+[1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
+
+[2]: [Randomized methods for computing low-rank
+approximations of matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf)
+
+[3]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061)
+
+[4]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
+
+
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/distributed/flink-bindings.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/distributed/flink-bindings.md b/website/programming_guide/distributed/flink-bindings.md
new file mode 100644
index 0000000..d73cba3
--- /dev/null
+++ b/website/programming_guide/distributed/flink-bindings.md
@@ -0,0 +1,49 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara Flink
+theme:
+    name: mahout2
+---
+# Introduction
+
+This document provides an overview of how the Mahout Samsara environment is implemented over the Apache Flink backend engine, and of the code layout for that engine; the source code can be found under the `flink/` directory in the Mahout codebase.
+
+Apache Flink is a distributed big data streaming engine that supports both Streaming and Batch interfaces. Batch processing is an extension of Flink’s Stream processing engine.
+
+The Mahout Flink integration presently supports Flink’s batch processing capabilities leveraging the DataSet API.
+
+The Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a large matrix of numbers in-memory in a cluster by distributing logical rows among servers. Mahout's Scala DSL provides an abstract API on DRMs for backend engines to implement. An example is the Spark backend engine. Each engine has its own design for mapping the abstract API onto its data model and provides implementations of the algebraic operators over that mapping.
+
+# Flink Overview
+
+Apache Flink is an open source, distributed Stream and Batch Processing Framework. At its core, Flink is a Stream Processing engine, and Batch processing is an extension of Stream Processing. 
+
+Flink includes several APIs for building applications with the Flink Engine:
+
+ <ol>
+<li><b>DataSet API</b> for Batch data in Java, Scala and Python</li>
+<li><b>DataStream API</b> for Stream Processing in Java and Scala</li>
+<li><b>Table API</b> with a SQL-like expression language in Java and Scala</li>
+<li><b>Gelly</b> Graph Processing API in Java and Scala</li>
+<li><b>CEP API</b>, a complex event processing library</li>
+<li><b>FlinkML</b>, a Machine Learning library</li>
+</ol>
+# Flink Environment Engine
+
+The Flink backend implements the abstract DRM as a Flink DataSet. A Flink job runs in the context of an ExecutionEnvironment (from the Flink Batch processing API).
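+
+As a rough sketch (assuming a Flink-backed distributed context is already implicit in scope; the bootstrap call is omitted here), the engine-agnostic DRM algebra used throughout these docs runs unchanged on the DataSet backend:
+
+    import org.apache.mahout.math._
+    import scalabindings._
+    import RLikeOps._
+    import drm._
+
+    // The logical plan is engine-agnostic; the Flink engine translates it
+    // into DataSet operations once a result is requested.
+    val drmA = drmParallelize(dense((1, 2), (3, 4), (5, 6)), numPartitions = 2)
+    val inCoreAtA = (drmA.t %*% drmA).collect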
+
+# Source Layout
+
+Within mahout.git, the top-level directory `flink/` holds all the source code for the Flink backend engine, both the code that interfaces with the rest of the Mahout components and the implementations of the algebraic operators over the Flink DataSet API. Here is a brief overview of what functionality can be found within the `flink/` folder.
+
+flink/ - top level directory containing all Flink related code
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/blas/*.scala - Physical operator code for the Samsara DSL algebra
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/drm/*.scala - Flink Dataset DRM and broadcast implementation
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/io/*.scala - Read / Write between DRMDataSet and files on HDFS
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala - DSL operator graph evaluator and various abstract API implementations for a distributed engine.
+
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/distributed/h2o-internals.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/distributed/h2o-internals.md b/website/programming_guide/distributed/h2o-internals.md
new file mode 100644
index 0000000..932177a
--- /dev/null
+++ b/website/programming_guide/distributed/h2o-internals.md
@@ -0,0 +1,50 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara H2O
+theme:
+    name: mahout2
+---
+# Introduction
+ 
+This document provides an overview of how the Mahout Samsara environment is implemented over the H2O backend engine. The document is aimed at Mahout developers, to give a high level description of the design so that one can explore the code inside `h2o/` with some context.
+
+## H2O Overview
+
+H2O is a distributed scalable machine learning system. Internal architecture of H2O has a distributed math engine (h2o-core) and a separate layer on top for algorithms and UI. The Mahout integration requires only the math engine (h2o-core).
+
+## H2O Data Model
+
+The data model of the H2O math engine is a distributed columnar store (of primarily numbers, but also strings). A column of numbers is called a Vector, which is broken into Chunks (of a few thousand elements). Chunks are distributed across the cluster based on a deterministic hash. Therefore, any member of the cluster knows where a particular Chunk of a Vector is homed. Each Chunk is separately compressed in memory, and elements are individually decompressed on the fly upon access with purely register operations (thereby achieving high memory throughput). An ordered set of similarly partitioned Vectors is composed into a Frame. A Frame is therefore a large two-dimensional table of numbers. All elements of a logical row in the Frame are guaranteed to be homed on the same server of the cluster. Generally speaking, H2O works well on "tall skinny" data, i.e., lots of rows (100s of millions) and a modest number of columns (10s of thousands).
+
+
+## Mahout DRM
+
+The Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a large matrix of numbers in-memory in a cluster by distributing logical rows among servers. Mahout's Scala DSL provides an abstract API on DRMs for backend engines to implement. Examples are the Spark and H2O backend engines. Each engine has its own design for mapping the abstract API onto its data model and provides implementations of the algebraic operators over that mapping.
+
+
+## H2O Environment Engine
+
+The H2O backend implements the abstract DRM as an H2O Frame. Each logical column in the DRM is an H2O Vector. All elements of a logical DRM row are guaranteed to be homed on the same server. A set of rows stored on a server is presented as a read-only virtual in-core Matrix (i.e., `H2OBlockMatrix`) to the closure method in the `mapBlock(...)` API.
+
+H2O provides a flexible execution framework called `MRTask`. The `MRTask` framework typically executes over a Frame (or even a Vector), supports various types of map() methods, can optionally modify the Frame or Vector (though this never happens in the Mahout integration), and optionally create a new Vector or set of Vectors (to combine them into a new Frame, and consequently a new DRM).
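+
+From the Mahout side, this machinery stays hidden behind the `mapBlock(...)` API. As a sketch (reusing only operators that appear elsewhere in these docs), a closure that doubles every element would clone the read-only block before mutating it:
+
+    val drmB = drmA.mapBlock(ncol = drmA.ncol) {
+      case (keys, block) =>
+        // block is a read-only view (H2OBlockMatrix); clone before writing
+        val copy = block.cloned
+        copy := { (_, _, v) => v * 2 }
+        keys -> copy
+    }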
+
+
+## Source Layout
+
+Within mahout.git, the top-level directory `h2o/` holds all the source code related to the H2O backend engine. Part of the code (that interfaces with the rest of the Mahout components) is in Scala, and part of the code (that interfaces with h2o-core and implements algebraic operators) is in Java. Here is a brief overview of what functionality can be found where within `h2o/`.
+
+  h2o/ - top level directory containing all H2O related code
+
+  h2o/src/main/java/org/apache/mahout/h2obindings/ops/*.java - Physical operator code for the various DSL algebra
+
+  h2o/src/main/java/org/apache/mahout/h2obindings/drm/*.java - DRM backing (onto Frame) and Broadcast implementation
+
+  h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java - Read / Write between DRM (Frame) and files on HDFS
+
+  h2o/src/main/java/org/apache/mahout/h2obindings/H2OBlockMatrix.java - A vertical block matrix of DRM presented as a virtual copy-on-write in-core Matrix. Used in mapBlock() API
+
+  h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java - A collection of miscellaneous functionality and helpers, e.g., converting between in-core Matrix and DRM, and various summary statistics on DRM/Frame.
+
+  h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala - DSL operator graph evaluator and various abstract API implementations for a distributed engine
+
+  h2o/src/main/scala/org/apache/mahout/h2obindings/* - Various abstract API implementations ("glue work")
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/distributed/spark-bindings.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/distributed/spark-bindings.md b/website/programming_guide/distributed/spark-bindings.md
new file mode 100644
index 0000000..89b18de
--- /dev/null
+++ b/website/programming_guide/distributed/spark-bindings.md
@@ -0,0 +1,101 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara Spark
+theme:
+    name: mahout2
+---
+
+# Scala & Spark Bindings:
+*Bringing algebraic semantics*
+
+## What is Scala & Spark Bindings?
+
+In short, Scala & Spark Bindings for Mahout is a Scala DSL and algebraic optimizer for expressions like the following (an actual formula from **(d)spca**):
+        
+
+`\[\mathbf{G}=\mathbf{B}\mathbf{B}^{\top}-\mathbf{C}-\mathbf{C}^{\top}+\mathbf{s}_{q}\mathbf{s}_{q}^{\top}\boldsymbol{\xi}^{\top}\boldsymbol{\xi}\]`
+
+bound to in-core and distributed computations (currently, on Apache Spark).
+
+
+Mahout Scala & Spark Bindings expression of the above:
+
+        val g = bt.t %*% bt - c - c.t + (s_q cross s_q) * (xi dot xi)
+
+The main idea is that a scientist writing algebraic expressions should not have to care about distributed 
+operation plans and can work **entirely on the logical level**, just as he or she would in R.
+
+Another idea is decoupling the logical expression from the distributed back-end. As more back-ends are added, 
+this implies **"write once, run everywhere"**.
+
+The linear algebra side works with scalars, in-core vectors and matrices, and Mahout Distributed
+Row Matrices (DRMs).
+
+The ecosystem of operators is built in R's image, i.e., it follows R naming such as %*%, 
+colSums, nrow, and length, operating over vectors or matrices. 
+
+An important part of the Spark Bindings is the expression optimizer. It looks at an expression as a whole, 
+figures out how it can be simplified, and decides which physical operators should be picked. For example,
+there are currently about 5 different physical operators performing DRM-DRM multiplication,
+picked based on matrix geometry, distributed dataset partitioning, orientation, etc. 
+Counting the DRM by in-core combinations adds another 4, i.e. 9 in total -- all of it for the 
+simple logical notation x %*% y.
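+
+For example (a sketch assuming two existing DRMs `drmA` and `drmB`), the user writes only the logical product; a physical operator is chosen when the plan is optimized at checkpoint or action time:
+
+    val drmC = drmA %*% drmB        // logical plan only; nothing runs yet
+    val drmCopt = drmC.checkpoint() // the optimizer picks a physical product here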
+
+
+
+Please refer to the documentation for details.
+
+## Status
+
+This environment mostly addresses R-like linear algebra optimizations for 
+Spark, Flink and H2O.
+
+
+## Documentation
+
+* Scala and Spark bindings manual: [web](http://apache.github.io/mahout/doc/ScalaSparkBindings.html), [pdf](ScalaSparkBindings.pdf)
+* Overview blog on 0.10.x releases: [blog](http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html)
+
+## Distributed methods and solvers using Bindings
+
+* In-core ([ssvd]) and Distributed ([dssvd]) Stochastic SVD -- guinea pigs -- see the bindings manual
+* In-core ([spca]) and Distributed ([dspca]) Stochastic PCA -- guinea pigs -- see the bindings manual
+* Distributed thin QR decomposition ([dqrThin]) -- guinea pig -- see the bindings manual 
+* [Current list of algorithms](https://mahout.apache.org/users/basics/algorithms.html)
+
+[ssvd]: https://github.com/apache/mahout/blob/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala
+[spca]: https://github.com/apache/mahout/blob/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala
+[dssvd]: https://github.com/apache/mahout/blob/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/decompositions/DSSVD.scala
+[dspca]: https://github.com/apache/mahout/blob/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/decompositions/DSPCA.scala
+[dqrThin]: https://github.com/apache/mahout/blob/trunk/spark/src/main/scala/org/apache/mahout/sparkbindings/decompositions/DQR.scala
+
+
+## Related history of note 
+
+* CLI and Driver for Spark version of item similarity -- [MAHOUT-1541](https://issues.apache.org/jira/browse/MAHOUT-1541)
+* Command line interface for generalizable Spark pipelines -- [MAHOUT-1569](https://issues.apache.org/jira/browse/MAHOUT-1569)
+* Cooccurrence Analysis / Item-based Recommendation -- [MAHOUT-1464](https://issues.apache.org/jira/browse/MAHOUT-1464)
+* Spark Bindings -- [MAHOUT-1346](https://issues.apache.org/jira/browse/MAHOUT-1346)
+* Scala Bindings -- [MAHOUT-1297](https://issues.apache.org/jira/browse/MAHOUT-1297)
+* Interactive Scala & Spark Bindings Shell & Script processor -- [MAHOUT-1489](https://issues.apache.org/jira/browse/MAHOUT-1489)
+* OLS tutorial using Mahout shell -- [MAHOUT-1542](https://issues.apache.org/jira/browse/MAHOUT-1542)
+* Full abstraction of DRM apis and algorithms from a distributed engine -- [MAHOUT-1529](https://issues.apache.org/jira/browse/MAHOUT-1529)
+* Port Naive Bayes -- [MAHOUT-1493](https://issues.apache.org/jira/browse/MAHOUT-1493)
+
+## Work in progress 
+* Text-delimited files for input and output -- [MAHOUT-1568](https://issues.apache.org/jira/browse/MAHOUT-1568)
+<!-- * Weighted (Implicit Feedback) ALS -- [MAHOUT-1365](https://issues.apache.org/jira/browse/MAHOUT-1365) -->
+<!--* Data frame R-like bindings -- [MAHOUT-1490](https://issues.apache.org/jira/browse/MAHOUT-1490) -->
+
+* *Your issue here!*
+
+<!-- ## Stuff wanted: 
+* Data frame R-like bindings (similarly to linalg bindings)
+* Stat R-like bindings (perhaps we can just adapt to commons.math stat)
+* **BYODMs:** Bring Your Own Distributed Method on SparkBindings! 
+* In-core jBlas matrix adapter
+* In-core GPU matrix adapters -->
+
+
+
+  
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/mahout-samsara/faq.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/mahout-samsara/faq.md b/website/programming_guide/mahout-samsara/faq.md
new file mode 100644
index 0000000..97a99cf
--- /dev/null
+++ b/website/programming_guide/mahout-samsara/faq.md
@@ -0,0 +1,51 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara
+theme:
+    name: mahout2
+---
+# FAQ for using Mahout with Spark
+
+**Q: Mahout Spark shell doesn't start; "ClassNotFound" problems or various classpath problems.**
+
+**A:** As of this writing, all reported problems starting the Spark shell in Mahout have revolved 
+around classpath issues in one way or another. 
+
+If you are getting method-signature-like errors, you most probably have a mismatch between Mahout's Spark dependency 
+and the Spark version actually installed. (At the time of this writing, HEAD depends on Spark 1.1.0, but check mahout/pom.xml.)
+
+Troubleshooting general classpath issues is pretty straightforward. Since Mahout uses the Spark installation, 
+and the classpath Spark itself reports, for Spark-related dependencies, it is important to make sure 
+the classpath is sane and is made available to Mahout:
+
+1. Check that Spark is the correct version (the same as in Mahout's poms), is compiled, and that SPARK_HOME is set.
+2. Check that Mahout is compiled and MAHOUT_HOME is set.
+3. Run `$SPARK_HOME/bin/compute-classpath.sh` and make sure it produces a sane result with no errors. 
+If it outputs something other than a straightforward classpath string, most likely Spark is not compiled/set up correctly (later Spark versions require 
+`sbt/sbt assembly` to be run; simply running `sbt/sbt publish-local` is no longer enough).
+4. Run `$MAHOUT_HOME/bin/mahout -spark classpath` and check that the path reported in step (3) is included.
+
+**Q: I am using the command line Mahout jobs that run on Spark or am writing my own application that uses 
+Mahout's Spark code. When I run the code on my cluster I get ClassNotFound or signature errors during serialization. 
+What's wrong?**
+ 
+**A:** The Spark artifacts in the Maven ecosystem may not match the exact binary you are running on your cluster. This may 
+cause class-name or version mismatches. In this case you may wish 
+to build Spark yourself to guarantee that you are running exactly what you are building Mahout against. To do this, follow these steps
+in order:
+
+1. Build Spark with Maven, but **do not** use the "package" target as described on the Spark site. Build with the "clean install" target instead. 
+Something like: "mvn clean install -Dhadoop1.2.1", or whatever your particular build options are. This will put the jars for Spark
+in the local Maven cache.
+2. Deploy **your** Spark build to your cluster and test it there.
+3. Build Mahout. This will cause Maven to pull the jars for Spark from the local Maven cache and may resolve missing 
+or mis-identified classes.
+4. If you are building your own code, do so against the local builds of Spark and Mahout.
+
+**Q: The implicit SparkContext 'sc' does not work in the Mahout spark-shell.**
+
+**A:** In the Mahout spark-shell the SparkContext is called 'sdc', where the 'd' stands for distributed. 
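+
+For example (a sketch; `dense` and `drmParallelize` as in the DSL reference), 'sdc' can be passed explicitly wherever a distributed context argument is expected:
+
+    // 'sdc' is the distributed context available in the Mahout spark-shell
+    val drmA = drmParallelize(dense((1, 2), (3, 4)))(sdc)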
+
+
+
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/mahout-samsara/in-core-reference.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/mahout-samsara/in-core-reference.md b/website/programming_guide/mahout-samsara/in-core-reference.md
new file mode 100644
index 0000000..a949063
--- /dev/null
+++ b/website/programming_guide/mahout-samsara/in-core-reference.md
@@ -0,0 +1,303 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara In Core
+theme:
+    name: mahout2
+---
+## Mahout-Samsara's In-Core Linear Algebra DSL Reference
+
+#### Imports
+
+The following imports are used to enable Mahout-Samsara's Scala DSL bindings for in-core Linear Algebra:
+
+    import org.apache.mahout.math._
+    import scalabindings._
+    import RLikeOps._
+    
+#### Inline initialization
+
+Dense vectors:
+
+    val denseVec1: Vector = (1.0, 1.1, 1.2)
+    val denseVec2 = dvec(1, 0, 1.1, 1.2)
+
+Sparse vectors:
+
+    val sparseVec1: Vector = (5 -> 1.0) :: (10 -> 2.0) :: Nil
+    val sparseVec2 = svec((5 -> 1.0) :: (10 -> 2.0) :: Nil)
+
+    // to create a vector with a specific cardinality
+    val sparseVec3 = svec((5 -> 1.0) :: (10 -> 2.0) :: Nil, cardinality = 20)
+    
+Inline matrix initialization, either sparse or dense, is always done row wise. 
+
+Dense matrices:
+
+    val A = dense((1, 2, 3), (3, 4, 5))
+    
+Sparse matrices:
+
+    val A = sparse(
+              (1, 3) :: Nil,
+              (0, 2) :: (1, 2.5) :: Nil
+                  )
+
+Diagonal matrix with constant diagonal elements:
+
+    diag(3.5, 10)
+
+Diagonal matrix with main diagonal backed by a vector:
+
+    diagv((1, 2, 3, 4, 5))
+    
+Identity matrix:
+
+    eye(10)
+    
+#### Slicing and Assigning
+
+Getting a vector element:
+
+    val d = vec(5)
+
+Setting a vector element:
+    
+    vec(5) = 3.0
+    
+Getting a matrix element:
+
+    val d = M(3, 5)
+    
+Setting a matrix element:
+
+    M(3, 5) = 3.0
+    
+Getting a matrix row or column:
+
+    val rowVec = M(3, ::)
+    val colVec = M(::, 3)
+    
+Setting a matrix row or column via vector assignment:
+
+    M(3, ::) := (1, 2, 3)
+    M(::, 3) := (1, 2, 3)
+    
+Setting a subslice of a matrix row or column:
+
+    a(0, 0 to 1) = (3, 5)
+   
+Setting a subslice of a matrix row or column via vector assignment:
+
+    a(0, 0 to 1) := (3, 5)
+   
+Getting a matrix from a contiguous block of another matrix:
+
+    val B = A(2 to 3, 3 to 4)
+   
+Assigning a contiguous block to a matrix:
+
+    A(0 to 1, 1 to 2) = dense((3, 2), (3, 3))
+   
+Assigning a contiguous block to a matrix using the matrix assignment operator:
+
+    A(0 to 1, 1 to 2) := dense((3, 2), (3, 3))
+   
+Assignment operator used for copying between vectors or matrices:
+
+    vec1 := vec2
+    M1 := M2
+   
+Assignment through a functional literal for a matrix:
+
+    M := ((row, col, x) => if (row == col) 1 else 0)
+    
+Assignment through a functional literal for a vector:
+
+    vec := ((index, x) => sqrt(x))
+    
+#### BLAS-like operations
+
+Plus/minus, with either a vector or a numeric operand:
+
+    a + b
+    a - b
+    a + 5.0
+    a - 5.0
+    
+Hadamard (elementwise) product, with vector, matrix, or numeric operands:
+
+    a * b
+    a * 0.5
+
+Operations with assignment:
+
+    a += b
+    a -= b
+    a += 5.0
+    a -= 5.0
+    a *= b
+    a *= 5
+   
+*Some nuanced rules*: 
+
+In R, 1/x (where x is a vector or a matrix) is the elementwise inverse.  In Scala it is expressed as:
+
+    val xInv = 1 /: x
+
+and R's 5.0 - x would be:
+   
+    val x1 = 5.0 -: x
+    
+*note: All assignment operations, including :=, return the assignee just like in C++*:
+
+    a -=: b 
+    
+assigns **a - b** to **b** (in-place) and returns **b**.  Similarly for **a /=: b** or **1 /=: v**.
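+
+Because the assignee is returned, an assignment can be used inline (a minimal sketch):
+
+    val b = a cloned
+    val c = b += 1.0   // c and b refer to the same updated vector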
+    
+
+Dot product:
+
+    a dot b
+    
+Matrix and vector equivalence (or non-equivalence).  **Use with caution: exact equivalence is rarely useful; prefer norm comparisons with an allowance for small errors.**
+    
+    a === b
+    a !== b
+    
+Matrix multiply:    
+
+    a %*% b
+    
+Optimized Right Multiply with a diagonal matrix: 
+
+    diag(5, 5) :%*% B
+   
+Optimized Left Multiply with a diagonal matrix:
+
+    A %*%: diag(5, 5)
+
+Second norm of a vector or matrix:
+
+    a.norm
+    
+Transpose:
+
+    val Mt = M.t
+    
+*note: Transposition is currently handled via a view, i.e. updating a transposed matrix also updates the original.*  Consequently, computing something like `\(\mathbf{X^\top}\mathbf{X}\)`:
+
+    val XtX = X.t %*% X
+    
+therefore incurs no additional data copying.
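+
+A minimal sketch of the view semantics:
+
+    val Mt = M.t
+    Mt(1, 0) = 5.0   // M(0, 1) is now 5.0 as well, since Mt is a view of M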
+
+#### Decompositions
+
+Matrix decompositions require an additional import:
+
+    import org.apache.mahout.math.decompositions._
+
+
+All arguments in the following are matrices.
+
+**Cholesky decomposition**
+
+    val ch = chol(M)
+    
+**SVD**
+
+    val (U, V, s) = svd(M)
+    
+**EigenDecomposition**
+
+    val (V, d) = eigen(M)
+    
+**QR decomposition**
+
+    val (Q, R) = qr(M)
+    
+**Rank**: Check for rank deficiency (runs rank-revealing QR)
+
+    M.isFullRank
+   
+**In-core SSVD**
+
+    val (U, V, s) = ssvd(A, k = 50, p = 15, q = 1)
+    
+**Solving linear equation systems and matrix inversion:** fully similar to R semantics; there are three forms of invocation:
+
+
+Solve `\(\mathbf{AX}=\mathbf{B}\)`:
+
+    solve(A, B)
+   
+Solve `\(\mathbf{Ax}=\mathbf{b}\)`:
+  
+    solve(A, b)
+   
+Compute `\(\mathbf{A^{-1}}\)`:
+
+    solve(A)
+   
+#### Misc
+
+Vector cardinality:
+
+    a.length
+    
+Matrix cardinality:
+
+    m.nrow
+    m.ncol
+    
+Means and sums:
+
+    m.colSums
+    m.colMeans
+    m.rowSums
+    m.rowMeans
+    
+Copy-By-Value:
+
+    val b = a cloned
+    
+#### Random Matrices
+
+`\(\mathcal{U}\)`(0,1) random matrix view:
+
+    val inCoreA = Matrices.uniformView(m, n, seed)
+
+    
+`\(\mathcal{U}\)`(-1,1) random matrix view:
+
+    val inCoreA = Matrices.symmetricUniformView(m, n, seed)
+
+`\(\mathcal{N}\)`(0,1) random matrix view:
+
+    val inCoreA = Matrices.gaussianView(m, n, seed)
+    
+#### Iterators 
+
+Mahout-Math already exposes a number of iterators.  Scala code just needs the following imports to enable implicit conversions to Scala iterators.
+
+    import collection._
+    import JavaConversions._
+    
+Iterating over rows in a Matrix:
+
+    for (row <- m) {
+      // ... do something with row
+    }
+    
+<!--Iterating over non-zero and all elements of a vector:
+*Note that Vector.Element also has some implicit syntatic sugar, e.g to add 5.0 to every non-zero element of a matrix, the following code may be used:*
+
+    for (row <- m; el <- row.nonZero) el = 5.0 + el
+    ... or 
+    for (row <- m; el <- row.nonZero) el := 5.0 + el
+    
+Similarly **row.all** produces an iterator over all elements in a row (Vector). 
+-->
+
+For more information including information on Mahout-Samsara's out-of-core Linear algebra bindings see: [Mahout Scala Bindings and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
+
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/mahout-samsara/out-of-core-reference.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/mahout-samsara/out-of-core-reference.md b/website/programming_guide/mahout-samsara/out-of-core-reference.md
new file mode 100644
index 0000000..b68e813
--- /dev/null
+++ b/website/programming_guide/mahout-samsara/out-of-core-reference.md
@@ -0,0 +1,317 @@
+---
+layout: mahoutdoc
+title: Mahout Samsara Out of Core
+theme:
+    name: mahout2
+---
+# Mahout-Samsara's Distributed Linear Algebra DSL Reference
+
+**Note: this page is meant only as a quick reference to Mahout-Samsara's R-Like DSL semantics.  For more information, including information on Mahout-Samsara's Algebraic Optimizer please see: [Mahout Scala Bindings and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf).**
+
+The subjects of this reference are solely applicable to Mahout-Samsara's **DRM** (distributed row matrix).
+
+In this reference, DRMs will be denoted as e.g. `A`, and in-core matrices as e.g. `inCoreA`.
+
+#### Imports 
+
+The following imports are used to enable seamless in-core and distributed algebraic DSL operations:
+
+    import org.apache.mahout.math._
+    import scalabindings._
+    import RLikeOps._
+    import drm._
+    import RLikeDRMOps._
+    
+If working with mixed Scala/Java code:
+    
+    import collection._
+    import JavaConversions._
+    
+If you are working with Mahout-Samsara's Spark-specific operations e.g. for context creation:
+
+    import org.apache.mahout.sparkbindings._
+    
+The Mahout shell does all of these imports automatically.
+
+
+#### DRM Persistence operators
+
+**Mahout-Samsara's DRM persistence to HDFS is compatible with all Mahout-MapReduce algorithms such as seq2sparse.**
+
+
+Loading a DRM from (HD)FS:
+
+    drmDfsRead(path = hdfsPath)
+     
+Parallelizing from an in-core matrix:
+
+    val inCoreA = dense((1, 2, 3), (3, 4, 5))
+    val A = drmParallelize(inCoreA)
+    
+Creating an empty DRM:
+
+    val A = drmParallelizeEmpty(100, 50)
+    
+Collecting to the driver's JVM in-core:
+
+    val inCoreA = A.collect
+    
+**Warning: The collection of distributed matrices happens implicitly whenever conversion to an in-core (o.a.m.math.Matrix) type is required. E.g.:**
+
+    val inCoreA: Matrix = ...
+    val drmB: DrmLike[Int] = ...
+    val inCoreC: Matrix = inCoreA %*%: drmB
+    
+**implies (inCoreA %*%: drmB).collect**
+
+Writing to (HD)FS as a Mahout DRM-formatted file:
+
+    A.dfsWrite(path = hdfsPath)
+    
+#### Logical algebraic operators on DRM matrices:
+
+A logical set of operators is defined for distributed matrices as a subset of those defined for in-core matrices.  In particular, since all distributed matrices are immutable, there are no assignment operators (e.g. **A += B**).
+*Note: please see [Mahout Scala Bindings and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf) for information on Mahout-Samsara's Algebraic Optimizer and the translation from logical operations to a physical plan for the back end.*
+ 
+    
+Cache a DRM and trigger an optimized physical plan: 
+
+    drmA.checkpoint(CacheHint.MEMORY_AND_DISK)
+   
+Other valid caching instructions:
+
+    drmA.checkpoint(CacheHint.NONE)
+    drmA.checkpoint(CacheHint.DISK_ONLY)
+    drmA.checkpoint(CacheHint.DISK_ONLY_2)
+    drmA.checkpoint(CacheHint.MEMORY_ONLY)
+    drmA.checkpoint(CacheHint.MEMORY_ONLY_2)
+    drmA.checkpoint(CacheHint.MEMORY_ONLY_SER)
+    drmA.checkpoint(CacheHint.MEMORY_ONLY_SER_2)
+    drmA.checkpoint(CacheHint.MEMORY_AND_DISK_2)
+    drmA.checkpoint(CacheHint.MEMORY_AND_DISK_SER)
+    drmA.checkpoint(CacheHint.MEMORY_AND_DISK_SER_2)
+
+*Note: Logical DRM operations are lazily computed.  Currently the actual computations and optional caching will be triggered by dfsWrite(...), collect(...) and blockify(...).*
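+
+For example (a sketch; the output path is hypothetical), composing expressions only builds a logical plan, and the action triggers optimization and execution:
+
+    // Purely logical: builds a plan, computes nothing yet
+    val drmC = (drmA %*% drmB).t
+
+    // dfsWrite is an action: the optimizer emits and runs a physical plan
+    drmC.dfsWrite(path = "/tmp/drmC")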
+
+
+
+Transposition:
+
+    A.t
+ 
+Elementwise addition *(Matrices of identical geometry and row key types)*:
+  
+    A + B
+
+Elementwise subtraction *(Matrices of identical geometry and row key types)*:
+
+    A - B
+    
+Elementwise multiplication (Hadamard) *(Matrices of identical geometry and row key types)*:
+
+    A * B
+    
+Elementwise division *(Matrices of identical geometry and row key types)*:
+
+    A / B
+    
+**Elementwise operations involving one in-core argument (int-keyed DRMs only)**:
+
+    A + inCoreB
+    A - inCoreB
+    A * inCoreB
+    A / inCoreB
+    A :+ inCoreB
+    A :- inCoreB
+    A :* inCoreB
+    A :/ inCoreB
+    inCoreA +: B
+    inCoreA -: B
+    inCoreA *: B
+    inCoreA /: B
+
+Note the associativity change: e.g. `inCoreA *: B` means `B.leftMultiply(inCoreA)`, the same as when both arguments are in-core. Whenever operator arguments include both in-core and out-of-core arguments, the operator can only be associated with the out-of-core (DRM) argument, to support the distributed implementation.
+    
+**Matrix-matrix multiplication %*%**:
+
+`\(\mathbf{M}=\mathbf{AB}\)`
+
+    A %*% B
+    A %*% inCoreB
+    A %*% inCoreDiagonal
+    A %*%: B
+
+
+*Note: same as above, whenever operator arguments include both in-core and out-of-core arguments, the operator can only be associated with the out-of-core (DRM) argument to support the distributed implementation.*
+ 
+**Matrix-vector multiplication %*%**
+Currently we support a right-multiply product of a DRM and an in-core Vector (`\(\mathbf{Ax}\)`), resulting in a single-column DRM, which can then be collected to the front end (usually the desired outcome):
+
+    val Ax = A %*% x
+    val inCoreX = Ax.collect(::, 0)
+    
+
+**Matrix-scalar +,-,*,/**
+Elementwise operations of every matrix element and a scalar:
+
+    A + 5.0
+    A - 5.0
+    A :- 5.0
+    5.0 -: A
+    A * 5.0
+    A / 5.0
+    5.0 /: A
+    
+Note that `5.0 -: A` means `\(m_{ij} = 5 - a_{ij}\)` and `5.0 /: A` means `\(m_{ij} = \frac{5}{a_{ij}}\)` for all elements of the result.
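+
+For instance (a small worked sketch):
+
+    val A = dense((1, 2), (3, 4))
+    val B = 5.0 -: A   // dense((4, 3), (2, 1))
+    val C = 5.0 /: A   // dense((5.0, 2.5), (5.0 / 3.0, 1.25))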
+    
+    
+#### Slicing
+
+General slice:
+
+    A(100 to 200, 100 to 200)
+    
+Horizontal Block:
+
+    A(::, 100 to 200)
+    
+Vertical Block:
+
+    A(100 to 200, ::)
+    
+*Note: if the row range is not the all-range (::), then the DRM must be `Int`-keyed.  General-case row slicing is not supported for DRMs with key types other than `Int`*.
+
+
+#### Stitching
+
+Stitch side by side (cbind R semantics):
+
+    val drmAnextToB = drmA cbind drmB
+    
+Stitch side by side (Scala):
+
+    val drmAnextToB = drmA.cbind(drmB)
+    
+Analogously, vertical concatenation is available via **rbind**
+
+#### Custom pipelines on blocks
+Internally, Mahout-Samsara's DRM is represented as a distributed set of vertical (Key, Block) tuples.
+
+**drm.mapBlock(...)**:
+
+The DRM operator `mapBlock` provides transformational access to the distributed vertical blockified tuples of a matrix (Row-Keys, Vertical-Matrix-Block).
+
+Using `mapBlock` to add 1.0 to a DRM:
+
+    val inCoreA = dense((1, 2, 3), (2, 3, 4), (3, 4, 5))
+    val drmA = drmParallelize(inCoreA)
+    val drmB = drmA.mapBlock() {
+        case (keys, block) => keys -> (block += 1.0)
+    }
+    
+#### Broadcasting Vectors and matrices to closures
+Generally, we can create and use one-way closure attributes on the back end.
+
+Scalar matrix multiplication:
+
+    val factor: Int = 15
+    val drm2 = drm1.mapBlock() {
+        case (keys, block) =>
+            block *= factor
+            keys -> block
+    }
+
+**Closure attributes must be Java-serializable. Currently Mahout's in-core Vectors and Matrices are not Java-serializable, and must be broadcast to the closure using `drmBroadcast(...)`**:
+
+    val v: Vector = ...
+    val bcastV = drmBroadcast(v)
+    val drm2 = drm1.mapBlock() {
+        case (keys, block) =>
+            for (row <- 0 until block.nrow) block(row, ::) -= bcastV
+            keys -> block
+    }
+
+#### Computations providing ad-hoc summaries
+
+
+Matrix cardinality:
+
+    drmA.nrow
+    drmA.ncol
+
+*Note: depending on the stage of optimization, these may trigger a computational action.  That is, if one calls `nrow()` n times, the back end may actually recompute `nrow` n times.*
+    
+Means and sums:
+
+    drmA.colSums
+    drmA.colMeans
+    drmA.rowSums
+    drmA.rowMeans
+    
+ 
+*Note: these will always trigger a computational action.  That is, if one calls `colSums()` n times, the back end will actually recompute `colSums` n times.*
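+
+When a summary is needed repeatedly, one option (a sketch) is to checkpoint the DRM first so the backing dataset is cached:
+
+    val drmAcached = drmA.checkpoint(CacheHint.MEMORY_ONLY)
+    val s1 = drmAcached.colSums   // action over the cached data
+    val s2 = drmAcached.colSums   // recomputed, but from cache rather than from scratch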
+
+#### Distributed Matrix Decompositions
+
+To import the decomposition package:
+    
+    import org.apache.mahout.math._
+    import decompositions._
+    
+Distributed thin QR:
+
+    val (drmQ, incoreR) = dqrThin(drmA)
+    
+Distributed SSVD:
+ 
+    val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
+    
+Distributed SPCA:
+
+    val (drmU, drmV, s) = dspca(drmA, k = 30, q = 1)
+
+Distributed regularized ALS:
+
+    val (drmU, drmV, i) = dals(drmA,
+                            k = 50,
+                            lambda = 0.0,
+                            maxIterations = 10,
+                            convergenceThreshold = 0.10)
+                            
+#### Adjusting parallelism of computations
+
+Set the minimum parallelism to 100 for computations on `drmA`:
+
+    drmA.par(min = 100)
+ 
+Set the exact parallelism to 100 for computations on `drmA`:
+
+    drmA.par(exact = 100)
+
+
+Set engine-specific automatic parallelism adjustment for computations on `drmA`:
+
+    drmA.par(auto = true)
+
+#### Retrieving the engine-specific data structure backing the DRM
+
+**A Spark RDD:**
+
+    val myRDD = drmA.checkpoint().rdd
+    
+**An H2O Frame and Key Vec:**
+
+    val myFrame = drmA.frame
+    val myKeys = drmA.keys
+    
+**A Flink DataSet:**
+
+    val myDataSet = drmA.ds
+    
+For more information including information on Mahout-Samsara's Algebraic Optimizer and in-core Linear algebra bindings see: [Mahout Scala Bindings and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
+
+
+
+    
+
+
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/660036eb/website/programming_guide/quickstart.md
----------------------------------------------------------------------
diff --git a/website/programming_guide/quickstart.md b/website/programming_guide/quickstart.md
new file mode 100644
index 0000000..5bd4a5f
--- /dev/null
+++ b/website/programming_guide/quickstart.md
@@ -0,0 +1,63 @@
+---
+layout: mahoutdoc
+title: Quickstart
+theme: 
+    name: mahout2
+---
+# Mahout Quick Start 
+# TODO: Fill this in with the bare essential basics
+
+
+
+# Mahout MapReduce Overview
+
+## Getting Mahout
+
+#### Download the latest release
+
+Download the latest release [here](http://www.apache.org/dyn/closer.cgi/mahout/).
+
+Or check out the latest code from [here](http://mahout.apache.org/developers/version-control.html).
+
+#### Alternatively: Add Mahout 0.13.0 to a maven project
+
+Mahout is also available via a [maven repository](http://mvnrepository.com/artifact/org.apache.mahout) under the group id *org.apache.mahout*.
+If you would like to import the latest release of Mahout into a Java project, add the following dependency to your *pom.xml*:
+
+    <dependency>
+        <groupId>org.apache.mahout</groupId>
+        <artifactId>mahout-mr</artifactId>
+        <version>0.13.0</version>
+    </dependency>
+ 
+
+## Features
+
+For a full list of Mahout's features see our [Features by Engine](http://mahout.apache.org/users/basics/algorithms.html) page.
+
+    
+## Using Mahout
+
+Mahout provides a number of examples and tutorials for users to quickly learn how to use its machine learning algorithms.
+
+#### Recommendations
+
+Check the [Recommender Quickstart](/users/recommender/quickstart.html) or the tutorial on [creating a user-based recommender in 5 minutes](/users/recommender/userbased-5-minutes.html).
+
+If you are building a recommender system for the first time, please also refer to a list of [Dos and Don'ts](/users/recommender/recommender-first-timer-faq.html) that might be helpful.
+
+#### Clustering
+
+Check the [Synthetic data](/users/clustering/clustering-of-synthetic-control-data.html) example.
+
+#### Classification
+
+If you are interested in how to train a **Naive Bayes** model, look at the [20 newsgroups](/users/classification/twenty-newsgroups.html) example.
+
+If you plan to build a **Hidden Markov Model** for speech recognition, the example [here](/users/classification/hidden-markov-models.html) might be instructive. 
+
+Or you could build a **Random Forest** model by following this [quick start page](/users/classification/partial-implementation.html).
+
+#### Working with Text 
+
+If you need to convert raw text into word vectors as input to clustering or classification algorithms, please refer to this page on [how to create vectors from text](/users/basics/creating-vectors-from-text.html).

