# mahout-commits mailing list archives

From dlyubi...@apache.org
Subject svn commit: r1685198 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
Date Sat, 13 Jun 2015 00:38:17 GMT
Author: dlyubimov
Date: Sat Jun 13 00:38:16 2015
New Revision: 1685198

URL: http://svn.apache.org/r1685198
Log:
CMS commit to mahout by dlyubimov

Modified:
mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1685198&r1=1685197&r2=1685198&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Sat Jun 13 00:38:16 2015
@@ -2,6 +2,8 @@

This tutorial will show you how to play with Mahout's scala DSL for linear algebra and its
Spark shell. **Please keep in mind that this code is still in a very early experimental stage**.

+_(Edited for 0.10.2)_
+
## Intro

We'll use an excerpt of a publicly available [dataset about cereals](http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html).
The dataset tells the protein, fat, carbohydrate and sugars (in milligrams) contained in a
set of cereals, as well as a customer rating for the cereals. Our aim for this example is
to fit a linear model which infers the customer rating from the ingredients.
@@ -161,25 +163,12 @@ right angle. An easy way to add such a b
column of ones to the feature matrix $$\mathbf{X}$$.
The corresponding weight in the parameter vector will then be the bias term.

-Mahout's DSL offers a mapBlock() method for custom modifications of a DRM. All the rows in a partition are merged to a block of the matrix which is given to custom code in a closure. For our example, we invoke mapBlock with ncol = drmX.ncol + 1 to let the system know that change the number of columns of the matrix. The input to our closure is a block of the DRM and an array of keys for the rows contained in the block. In order to add a column, we first create a new block with an additional column, then copy the data from the current block into the new block and finally set the last column to ones and return the new block.
+Here is how we add a bias column:

<div class="codehilite"><pre>
-val drmXwithBiasColumn = drmX.mapBlock(ncol = drmX.ncol + 1) {
-  case(keys, block) =>
-    // create a new block with an additional column
-    val blockWithBiasColumn = block.like(block.nrow, block.ncol + 1)
-    // copy data from current block into the new block
-    blockWithBiasColumn(::, 0 until block.ncol) := block
-    // last column consists of ones
-    blockWithBiasColumn(::, block.ncol) := 1
-
-    keys -> blockWithBiasColumn
-}
+val drmXwithBiasColumn = drmX cbind 1
</pre></div>

-(This looks like a lot of work for something that would be simply cbind(drmX, 1) in R. Matrix-scalar
-cbind combination is still a TODO in Mahout's dialect, although cbind exists for other operand type combinations.)
-
Now we can give the newly created DRM drmXwithBiasColumn to our model fitting method
ols and see how well the resulting model fits the training data with goodnessOfFit.
You should see a large improvement in the result.

<div class="codehilite"><pre>

