# mahout-commits mailing list archives

From dlyubi...@apache.org
Subject svn commit: r1596082 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
Date Mon, 19 May 2014 22:20:30 GMT
Author: dlyubimov
Date: Mon May 19 22:20:30 2014
New Revision: 1596082

URL: http://svn.apache.org/r1596082
Log:
CMS commit to mahout by dlyubimov

Modified:
mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1596082&r1=1596081&r2=1596082&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Mon May 19 22:20:30 2014
@@ -65,7 +65,16 @@ val drmData = drmParallelize(dense(
numPartitions = 2);
</pre></div>

-Have a look at this matrix. The first four columns represent the ingredients (our features)
and the last column (the rating) is the target variable for our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression)
assumes that the **target variable y** is generated by the linear combination of **the feature
matrix X** with the **parameter vector β** plus the **noise ε**, summarized in the formula
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$$. Our goal is to find
an estimate of the parameter vector $$\boldsymbol{\beta}$$ that explains the data very well.
+Have a look at this matrix. The first four columns represent the ingredients
+(our features) and the last column (the rating) is the target variable for
+our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression)
+assumes that the **target variable** $$\mathbf{y}$$ is generated by the
+linear combination of **the feature matrix** $$\mathbf{X}$$ with the
+**parameter vector** $$\boldsymbol{\beta}$$ plus the
+ **noise** $$\boldsymbol{\varepsilon}$$, summarized in the formula
+$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$$.
+Our goal is to find an estimate of the parameter vector
+$$\boldsymbol{\beta}$$ that explains the data very well.

As a first step, we extract $$\mathbf{X}$$ and $$\mathbf{y}$$ from our data matrix. We
get *X* by slicing: we take all rows (denoted by ::) and the first four columns, which
have the ingredients in milligrams as content. Note that the result is again a DRM. The shell
will not execute this code yet, it saves the history of operations and defers the execution
until we really access a result. **Mahout's DSL automatically optimizes and parallelizes all
operations on DRMs and runs them on Apache Spark.**

