mahout-commits mailing list archives

From dlyubi...@apache.org
Subject svn commit: r1596082 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
Date Mon, 19 May 2014 22:20:30 GMT
Author: dlyubimov
Date: Mon May 19 22:20:30 2014
New Revision: 1596082

URL: http://svn.apache.org/r1596082
Log:
CMS commit to mahout by dlyubimov

Modified:
    mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1596082&r1=1596081&r2=1596082&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Mon May 19 22:20:30 2014
@@ -65,7 +65,16 @@ val drmData = drmParallelize(dense(
   numPartitions = 2);
 </pre></div>
 
-Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) assumes that the **target variable y** is generated by the linear combination of **the feature matrix X** with the **parameter vector β** plus the **noise ε**, summarized in the formula `\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. Our goal is to find an estimate of the parameter vector `\(\boldsymbol{\beta}\)` that explains the data very well.
+Have a look at this matrix. The first four columns represent the ingredients 
+(our features) and the last column (the rating) is the target variable for 
+our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) 
+assumes that the **target variable** `\(\mathbf{y}\)` is generated by the 
+linear combination of **the feature matrix** `\(\mathbf{X}\)` with the 
+**parameter vector** `\(\boldsymbol{\beta}\)` plus the
+ **noise** `\(\boldsymbol{\varepsilon}\)`, summarized in the formula 
+`\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. 
+Our goal is to find an estimate of the parameter vector 
+`\(\boldsymbol{\beta}\)` that explains the data very well.
 
 As a first step, we extract `\(\mathbf{X}\)` and `\(\mathbf{y}\)` from our data matrix. We get *X* by slicing: we take all rows (denoted by ```::```) and the first four columns, which have the ingredients in milligrams as content. Note that the result is again a DRM. The shell will not execute this code yet, it saves the history of operations and defers the execution until we really access a result. **Mahout's DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.**
 


