Subject: svn commit: r1596082 - /mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
Date: Mon, 19 May 2014 22:20:30 -0000
To: commits@mahout.apache.org
From: dlyubimov@apache.org
Reply-To: dev@mahout.apache.org
Message-Id: <20140519222030.DBF412388980@eris.apache.org>

Author: dlyubimov
Date: Mon May 19 22:20:30 2014
New Revision: 1596082

URL: http://svn.apache.org/r1596082
Log:
CMS commit to mahout by dlyubimov

Modified:
    mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext

Modified:
mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext?rev=1596082&r1=1596081&r2=1596082&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/sparkbindings/play-with-shell.mdtext Mon May 19 22:20:30 2014
@@ -65,7 +65,16 @@
 val drmData = drmParallelize(dense(
   numPartitions = 2);

-Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression) assumes that the **target variable y** is generated by the linear combination of **the feature matrix X** with the **parameter vector β** plus the **noise ε**, summarized in the formula `\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. Our goal is to find an estimate of the parameter vector `\(\boldsymbol{\beta}\)` that explains the data very well.
+Have a look at this matrix. The first four columns represent the ingredients
+(our features) and the last column (the rating) is the target variable for
+our regression. [Linear regression](https://en.wikipedia.org/wiki/Linear_regression)
+assumes that the **target variable** `\(\mathbf{y}\)` is generated by the
+linear combination of **the feature matrix** `\(\mathbf{X}\)` with the
+**parameter vector** `\(\boldsymbol{\beta}\)` plus the
+ **noise** `\(\boldsymbol{\varepsilon}\)`, summarized in the formula
+`\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`.
+Our goal is to find an estimate of the parameter vector
+`\(\boldsymbol{\beta}\)` that explains the data very well.

 As a first step, we extract `\(\mathbf{X}\)` and `\(\mathbf{y}\)` from our data matrix.
We get *X* by slicing: we take all rows (denoted by ```::```) and the first four columns, which contain the ingredient amounts in milligrams. Note that the result is again a DRM. The shell does not execute this code yet; it records the history of operations and defers execution until we actually access a result. **Mahout's DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.**
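
The goal stated earlier, an estimate of `\(\boldsymbol{\beta}\)` that "explains the data very well", can be made more concrete. One common way to formalize it is ordinary least squares, sketched below. The choice of criterion is an assumption, not something this commit states. The hat notation `\(\hat{\boldsymbol{\beta}}\)` and the invertibility of `\(\mathbf{X}^{\top}\mathbf{X}\)` are also assumptions introduced for illustration.

```latex
% Model as given in the text
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}

% Assumed criterion: choose the parameters that minimize the squared residuals
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}}
    \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert_2^2

% Closed form (normal equations), valid when X^T X is invertible
\hat{\boldsymbol{\beta}}
  = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
```

With the slicing described above, `\(\mathbf{X}\)` has four columns, so `\(\mathbf{X}^{\top}\mathbf{X}\)` is only a 4×4 matrix no matter how many rows the DRM holds.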