mahout-commits mailing list archives

From build...@apache.org
Subject svn commit: r902293 - in /websites/staging/mahout/trunk/content: ./ users/sparkbindings/home.html
Date Wed, 19 Mar 2014 06:03:06 GMT
Author: buildbot
Date: Wed Mar 19 06:03:06 2014
New Revision: 902293

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/sparkbindings/home.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Mar 19 06:03:06 2014
@@ -1 +1 @@
-1579145
+1579146

Modified: websites/staging/mahout/trunk/content/users/sparkbindings/home.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/sparkbindings/home.html (original)
+++ websites/staging/mahout/trunk/content/users/sparkbindings/home.html Wed Mar 19 06:03:06 2014
@@ -218,17 +218,20 @@ in-core vectors and real scalars.</p>
 <p>The ecosystem of operators is built in the R's image, i.e. it follows R naming such as %*%, 
 colSums, nrow, length operating over vectors or matices. </p>
 <p>Important part of Spark Bindings is expression optimizer. It looks at expression as a whole 
-and figures out how it can be simplified, and which physical operators should be picked. E.g.
+and figures out how it can be simplified, and which physical operators should be picked. For example,
 there are currently about 5 different physical operators performing DRM-DRM multiplication
-picked based on matrix geometry, partitioning, orientation etc. If we count DRM by in-core 
-combinations, that would be at least another 3. </p>
-<p>The main idea is that a scientist writing algebraic expressions can't care less of distributed 
+picked based on matrix geometry, distributed dataset partitioning, orientation etc. 
+If we count in DRM by in-core combinations, that would be at least another 3 -- all of it for just 
+simple A %*% B type of expression. </p>
+<p>The main idea is that a scientist writing algebraic expressions cannot care less of distributed 
 operation plans and works entirely on the logical level just like he or she would do with R.</p>
 <p>Another point of logical level manipulations is decoupling computation from distributed back-end. 
 That is, the algebraic optimizer also acts as a translation layer to a concrete machine cluster computational back-end. 
 Although it is not currently on roadmap (and not even 100% decoupled on the API level), 
 one can think of bringing in other back-ends and have the same algorithms running on those without 
 a change.</p>
+<p>Please refer to the documentation for details.</p>
+<h2 id="status">Status</h2>
 <p>At this point, this environment addresses Linear Algebra side of things only. 
 However, it would be very exciting to include statistics and data frame support too. </p>
 <p>Also, this is early stage and experimental at this point. 
@@ -236,7 +239,6 @@ Tweaks may be needed here and there for 
 But being run on Spark, and assuming there's enough RAM not to swap intermediate products 
 of computation to disk, it should be as fast as it can be expected of in-memory matrix 
 block manipulations in Mahout and IO associated with the use of Spark distributed primitives.</p>
-<p>Please refer to the documentation for details.</p>
 <h2 id="documentation">Documentation</h2>
 <ul>
 <li><a href="ScalaSparkBindings.pdf">Scala and Spark bindings manual</a></li>
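
The R-style operator naming described on the diffed page (`%*%`, `colSums`, `nrow`, `length`) can be illustrated in plain Scala, the language of the bindings. This is a minimal, self-contained sketch that assumes a hypothetical in-core `DenseMat` type and a hypothetical `DenseMat.of` helper. Neither is Mahout's API. The sketch only shows how R operator names can map onto Scala methods.

```scala
// Hypothetical in-core dense matrix: NOT Mahout's API, only a sketch of R-style naming.
final case class DenseMat(rows: Vector[Vector[Double]]) {
  require(rows.nonEmpty && rows.forall(_.length == rows.head.length),
    "matrix must be non-empty and rectangular")

  // Row and column counts, named after R's nrow()/ncol().
  def nrow: Int = rows.length
  def ncol: Int = rows.head.length

  // Transpose, named after R's t(A): element (j, i) of the result is element (i, j) here.
  def t: DenseMat = DenseMat(Vector.tabulate(ncol, nrow)((j, i) => rows(i)(j)))

  // Matrix product, named after R's %*% operator.
  // Scala gives methods whose name starts with '%' multiplicative infix precedence.
  def %*%(that: DenseMat): DenseMat = {
    require(ncol == that.nrow, s"non-conformable: ${nrow}x$ncol %*% ${that.nrow}x${that.ncol}")
    DenseMat(Vector.tabulate(nrow, that.ncol) { (i, j) =>
      (0 until ncol).map(k => rows(i)(k) * that.rows(k)(j)).sum
    })
  }

  // Column sums, named after R's colSums(A); the returned Vector has a .length, like R's length().
  def colSums: Vector[Double] = Vector.tabulate(ncol)(j => rows.map(_(j)).sum)
}

object DenseMat {
  // Hypothetical helper that builds a matrix from rows, e.g. DenseMat.of(Seq(1.0, 2.0), Seq(3.0, 4.0)).
  def of(rows: Seq[Double]*): DenseMat = DenseMat(rows.map(_.toVector).toVector)
}
```

With `a = DenseMat.of(Seq(1.0, 2.0), Seq(3.0, 4.0))`, the expression `a %*% a.t` reads much like R's `a %*% t(a)`. The Scala and Spark bindings manual linked above documents the actual operator set.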

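The page notes that a single logical `A %*% B` may be executed by several different physical operators, chosen by orientation and geometry. A toy rule-based planner can illustrate that idea. Everything below is hypothetical and is not Mahout's optimizer: `Expr`, `Leaf`, `Transpose`, `Times`, `ToyPlanner.plan`, the operator labels, and the `broadcastLimit` threshold. The planner pattern-matches a logical product and returns a label for a physical strategy.

```scala
// Hypothetical logical expression tree; NOT Mahout's internal classes.
sealed trait Expr {
  // R-style sugar so plans can be requested for `A %*% B` or `A.t %*% B`.
  def %*%(that: Expr): Expr = Times(this, that)
  def t: Expr = Transpose(this)
}
final case class Leaf(name: String, nrow: Long, ncol: Long) extends Expr
final case class Transpose(a: Expr) extends Expr
final case class Times(a: Expr, b: Expr) extends Expr

// Illustrative labels for physical strategies (hypothetical set).
sealed trait Physical
case object AtA extends Physical         // label for t(A) %*% A, same operand on both sides
case object AtB extends Physical         // label for t(A) %*% B
case object ABt extends Physical         // label for A %*% t(B)
case object ABBroadcast extends Physical // label for A %*% B when B is small
case object ABJoin extends Physical      // label for the general A %*% B fallback

object ToyPlanner {
  // Logical geometry (rows, cols) of an expression, derived without evaluating it.
  def dims(e: Expr): (Long, Long) = e match {
    case Leaf(_, r, c) => (r, c)
    case Transpose(x) =>
      val (r, c) = dims(x)
      (c, r)
    case Times(x, y) => (dims(x)._1, dims(y)._2)
  }

  // Number of cells in an expression's result.
  def cells(e: Expr): Long = {
    val (r, c) = dims(e)
    r * c
  }

  // Picks a label by orientation first, then by the size of the right operand.
  // broadcastLimit (in cells) is an arbitrary, hypothetical threshold.
  def plan(e: Expr, broadcastLimit: Long = 1000000L): Physical = e match {
    case Times(Transpose(x), y) if x == y          => AtA
    case Times(Transpose(_), _)                    => AtB
    case Times(_, Transpose(_))                    => ABt
    case Times(_, y) if cells(y) <= broadcastLimit => ABBroadcast
    case Times(_, _)                               => ABJoin
    case other => throw new IllegalArgumentException(s"not a product: $other")
  }
}
```

The sketch checks orientation before size because a transposed operand changes which strategy applies regardless of size. A real planner would also weigh dataset partitioning, which this sketch ignores.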

