mahout-commits mailing list archives

From build...@apache.org
Subject svn commit: r946256 - in /websites/staging/mahout/trunk/content: ./ users/environment/h2o-internals.html
Date Fri, 03 Apr 2015 23:14:51 GMT
Author: buildbot
Date: Fri Apr  3 23:14:51 2015
New Revision: 946256

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/environment/h2o-internals.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Apr  3 23:14:51 2015
@@ -1 +1 @@
-1671214
+1671216

Modified: websites/staging/mahout/trunk/content/users/environment/h2o-internals.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/environment/h2o-internals.html (original)
+++ websites/staging/mahout/trunk/content/users/environment/h2o-internals.html Fri Apr  3 23:14:51 2015
@@ -256,7 +256,7 @@
    <div id="main">
     <h1 id="introduction">Introduction</h1>
 <p>This document provides an overview of how the Mahout Scala DSL (distributed algebraic
operators) is implemented over the H2O backend engine. The document is aimed at Mahout developers,
to give a high level description of the design so that one can explore the code inside <code>h2o/</code>
with some context.</p>
-<h2 id="h2o-overview"><a href="http://h2o.ai/">H2O</a> Overview</h2>
+<h2 id="h2o-overview">H2O Overview</h2>
 <p>H2O is a distributed scalable machine learning system. Internal architecture of
H2O has a distributed math engine (h2o-core) and a separate layer on top for algorithms and
UI. The Mahout integration requires only the math engine (h2o-core).</p>
 <h2 id="h2o-data-model">H2O Data Model</h2>
 <p>The data model of the H2O math engine is a distributed columnar store (of primarily
numbers, but also strings). A column of numbers is called a Vector, which is broken into Chunks
(of a few thousand elements). Chunks are distributed across the cluster based on a deterministic
hash. Therefore, any member of the cluster knows where a particular Chunk of a Vector is homed.
Each Chunk is separately compressed in memory and elements are individually decompressed on
the fly upon access with purely register operations (thereby achieving high memory throughput).
An ordered set of similarly partitioned Vecs are composed into a Frame. A Frame is therefore
a large two dimensional table of numbers. All elements of a logical row in the Frame are guaranteed
to be homed in the same server of the cluster. Generally speaking, H2O works well on "tall
skinny" data, i.e, lots of rows (100s of millions) and modest number of columns (10s of thousands).</p>
@@ -267,22 +267,14 @@
 <p>H2O provides a flexible execution framework called <code>MRTask</code>.
The <code>MRTask</code> framework typically executes over a Frame (or even a Vector),
supports various types of map() methods, can optionally modify the Frame or Vector (though
this never happens in the Mahout integration), and optionally create a new Vector or set of
Vectors (to combine them into a new Frame, and consequently a new DRM).</p>
 <h2 id="source-layout">Source Layout</h2>
 <p>Within mahout.git, the top level directory, <code>h2o/</code> holds
all the source code related to the H2O backend engine. Part of the code (that interfaces with
the rest of the Mahout componenets) is in Scala, and part of the code (that interfaces with
h2o-core and implements algebraic operators) is in Java. Here is a brief overview of what
functionality can be found where within <code>h2o/</code>.</p>
-<div class="codehilite"><pre><span class="n">h2o</span><span class="o">/</span>
<span class="o">-</span> <span class="n">top</span> <span class="n">level</span>
<span class="n">directory</span> <span class="n">containing</span>
<span class="n">all</span> <span class="n">H2O</span> <span class="n">related</span>
<span class="n">code</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">java</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/</span><span class="n">ops</span><span class="o">/*</span><span
class="p">.</span><span class="n">java</span> <span class="o">-</span>
<span class="n">Physical</span> <span class="n">operator</span> <span
class="n">code</span> <span class="k">for</span> <span class="n">the</span>
<span class="n">various</span> <span class="n">DSL</span> <span
class="n">algebra</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">java</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/</span><span class="n">drm</span><span class="o">/*</span><span
class="p">.</span><span class="n">java</span> <span class="o">-</span>
<span class="n">DRM</span> <span class="n">backing</span> <span
class="p">(</span><span class="n">onto</span> <span class="n">Frame</span><span
class="p">)</span> <span class="n">and</span> <span class="n">Broadcast</span>
<span class="n">implementation</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">java</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/</span><span class="n">H2OHdfs</span><span class="p">.</span><span
class="n">java</span> <span class="o">-</span> <span class="n">Read</span>
<span class="o">/</span> <span class="n">Write</span> <span class="n">between</span>
<span class="n">DRM</span> <span class="p">(</span><span class="n">Frame</span><span
class="p">)</span> <span class="n">and</span> <span class="n">files</span>
<span class="n">on</span> <span class="n">HDFS</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">java</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/</span><span class="n">H2OBlockMatrix</span><span class="p">.</span><span
class="n">java</span> <span class="o">-</span> <span class="n">A</span>
<span class="n">vertical</span> <span class="n">block</span> <span
class="n">matrix</span> <span class="n">of</span> <span class="n">DRM</span>
<span class="n">presented</span> <span class="n">as</span> <span
class="n">a</span> <span class="n">virtual</span> <span class="n">copy</span><span
class="o">-</span><span class="n">on</span><span class="o">-</span><span
class="n">write</span> <span class="n">in</span><span class="o">-</span><span
  class="n">core</span> <span class="n">Matrix</span><span class="p">.</span>
<span class="n">Used</span> <span class="n">in</span> <span class="n">mapBlock</span><span
class="p">()</span> <span class="n">API</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">java</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/</span><span class="n">H2OHelper</span><span class="p">.</span><span
class="n">java</span> <span class="o">-</span> <span class="n">A</span>
<span class="n">collection</span> <span class="n">of</span> <span
class="n">various</span> <span class="n">functionality</span> <span
class="n">and</span> <span class="n">helpers</span><span class="p">.</span>
<span class="n">For</span> <span class="n">e</span><span class="p">.</span><span
class="n">g</span><span class="p">,</span> <span class="n">convert</span>
<span class="n">between</span> <span class="n">in</span><span class="o">-</span><span
class="n">core</span> <span class="n">Matrix</span> <span class="n">and</span>
<span class="n">DRM</span><span class="p">,</span> <span class="n">various</span>
<span class="n">summary</span> <span class="n">statistics</span> <span
class="n">on</span> <span class="n">DRM</span><span class="o">/</span><span
class="n">Frame</span><span class="p">.</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">scala</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/</span><span class="n">H2OEngine</span><span class="p">.</span><span
class="n">scala</span> <span class="o">-</span> <span class="n">DSL</span>
<span class="n">operator</span> <span class="n">graph</span> <span
class="n">evaluator</span> <span class="n">and</span> <span class="n">various</span>
<span class="n">abstract</span> <span class="n">API</span> <span
class="n">implementations</span> <span class="k">for</span> <span
class="n">a</span> <span class="n">distributed</span> <span class="n">engine</span>
-
-<span class="n">h2o</span><span class="o">/</span><span class="n">src</span><span
class="o">/</span><span class="n">main</span><span class="o">/</span><span
class="n">scala</span><span class="o">/</span><span class="n">org</span><span
class="o">/</span><span class="n">apache</span><span class="o">/</span><span
class="n">mahout</span><span class="o">/</span><span class="n">h2obindings</span><span
class="o">/*</span> <span class="o">-</span> <span class="n">Various</span>
<span class="n">abstract</span> <span class="n">API</span> <span
class="n">implementations</span> <span class="p">(</span>&quot;<span
class="n">glue</span> <span class="n">work</span>&quot;<span class="p">)</span>
-</pre></div>
+<p>h2o/ - top level directory containing all H2O related code</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/ops/*.java - Physical operator code
for the various DSL algebra</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/drm/*.java - DRM backing (onto Frame)
and Broadcast implementation</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OHdfs.java - Read / Write between
DRM (Frame) and files on HDFS</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OBlockMatrix.java - A vertical
block matrix of DRM presented as a virtual copy-on-write in-core Matrix. Used in mapBlock()
API</p>
+<p>h2o/src/main/java/org/apache/mahout/h2obindings/H2OHelper.java - A collection of
various functionality and helpers. For e.g, convert between in-core Matrix and DRM, various
summary statistics on DRM/Frame.</p>
+<p>h2o/src/main/scala/org/apache/mahout/h2obindings/H2OEngine.scala - DSL operator
graph evaluator and various abstract API implementations for a distributed engine</p>
+<p>h2o/src/main/scala/org/apache/mahout/h2obindings/* - Various abstract API implementations
("glue work")</p>
    </div>
   </div>     
 </div> 


