mahout-dev mailing list archives

From "Dmitriy Lyubimov (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1500) H2O integration
Date Tue, 01 Apr 2014 18:53:17 GMT


Dmitriy Lyubimov commented on MAHOUT-1500:

bq. What is the "Algebraic DSL"? Is that the one which came with the scala bindings (with
"%*%" operator etc.)?

There are two sets of operators. The first is for mahout-math (in-core); I call it the Scala
bindings, and it lives in math-scala. It doesn't actually do much beyond providing syntactic
sugar for passing things off to in-core cost-based optimizers (where they are implemented).

The second set of DSL operators (which looks identical to the in-core set) is for distributed
stuff. (On the diagram the two are not visually separated, other than part of the DSL sitting
over the in-core optimizer and part of it over the distributed optimizer.)
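
To illustrate the idea of one operator surface over two kinds of operands, here is a toy sketch in Python. It is not Mahout code: `InCoreMatrix`, `Plan`, and `gram` are hypothetical names, and Python's `@` stands in for the `%*%` operator. The in-core type computes immediately, while the distributed stand-in only records a logical plan node for an optimizer to handle later.

```python
from dataclasses import dataclass

# Hypothetical toy types for illustration only; none of these are Mahout classes.


class InCoreMatrix:
    """In-core operand: operators compute a result immediately."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def t(self):
        # Transpose by swapping rows and columns.
        return InCoreMatrix(zip(*self.rows))

    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return InCoreMatrix(
            [sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in self.rows
        )


@dataclass(frozen=True)
class Plan:
    """Distributed operand stand-in: operators only record a logical plan node."""

    op: str
    args: tuple = ()

    @property
    def t(self):
        return Plan("transpose", (self,))

    def __matmul__(self, other):
        return Plan("times", (self, other))


def gram(x):
    # The same expression is valid for either operand type.
    return x.t @ x
```

Calling `gram` on an `InCoreMatrix` returns a computed matrix, while calling it on a `Plan` returns a small expression tree with nothing evaluated yet.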

bq. Today, what distinguishes "Logical translation layer" vs "Physical translation layer"
in the code? What parts of the code is considered to be the "Logical translation layer"? 

Well, keep in mind that the distributed optimizer part was done in about 3 days and is
currently fairly tightly bound to the Spark code, so the separation will not be very clean
until we introduce another engine (which is coming). Obviously, when the second engine is
introduced, this needs to be abstracted into a separate module without Spark dependencies.

Logical translation is everything in drm.plan (operators implementing DrmLike[]).
Physical translation to Spark is CheckpointedDrm, CheckpointAction, and everything in the blas
package (the actual Spark-specific support for the physical plan after the optimization run).

bq. Is the selection of "physical translation layer" a run-time decision?
Yes, it is a run-time optimizer action based on operand types, geometry (size), orientation, and
partitioning. (Very similar, in fact, to what happens in a Pig graph, except such graph rewrites
are much more elegant in Scala.)
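
A rough sketch of what such a run-time choice can look like, written in Python for brevity. Everything here is an assumption made for illustration: the node classes, the physical operator labels, and the `IN_CORE_LIMIT` cutoff are made up and are not taken from the Mahout code base. The point is only that the physical operator is picked by inspecting the shape of the logical plan and the geometry of its operands.

```python
from dataclasses import dataclass

# Hypothetical names and threshold, for illustration only; not Mahout's classes.


@dataclass(frozen=True)
class Source:
    name: str
    nrow: int
    ncol: int


@dataclass(frozen=True)
class Transpose:
    arg: object


@dataclass(frozen=True)
class Times:
    left: object
    right: object


IN_CORE_LIMIT = 1000  # assumed cutoff (in elements) for "small enough to ship around"


def geometry(node):
    """Return (nrow, ncol) of a logical node."""
    if isinstance(node, Source):
        return node.nrow, node.ncol
    if isinstance(node, Transpose):
        nrow, ncol = geometry(node.arg)
        return ncol, nrow
    if isinstance(node, Times):
        return geometry(node.left)[0], geometry(node.right)[1]
    raise TypeError(f"unknown node: {node!r}")


def choose_physical(node):
    """Pick a physical operator label for a logical node at run time."""
    if isinstance(node, Times):
        left, right = node.left, node.right
        # Rewrite: t(A) times A becomes one fused operator, so the
        # transpose never has to be materialized on its own.
        if isinstance(left, Transpose) and left.arg == right:
            return "AtA"
        if isinstance(left, Transpose):
            return "AtB"
        # Geometry: a small right operand can be sent to every partition.
        nrow, ncol = geometry(right)
        if nrow * ncol <= IN_CORE_LIMIT:
            return "ABroadcastB"
        return "AB"
    if isinstance(node, Transpose):
        return "At"
    return "Source"
```

For example, with a tall `a = Source("A", 1000000, 50)`, the plan `Times(Transpose(a), a)` maps to the fused `"AtA"` label, while `Times(a, Source("B", 50, 10))` maps to `"ABroadcastB"` because the right operand has only 500 elements.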

> H2O integration
> ---------------
>                 Key: MAHOUT-1500
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>             Fix For: 1.0
> Integration with h2o ( in order to exploit its high performance computational abilities.
> Start with providing implementations of AbstractMatrix and AbstractVector, and more as we make progress.

This message was sent by Atlassian JIRA
