mahout-dev mailing list archives

From Sebastian Schelter
Subject Re: Straw poll re: H2O ?
Date Tue, 29 Apr 2014 19:41:04 GMT
In the interest of transparency in this discussion, I should add that I
am a committer on the upcoming Stratosphere ASF podling, a co-worker of
its main developers, and have contributed to it as part of my PhD.

On 04/29/2014 09:23 PM, Sebastian Schelter wrote:
> Anand,
> I'm trying to answer some of your questions, and my answers highlight
> the points that I would like to see clarified about H2O.
> On 04/28/2014 11:13 PM, Anand Avati wrote:
>> 1. Why is the DSL claiming to have (in its vision) logical vs physical
>> separation if not for providing multiple compute backends?
> This is not a claim or a vision, the DSL already has this separation.
> Take for example o.a.m.sparkbindings.drm.plan.OpAtA, that's the logical
> operator for executing a Transpose-Times-Self matrix multiplication. In
> o.a.m.sparkbindings.blas.AtA you will find two physical operator
> implementations for it. The choice of which one to use depends on
> whether there is enough memory to hold certain intermediate results.
> The primary intention of a separation into logical and physical
> operators is to allow for a declarative programming style on the users
> side and for an optimizer on the system side which automatically chooses
> the optimal physical operator for the execution of a specific program.
> This choice of the physical operator might depend on the shape and
> amount of the data processed as well as on the underlying available
> resources. *The separation into logical and physical operators clearly
> does not imply having multiple backends*. It only makes it very easy
> to support them.
>> 2. Does the proposal of having a new DSL backend in the future (e.g.
>> Stratosphere, as suggested elsewhere) make you:
>> -- worry that stratosphere would be a dependency to Mahout?
> Stratosphere has recently been accepted as an incubator project in the
> ASF, so the worry about such a dependency is naturally less than about
> an externally managed project like H2O.
>> -- worry that as a user/committer/contributor you have to worry about
>> a new framework?
> In my eyes, there is a big difference between Spark/Stratosphere and
> H2O. Spark and Stratosphere have a clearly defined programming and
> execution model. They execute programs that are composed of a DAG of
> operators. The set of operators has clearly defined semantics and
> parallelization strategies. If you compare their operators, you will
> find that they offer pretty much the same in slightly different
> flavors. For both, there are scientific papers that explain all these
> things in detail.
> I have asked for a detailed description of H2O's programming model and
> execution model and I searched the documentation, but I haven't been
> able to find anything that clearly describes how things are done. I
> would love to read up on this, but until I'm presented with it, I have
> to assume that such a principled foundation is missing.
> --sebastian
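
The logical/physical split described in the answer to question 1 can be
illustrated with a small, language-neutral sketch. This is not Mahout's
code: the class name OpAtA is borrowed from the message, while the two
physical functions, the chooser, and the memory_budget_cells parameter
are hypothetical names invented for illustration. The selection rule
(use the dense in-memory variant only if an n x n intermediate fits the
budget) is likewise an assumption standing in for whatever the real
optimizer checks.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class OpAtA:
    """Logical operator: states *what* to compute (A^T A), not how."""
    a: Matrix


def ata_in_memory(a: Matrix) -> Matrix:
    """Hypothetical physical operator 1: accumulate the outer product of
    every row into a single dense n x n buffer held in memory."""
    n = len(a[0])
    out = [[0.0] * n for _ in range(n)]
    for row in a:
        for i in range(n):
            for j in range(n):
                out[i][j] += row[i] * row[j]
    return out


def ata_row_at_a_time(a: Matrix) -> Matrix:
    """Hypothetical physical operator 2: build one output row at a time.
    Rows are collected in a list here; a real system could emit each row
    instead of keeping the whole result resident."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]


def choose_physical(op: OpAtA,
                    memory_budget_cells: int) -> Callable[[Matrix], Matrix]:
    """Toy optimizer rule (an assumption, not Mahout's actual logic):
    pick the dense variant only if the n x n buffer fits the budget."""
    n = len(op.a[0])
    if n * n <= memory_budget_cells:
        return ata_in_memory
    return ata_row_at_a_time


def execute(op: OpAtA, memory_budget_cells: int) -> Matrix:
    """Run the logical operator with whichever physical operator is chosen."""
    return choose_physical(op, memory_budget_cells)(op.a)


if __name__ == "__main__":
    plan = OpAtA([[1.0, 2.0], [3.0, 4.0]])
    print(execute(plan, memory_budget_cells=100))
    print(execute(plan, memory_budget_cells=1))
```

The caller only ever constructs the logical OpAtA; both budgets yield
the same matrix, and only the execution strategy differs. Nothing in
the sketch involves a second backend, which is the point made in the
message: the separation makes extra backends easy without requiring
them.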
