mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: 0xdata interested in contributing
Date Thu, 13 Mar 2014 16:37:35 GMT
On Thu, Mar 13, 2014 at 2:10 AM, Sebastian Schelter <ssc@apache.org> wrote:

> @All
>
> I have one big question regarding h2o (maybe SriSatish can help me with
> that). I haven't been able to find a detailed writeup about the execution
> model yet, but at first sight it seems like a big aggregation tree to me:
> Data is partitioned, then operations are conducted independently on the
> partitions (e.g. gradients are computed) and the outputs are aggregated
> (e.g. summed up) and sent back to the individual machines. It also seems to
> support a lightweight version of MapReduce. I think this approach is fine
> for ML algorithms that can be efficiently formulated in the statistical
> query model [2]. A lot of other algos like SSVD or Cooccurrence Analysis or
> graph-based computations are hard to fit into this model however.
>
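
The aggregation-tree pattern described in the quoted text (partition the
data, compute a gradient independently on each partition, sum the partial
results) can be sketched roughly as follows. This is a minimal
single-process illustration in plain Python, not h2o code: the function
names (partition, local_gradient, tree_reduce, distributed_gradient), the
squared-error loss, and the round-robin split are all assumptions made for
the example.

```python
def partition(rows, n_parts):
    """Split rows into n_parts chunks (round-robin; an arbitrary choice)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def local_gradient(rows, w):
    """Per-partition step: squared-error gradient over this partition only."""
    grad = [0.0] * len(w)
    for x, y in rows:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def add_vectors(a, b):
    """Aggregation step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def tree_reduce(values, combine):
    """Combine values pairwise, level by level, like an aggregation tree."""
    while len(values) > 1:
        paired = [combine(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # odd element moves up unchanged
        values = paired
    return values[0]


def distributed_gradient(rows, w, n_parts=4):
    """Partition, compute partial gradients independently, then sum them."""
    partials = [local_gradient(p, w) for p in partition(rows, n_parts)]
    return tree_reduce(partials, add_vectors)
```

Because the gradient is a plain sum over rows, the tree-reduced result
matches the gradient computed over the whole data set, up to floating-point
rounding. That additive structure is what makes this kind of algorithm fit
the model; computations without such a decomposition are the ones the
quoted text flags as harder to express.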

I worked through cooccurrence analysis including down-sampling with Cliff
from 0xdata and he was able to show me pretty convincingly that h2o is able
to do these computations.

The proof is in the pudding, I think.  The 0xdata team think that they can
knock out a Mahout matrix and vector data type pretty quickly.  They also
think that the SSVD algorithm will follow from that pretty
straightforwardly.
