apex-dev mailing list archives

From Siyuan Hua <siy...@datatorrent.com>
Subject Re: Create data pump to H2O
Date Tue, 20 Oct 2015 18:37:15 GMT
Hi Sandesh,

This is not meant to scale H2O itself. It's just a bridge
between H2O and Apex. Today, if you want to use Apex to prepare data
for H2O, you have to write the data to a file (e.g. on HDFS) and then
manually start H2O to build the model.
With this bridge you can build a single pipeline that does the whole thing.


Siyuan

On Tue, Oct 20, 2015 at 10:56 AM, Sandesh Hegde <sandesh@datatorrent.com>
wrote:

> How do you propose to handle the scalability required for H2O model
> creation?
>
> On Tue, Oct 20, 2015 at 9:58 AM Siyuan Hua <siyuan@datatorrent.com> wrote:
>
> > In ML model training, we discovered a pattern: Apex can be used to
> > process raw data into feature data, and H2O then takes the feature data
> > into its model training engine to train the model.
> >
> > But there is a gap between the two pipelines. I propose that we
> > create an operator that feeds the processed data directly into H2O, or
> > maybe starts a container for H2O and sends the data into it. That way, we
> > could build a continuous online model training pipeline.
> >
> > I've created a JIRA here: https://malhar.atlassian.net/browse/MLHR-1875
> >
> > Feel free to share any thoughts.
> >
> > Best,
> > Siyuan
> >
>
