apex-dev mailing list archives

From Sandesh Hegde <sand...@datatorrent.com>
Subject Re: Create data pump to H2O
Date Tue, 20 Oct 2015 18:46:11 GMT
This feature will be useful only if training can be done at scale. There
may be some models that can be built incrementally; do you know of any?

On Tue, Oct 20, 2015 at 11:37 AM Siyuan Hua <siyuan@datatorrent.com> wrote:

> Hi Sandesh,
>
> This is not supposed to scale up H2O itself. It's just a bridge
> between H2O and Apex. Today, if you want to use Apex to prepare the data
> for H2O, you have to output the data to some file (e.g. on HDFS) and then
> manually start H2O to build the model.
> With this bridge you can build one pipeline to do the whole thing.
>
>
> Siyuan
>
> On Tue, Oct 20, 2015 at 10:56 AM, Sandesh Hegde <sandesh@datatorrent.com>
> wrote:
>
> > How do you propose to handle the scalability required for H2O model
> > creation?
> >
> > On Tue, Oct 20, 2015 at 9:58 AM Siyuan Hua <siyuan@datatorrent.com> wrote:
> >
> > > In ML model training, we discovered a pattern: Apex can be used to
> > > process raw data into feature data, then H2O takes the feature data
> > > into its model training engine to train the model.
> > >
> > > But there is a gap between the two pipelines. I propose that we
> > > create an operator that feeds the processed data directly into H2O, or
> > > maybe starts a container for H2O and sends data to it. That way, we
> > > could build a continuous online model training pipeline.
> > >
> > > I've created a JIRA ticket here: https://malhar.atlassian.net/browse/MLHR-1875
> > >
> > > Feel free to share any thoughts.
> > >
> > > Best,
> > > Siyuan
> > >
> >
>
