flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Till Rohrmann <trohrm...@apache.org>
Subject Re: Building several models in parallel
Date Wed, 08 Jul 2015 08:58:17 GMT
Yes it is. But you can still run the calculation in parallel because `fit`
does not trigger the execution of the job graph. It simply builds the data
flow. Only if you call `predict` or collect the weights, it is executed.

Cheers,
Till

On Wed, Jul 8, 2015 at 10:52 AM, Felix Neutatz <neutatz@googlemail.com>
wrote:

> Thanks for the information Till :)
>
> So at the moment the iteration is the only way.
>
> Best regards,
> Felix
>
> 2015-07-08 10:43 GMT+02:00 Till Rohrmann <trohrmann@apache.org>:
>
> > Hi Felix,
> >
> > this is currently not supported by FlinkML. The MultipleLinearRegression
> > algorithm expects a DataSet and not a GroupedDataSet as input. What you
> can
> > do is to extract each group from the original DataSet by using a filter
> > operation. Once you have done this, you can train the linear model on
> each
> > sub part of the DataSet.
> >
> > Cheers,
> > Till
> > ​
> >
> > On Wed, Jul 8, 2015 at 10:37 AM, Felix Neutatz <neutatz@googlemail.com>
> > wrote:
> >
> > > Hi Felix,
> > >
> > > thanks for the idea. But doesn't this mean that I can only train one
> > model
> > > per partition? The thing is, I have way more models than that :(
> > >
> > > Best regards,
> > > Felix
> > >
> > > 2015-07-07 22:37 GMT+02:00 Felix Schüler <fschueler@posteo.de>:
> > >
> > > > Hi Felix!
> > > >
> > > > We had a similar usecase and I trained multiple models on partitions
> of
> > > > my data with mapPartition and the model-parameters (weights) as
> > > > broadcast variable. If I understood broadcast variables in Flink
> > > > correctly, you should end up with one model on each TaskManager.
> > > >
> > > > Does that work?
> > > >
> > > > Felix
> > > >
> > > > Am 07.07.2015 um 17:32 schrieb Felix Neutatz:
> > > > > Hi,
> > > > >
> > > > > at the moment I have a dataset which looks like this:
> > > > >
> > > > > DataSet[model_ID, DataVector] data
> > > > >
> > > > > So what I want to do is group by the model_ID and build for each
> > > model_ID
> > > > > one regression model
> > > > >
> > > > > in pseudo code:
> > > > > data.groupBy(model_ID)
> > > > >         --> MultipleLinearRegression().fit(data_grouped)
> > > > >
> > > > > Is there anyway besides an iteration how to do this at the moment?
> > > > >
> > > > > Thanks for your help,
> > > > >
> > > > > Felix Neutatz
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message