flink-dev mailing list archives

From Felix Neutatz <neut...@googlemail.com>
Subject Re: Problem with ML pipeline
Date Mon, 08 Jun 2015 04:47:16 GMT
We probably need it for the other classes of the pipeline as well, so that
the ID can be passed through the whole pipeline.

Best regards,
Felix
On 06.06.2015 at 9:46 AM, "Till Rohrmann" <trohrmann@apache.org> wrote:

> Then you only have to provide an implicit PredictOperation[SVM, (T, Int),
> (LabeledVector, Int)] value with T <: Vector in the scope where you call
> the predict operation.
> On Jun 6, 2015 8:14 AM, "Felix Neutatz" <neutatz@googlemail.com> wrote:
>
> > That would be great. I like the special predict operation better because
> > it is only necessary to return the id in some cases. The special predict
> > operation would save this overhead.
> >
> > Best regards,
> > Felix
> > On 04.06.2015 at 7:56 PM, "Till Rohrmann" <till.rohrmann@gmail.com> wrote:
> >
> > > I see your problem. One way to solve it is to implement a special
> > > PredictOperation which takes a tuple (id, vector) and returns a tuple
> > > (id, labeledVector). You can take a look at the implementation of the
> > > vector prediction operation.
> > >
> > > But we can also discuss adding an ID field to the Vector type.
> > >
> > > Cheers,
> > > Till
> > > On Jun 4, 2015 7:30 PM, "Felix Neutatz" <neutatz@googlemail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have the following use case: I want to do regression for a
> > > > time series dataset like:
> > > >
> > > > id, x1, x2, ..., xn, y
> > > >
> > > > id = point in time
> > > > x = features
> > > > y = target value
> > > >
> > > > In the Flink framework I would map this to a LabeledVector (y,
> > > > DenseVector(x)). (I don't want to use the id as a feature.)
> > > >
> > > > When I finally apply the predict() method, I get a LabeledVector
> > > > (y_predicted, DenseVector(x)).
> > > >
> > > > Now my problem is that I would like to plot the predicted target
> > > > value against its point in time.
> > > >
> > > > What I have to do now is:
> > > >
> > > > a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
> > > > b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))
> > > >
> > > > a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
> > > >
> > > > This is really a cumbersome process for such a simple thing. Is
> > > > there any approach which makes this simpler? If not, can we extend
> > > > the ML API to allow ids?
> > > >
> > > > Best regards,
> > > > Felix
> > > >
> > >
> >
>
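
The special predict operation discussed in this thread can be sketched roughly as follows. This is only a minimal, dependency-free illustration of the type class idea, not FlinkML's actual API: Seq stands in for DataSet, LinearModel is a hypothetical stand-in for a trained predictor such as SVM, and the PredictOperation signature is simplified (no parameter map). DenseVector and LabeledVector are redefined locally so the sketch compiles on its own.

```scala
// Simplified stand-ins for the FlinkML types named in the thread (hypothetical).
final case class DenseVector(data: Vector[Double])
final case class LabeledVector(label: Double, vector: DenseVector)

// Type class: how a model of type Self maps Testing elements to Predictions.
// Seq stands in for Flink's DataSet to keep the sketch dependency-free.
trait PredictOperation[Self, Testing, Prediction] {
  def predict(model: Self, input: Seq[Testing]): Seq[Prediction]
}

// Toy linear model standing in for a trained predictor such as SVM.
final case class LinearModel(weights: Vector[Double]) {
  def score(v: DenseVector): Double =
    weights.zip(v.data).map { case (w, x) => w * x }.sum

  // The operation is selected at compile time from the input element type.
  def predict[In, Out](input: Seq[In])(
      implicit op: PredictOperation[LinearModel, In, Out]): Seq[Out] =
    op.predict(this, input)
}

object PredictOperation {
  // Plain vectors in, labeled vectors out: the id is not available here.
  implicit val vectorPredict: PredictOperation[LinearModel, DenseVector, LabeledVector] =
    new PredictOperation[LinearModel, DenseVector, LabeledVector] {
      def predict(model: LinearModel, input: Seq[DenseVector]): Seq[LabeledVector] =
        input.map(v => LabeledVector(model.score(v), v))
    }

  // (vector, id) in, (labeled vector, id) out: the id is carried along unchanged.
  implicit val idPredict: PredictOperation[LinearModel, (DenseVector, Int), (LabeledVector, Int)] =
    new PredictOperation[LinearModel, (DenseVector, Int), (LabeledVector, Int)] {
      def predict(model: LinearModel,
                  input: Seq[(DenseVector, Int)]): Seq[(LabeledVector, Int)] =
        input.map { case (v, id) => (LabeledVector(model.score(v), v), id) }
    }
}

object IdPredictExample {
  def main(args: Array[String]): Unit = {
    val model = LinearModel(Vector(2.0, -1.0))
    val withIds = Seq(
      (DenseVector(Vector(1.0, 1.0)), 42),
      (DenseVector(Vector(0.5, 3.0)), 43))
    // Resolves idPredict because the elements are (DenseVector, Int) tuples.
    model.predict(withIds).foreach { case (lv, id) => println(s"$id -> ${lv.label}") }
  }
}
```

With such an instance in implicit scope, the id stays attached to each prediction, so the join on the feature vector from the original question would no longer be needed. Callers that pass plain vectors keep using the id-free operation and pay no extra cost.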
