flink-dev mailing list archives

From Felix Neutatz <neut...@googlemail.com>
Subject Re: Problem with ML pipeline
Date Sat, 06 Jun 2015 06:13:58 GMT
That would be great. I prefer the special predict operation, because
returning the id is only necessary in some cases. A special predict
operation would avoid that overhead in the other cases.

Best regards,
Felix
On 04.06.2015 at 7:56 PM, "Till Rohrmann" <till.rohrmann@gmail.com> wrote:

> I see your problem. One way to solve the problem is to implement a special
> PredictOperation which takes a tuple (id, vector) and returns a tuple (id,
> labeledVector). You can take a look at the implementation for the vector
> prediction operation.
>
> But we can also discuss adding an ID field to the Vector type.
>
> Cheers,
> Till
> On Jun 4, 2015 7:30 PM, "Felix Neutatz" <neutatz@googlemail.com> wrote:
>
> > Hi,
> >
> > I have the following use case: I want to do regression on a time series
> > dataset like:
> >
> > id, x1, x2, ..., xn, y
> >
> > id = point in time
> > x = features
> > y = target value
> >
> > In the Flink framework I would map this to a LabeledVector (y,
> > DenseVector(x)). (I don't want to use the id as a feature.)
> >
> > When I finally apply the predict() method, I get a LabeledVector
> > (y_predicted, DenseVector(x)).
> >
> > Now my problem is that I would like to plot each predicted target value
> > against its point in time.
> >
> > What I have to do now is:
> >
> > a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
> > b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))
> >
> > a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
> >
> > This is really a cumbersome process for such a simple thing. Is there any
> > approach that makes this simpler? If not, can we extend the ML API to
> > allow ids?
> >
> > Best regards,
> > Felix
> >
>
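
To make the two options in this thread concrete, here is a minimal sketch in plain Scala. It deliberately uses ordinary collections instead of Flink DataSets, so nothing here is the actual Flink ML API: LabeledVector is redefined locally, and LinearModel, predictThenJoin and predictWithId are hypothetical names chosen for illustration. The first function mirrors the join workaround from the original question, the second mirrors the proposed predict operation that maps (id, vector) to (id, labeledVector).

```scala
object IdPredictSketch {
  // Local stand-in for a labeled vector (label, features); not the Flink ML class.
  final case class LabeledVector(label: Double, features: Vector[Double])

  // Hypothetical linear model: prediction = weights . features + intercept.
  final case class LinearModel(weights: Vector[Double], intercept: Double) {
    def predict(x: Vector[Double]): Double =
      weights.zip(x).map { case (w, v) => w * v }.sum + intercept
  }

  // Workaround from the question: predict on the features only, then
  // recover the id by matching on the feature vector.
  def predictThenJoin(
      model: LinearModel,
      rows: Seq[(Long, Vector[Double])]): Seq[(Long, Double)] = {
    // The id is dropped here, as it is when mapping to LabeledVector.
    val predicted = rows.map { case (_, x) => LabeledVector(model.predict(x), x) }
    // Lookup table from features back to id (the "join" side).
    val idByFeatures = rows.map { case (id, x) => x -> id }.toMap
    predicted.map(lv => (idByFeatures(lv.features), lv.label))
  }

  // Proposed alternative: the id travels with the vector through predict.
  def predictWithId(
      model: LinearModel,
      rows: Seq[(Long, Vector[Double])]): Seq[(Long, LabeledVector)] =
    rows.map { case (id, x) => (id, LabeledVector(model.predict(x), x)) }
}
```

One thing the sketch makes visible: matching on the feature vector only recovers the right id when no two points in time share identical features, whereas the id-carrying variant has no such restriction and needs no second pass over the data.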
