flink-dev mailing list archives

From Till Rohrmann <till.rohrm...@gmail.com>
Subject Re: Problem with ML pipeline
Date Thu, 04 Jun 2015 17:56:46 GMT
I see your problem. One way to solve the problem is to implement a special
PredictOperation which takes a tuple (id, vector) and returns a tuple (id,
labeledVector). You can take a look at the implementation for the vector
prediction operation.
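
A minimal sketch of the idea in plain Scala, with no Flink dependency. All names here (LinearModel, Labeled, predictWithId) are hypothetical stand-ins for illustration, not the FlinkML API. It only shows the shape of the operation: the id travels next to the vector, so the prediction comes back already keyed by id.

```scala
// Plain-Scala sketch (no Flink dependency); all names are hypothetical.
object IdPredictSketch {

  // Stand-in for a trained linear model: weights plus intercept.
  final case class LinearModel(weights: Vector[Double], intercept: Double) {
    def predict(features: Vector[Double]): Double = {
      require(features.length == weights.length, "feature/weight dimension mismatch")
      weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    }
  }

  // Stand-in for a labeled vector: predicted label plus the original features.
  final case class Labeled(label: Double, features: Vector[Double])

  // Predict on (id, vector) pairs and keep the id next to the result,
  // so no join back to the original data set is needed afterwards.
  def predictWithId[K](
      model: LinearModel,
      data: Seq[(K, Vector[Double])]): Seq[(K, Labeled)] =
    data.map { case (id, x) => (id, Labeled(model.predict(x), x)) }
}
```

With something like this, the (id, y_predicted) pairs for plotting are a single map over the result: predictWithId(model, data).map { case (id, l) => (id, l.label) }. In Flink the same mapping would presumably live inside the custom PredictOperation, operating on a DataSet instead of a Seq.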

But we can also discuss adding an ID field to the Vector type.

Cheers,
Till
On Jun 4, 2015 7:30 PM, "Felix Neutatz" <neutatz@googlemail.com> wrote:

> Hi,
>
> I have the following use case: I want to do regression on a time series
> dataset like:
>
> id, x1, x2, ..., xn, y
>
> id = point in time
> x = features
> y = target value
>
> In the Flink framework I would map this to a LabeledVector (y,
> DenseVector(x)). (I don't want to use the id as a feature)
>
> When I finally apply the predict() method I get a LabeledVector
> (y_predicted, DenseVector(x)).
>
> Now my problem is that I would like to plot the predicted target value
> according to its time.
>
> What I have to do now is:
>
> a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
> b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))
>
> a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
>
> This is really a cumbersome process for such a simple thing. Is there any
> approach that makes this simpler? If not, can we extend the ML API to
> allow ids?
>
> Best regards,
> Felix
>
