flink-dev mailing list archives

From Felix Neutatz <neut...@googlemail.com>
Subject Problem with ML pipeline
Date Thu, 04 Jun 2015 17:29:07 GMT

I have the following use case: I want to do regression on a time series
dataset like:

id, x1, x2, ..., xn, y

id = point in time
x = features
y = target value

In the Flink framework I would map this to a LabeledVector(y,
DenseVector(x)). (I don't want to use the id as a feature.)

When I finally apply the predict() method, I get a LabeledVector(y_predicted,
DenseVector(x)).

Now my problem is that I would like to plot each predicted target value
against its point in time, but the id is no longer attached to the prediction.

What I have to do now is:

a = predictedDataSet.map(LabeledVector(y_p, x) => Tuple2(x, y_p))
b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x, id))

a.join(b).where("x").equalTo("x") { (a, b) => (id, y_p) }
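
To make the shape of this workaround concrete, here is a minimal sketch using plain Scala collections rather than Flink's DataSet API. The Row and Predicted types and the attachIds function are hypothetical names used only for illustration; they are not part of Flink.

```scala
// Plain-Scala sketch of the join-on-features workaround; no Flink dependency.
object IdJoinSketch {
  // Hypothetical row types, for illustration only.
  final case class Row(id: Long, x: Vector[Double], y: Double)
  final case class Predicted(yPredicted: Double, x: Vector[Double])

  // Join predictions back to the original rows on the feature vector x
  // and return (id, y_p) pairs sorted by id.
  def attachIds(original: Seq[Row], predicted: Seq[Predicted]): Seq[(Long, Double)] = {
    // Index the original ids by feature vector (the join key).
    val idsByFeatures: Map[Vector[Double], Seq[Long]] =
      original.groupBy(_.x).map { case (x, rows) => x -> rows.map(_.id) }

    predicted
      .flatMap(p => idsByFeatures.getOrElse(p.x, Seq.empty).map(id => (id, p.yPredicted)))
      .sortBy(_._1)
  }
}
```

Note that the feature vector is the only join key here, so if two points in time share the same features, the join pairs every matching id with every matching prediction.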

This is a really cumbersome process for such a simple thing. Is there any
approach that makes this simpler? If not, can we extend the ML API to
allow ids?

Best regards,
