flink-dev mailing list archives

From Felix Neutatz <neut...@googlemail.com>
Subject Problem with ML pipeline
Date Thu, 04 Jun 2015 17:29:07 GMT
Hi,

I have the following use case: I want to do regression on a time-series
dataset like:

id, x1, x2, ..., xn, y

id = point in time
x = features
y = target value

In the Flink framework I would map this to a LabeledVector (y,
DenseVector(x)). (I don't want to use the id as a feature.)

When I finally apply the predict() method, I get a LabeledVector
(y_predicted, DenseVector(x)).

Now my problem is that I would like to plot each predicted target value
against its point in time (the id).

What I have to do now is:

a = predictedDataSet.map ( LabeledVector => Tuple2(x,y_p))
b = originalDataSet.map("id, x1, x2, ..., xn, y" => Tuple2(x,id))

a.join(b).where("x").equalTo("x") { (a,b) => (id, y_p) }
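
A minimal sketch of the workaround above, written with plain Scala collections instead of Flink DataSets so that it runs without any dependency. The Row case class, the sample values, and the stand-in predictions are all hypothetical; a Map lookup plays the role of the join on x.

```scala
// Plain-Scala sketch of the join-on-features workaround (no Flink dependency).
// Row, the sample values, and the stand-in predictions are hypothetical.
final case class Row(id: Long, x: Vector[Double], y: Double)

object JoinOnFeatures {
  // Original data: id, features x, target y.
  val original: Seq[Row] = Seq(
    Row(1L, Vector(0.0, 1.0), 1.0),
    Row(2L, Vector(1.0, 2.0), 3.0),
    Row(3L, Vector(2.0, 3.0), 5.0)
  )

  // Stand-in for the predict() output: (x, y_predicted), with the ids lost.
  val predicted: Seq[(Vector[Double], Double)] = Seq(
    (Vector(2.0, 3.0), 5.1),
    (Vector(0.0, 1.0), 0.9),
    (Vector(1.0, 2.0), 3.2)
  )

  // b: lookup from features to id, built from the original data.
  val idByFeatures: Map[Vector[Double], Long] =
    original.map(r => r.x -> r.id).toMap

  // a join b on x, keeping (id, y_predicted), sorted by id for plotting.
  val idWithPrediction: Seq[(Long, Double)] =
    predicted
      .flatMap { case (x, yp) => idByFeatures.get(x).map(id => (id, yp)) }
      .sortBy(_._1)
}
```

Note that this only recovers the ids unambiguously if no two points in time share the same feature vector; with duplicates, the lookup keeps just one of the ids.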

This is a really cumbersome process for such a simple thing. Is there any
approach that makes this simpler? If not, can we extend the ML API to
allow ids?
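
To illustrate what "allowing ids" could mean, here is a rough sketch in plain Scala in which the id travels next to the features and only the features reach the model. The names predictWithId, predict, and the toy weight vector are hypothetical and are not part of the Flink ML API.

```scala
// Hypothetical sketch: keep the id next to the features through prediction.
// predictWithId and the toy linear model are illustrative, not Flink ML API.
object IdPreservingPredict {
  // Toy model: a fixed weight vector; the prediction is the dot product.
  val weights: Vector[Double] = Vector(1.0, 2.0)

  def predict(x: Vector[Double]): Double =
    x.zip(weights).map { case (xi, wi) => xi * wi }.sum

  // The id is passed through untouched and is never used as a feature.
  def predictWithId[K](data: Seq[(K, Vector[Double])]): Seq[(K, Double)] =
    data.map { case (id, x) => (id, predict(x)) }

  val result: Seq[(Long, Double)] =
    predictWithId(Seq(1L -> Vector(0.0, 1.0), 2L -> Vector(1.0, 2.0)))
}
```

With something along these lines, the (id, y_p) pairs needed for plotting would come straight out of the prediction step, with no join back against the original data.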

Best regards,
Felix
