flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Cross product of datastream and dataset
Date Fri, 18 Nov 2016 08:40:36 GMT

It is not possible to mix the DataSet and DataStream APIs at the moment.

If the DataSet is constant and not too big (which I assume, since otherwise
crossing would be extremely expensive), you can load the data into a
stateful map function.
For that you can implement a RichFlatMapFunction and read the data in
open(). For each incoming record, i.e., each call of flatMap(), you cross
it with the records in the state, immediately evaluate the condition, and
count. That way you don't generate too many records.
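The cross-filter-count step described above can be sketched in plain Java (no Flink dependency; the class and method names here are illustrative, not Flink API). This is the logic you would put inside the RichFlatMapFunction: the reference list would be loaded once in open(), and flatMap() would call something like crossAndCount() per incoming record and emit the count via the Collector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch of the logic that would live inside a Flink
// RichFlatMapFunction. In Flink, 'reference' would be read in open();
// flatMap() would invoke crossAndCount() for each record.
public class CrossAndCount {

    // Cross one incoming record with every cached reference record,
    // keep only the pairs that satisfy the condition, and count them.
    public static <T, R> long crossAndCount(
            T incoming, List<R> reference, BiPredicate<T, R> condition) {
        return reference.stream()
                .filter(r -> condition.test(incoming, r))
                .count();
    }

    public static void main(String[] args) {
        List<Integer> reference = Arrays.asList(1, 5, 10, 20);
        // Hypothetical condition: reference value smaller than the record.
        long matches = crossAndCount(7, reference, (in, r) -> r < in);
        System.out.println(matches); // prints 2 (values 1 and 5 match)
    }
}
```

Because the count is produced immediately per record, the full cross product is never materialized.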

If your DataSet is slowly changing, you can think of using a stateful
CoFlatMapFunction, using one input to read the stream and the other to
update the dataset.
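The two-input pattern can also be sketched in plain Java (again illustrative, not the actual Flink CoFlatMapFunction interface): one method handles records from the stream input and one handles updates to the reference data, with both sharing the same state, as the connected streams would in Flink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch of a CoFlatMapFunction-style operator: flatMap1
// handles the record stream, flatMap2 handles updates to the slowly
// changing reference data. In Flink, 'reference' would be kept in
// operator state and both methods would run on a connected stream.
public class StreamWithUpdatableReference<T, R> {
    private final List<R> reference = new ArrayList<>();
    private final BiPredicate<T, R> condition;

    public StreamWithUpdatableReference(BiPredicate<T, R> condition) {
        this.condition = condition;
    }

    // Input 1: cross the incoming record with the current reference
    // data, filter on the condition, and return the match count.
    public long flatMap1(T record) {
        return reference.stream()
                .filter(r -> condition.test(record, r))
                .count();
    }

    // Input 2: apply an update to the reference data (here, an append;
    // a real job might replace or delete entries as well).
    public void flatMap2(R update) {
        reference.add(update);
    }

    public static void main(String[] args) {
        StreamWithUpdatableReference<Integer, Integer> op =
                new StreamWithUpdatableReference<>((in, r) -> r < in);
        op.flatMap2(3);
        op.flatMap2(8);
        System.out.println(op.flatMap1(5)); // prints 1 (only 3 < 5)
        op.flatMap2(4);
        System.out.println(op.flatMap1(5)); // prints 2 (3 and 4 < 5)
    }
}
```

Note that without keying or synchronization guarantees, the count for a given record reflects whatever reference updates have arrived so far on the second input.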

Hope this helps,

2016-11-17 22:36 GMT+01:00 Charlie Moad <charlie.moad@geofeedia.com>:

> We're having trouble mapping our problem to Flink.
> - For each incoming item
> - Generate tuples of the item crossed with a data set
> - Filter the tuples based on a condition
> - Know the count of matching tuples
> This seems to be a mashup of DataStream and DataSet, but it appears you
> can't operate with those together. We are also wondering if kicking off a
> batch job for each incoming item is a feasible approach. We don't have time
> window constraints.
> Any recommendations would be greatly appreciated.
> - Charlie
