flink-user mailing list archives

From Srikanth <srikanth...@gmail.com>
Subject Join DataStream with dimension tables?
Date Thu, 21 Apr 2016 01:05:56 GMT
Hello,

I have a fairly typical streaming use case but am not able to figure out how
best to implement it in Flink.
I want to join records read from a Kafka stream with one (or more) dimension
tables which are saved as flat files.

As per this JIRA <https://issues.apache.org/jira/browse/FLINK-2320>, it's not
possible to join a DataStream with a DataSet.
These tables are too big to do a collect() and join.

It would be good to read these files during startup, do a partitionByHash,
and keep the result cached.
On the DataStream, maybe do a keyBy and join against that cache.
Is something like this possible?
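
A rough sketch of the idea above, in plain Java rather than Flink API, may make
the question more concrete. It assumes the dimension table is a simple
key/value mapping and that the stream is routed with the same hash function
used to partition the table; the class and method names
(PartitionedLookupJoin, loadDimensionRow, join) are hypothetical and only
illustrate the intended behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, no Flink classes.
class PartitionedLookupJoin {
    // One in-memory cache per (simulated) parallel task.
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedLookupJoin(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Same hash function for dimension rows and stream records,
    // so matching keys always land in the same partition.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // "Startup": load one dimension row into its partition's cache.
    void loadDimensionRow(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // "Stream side": route by key, then look up in that partition only.
    // Returns null when the key has no dimension row.
    String join(String key, String payload) {
        String dim = partitions.get(partitionFor(key)).get(key);
        return dim == null ? null : key + "," + payload + "," + dim;
    }

    public static void main(String[] args) {
        PartitionedLookupJoin j = new PartitionedLookupJoin(4);
        j.loadDimensionRow("u1", "Alice");
        j.loadDimensionRow("u2", "Bob");
        System.out.println(j.join("u1", "click")); // prints u1,click,Alice
        System.out.println(j.join("u3", "view"));  // prints null
    }
}
```

Because each record only ever touches the partition its key hashes to, no
single task would need to hold the whole table, which is the point of
partitioning instead of doing a collect().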

Srikanth
