flink-user mailing list archives

From Balaji Rajagopalan <balaji.rajagopa...@olacabs.com>
Subject Re: streaming join implementation
Date Thu, 14 Apr 2016 06:20:45 GMT
You can implement the join in Flink (it is an inner join) along the lines
of the pseudocode below. The join below uses a 5-minute window. Yes, there
will be some corner cases where data arriving after the 5 minutes is missed
by the join window. I solved this problem by storing some data in Redis and
writing correlation logic to take care of the corner cases that the join
window missed.

// Inner join on key over 5-minute tumbling event-time windows.
val output: DataStream[OutputData] =
  stream1.join(stream2)
    .where(_.key1)
    .equalTo(_.key2)
    .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.MINUTES)))
    .apply(new SomeJoinFunction)
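
To make the corner case concrete, here is a minimal plain-Scala sketch (no
Flink dependency) that simulates how a 5-minute tumbling-window inner join
pairs records. The names Event, windowStart and windowedJoin are
hypothetical and only for illustration; this is not Flink's implementation.
It assumes windows aligned to the epoch and non-negative timestamps in
milliseconds.

```scala
// Plain-Scala simulation of a tumbling-window inner join (illustrative only).
object TumblingJoinSketch {
  // Hypothetical event type: a join key plus an event-time timestamp in ms.
  final case class Event(key: String, timestampMs: Long)

  // 5-minute tumbling windows, as in the snippet above.
  val WindowSizeMs: Long = 5 * 60 * 1000L

  // Start of the tumbling window that contains the given timestamp.
  // Assumes non-negative timestamps and windows aligned to the epoch.
  def windowStart(timestampMs: Long): Long =
    timestampMs - (timestampMs % WindowSizeMs)

  // Inner join: pair events that share both the key and the window.
  def windowedJoin(left: Seq[Event], right: Seq[Event]): Seq[(Event, Event)] =
    for {
      l <- left
      r <- right
      if l.key == r.key &&
        windowStart(l.timestampMs) == windowStart(r.timestampMs)
    } yield (l, r)

  def main(args: Array[String]): Unit = {
    // Key "a": both events fall in the window starting at 0 ms.
    // Key "b": events are 2 seconds apart but straddle the 300000 ms boundary.
    val left  = Seq(Event("a", 60000L), Event("b", 299000L))
    val right = Seq(Event("a", 120000L), Event("b", 301000L))
    windowedJoin(left, right).foreach(println)
  }
}
```

In this sketch only the pair for key "a" is joined. The two "b" events are
only 2 seconds apart, but they land in different windows and are dropped,
which is the kind of case a separate correlation step has to pick up.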


On Thu, Apr 14, 2016 at 10:02 AM, Henry Cai <hcai@pinterest.com> wrote:

> Hi,
>
> We are evaluating different streaming platforms.  For a typical join
> between two streams
>
> SELECT a.*, b.*
> FROM a JOIN b
> ON a.id = b.id
>
> How does Flink implement the join?  The matching record from either stream
> can come late; we consider it a valid join as long as the event times of
> records a and b fall on the same day.
>
> I think some streaming platforms (e.g. Google Cloud Dataflow) store the
> records from both streams in a K/V lookup store and do the lookup later.
> Is this how Flink implements the streaming join?
>
> If we need to store all the records in a state store, that's going to be a
> lot of records for a day.
>
>
