flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Cai <h...@pinterest.com>
Subject Re: streaming join implementation
Date Thu, 14 Apr 2016 06:40:05 GMT
Thanks Balaji.  Do you mean you spill the non-matching records after 5
minutes into redis?  Does flink give you control on which records is not
matching in the current window such that you can copy into a long-term
storage?



On Wed, Apr 13, 2016 at 11:20 PM, Balaji Rajagopalan <
balaji.rajagopalan@olacabs.com> wrote:

> You can implement join in flink (which is a inner join) the below
> mentioned pseudo code . The below join is for a 5 minute interval, yes will
> be some corners cases when the data coming after 5 minutes will be  missed
> out in the join window, I actually had solved this problem but storing some
> data in redis and wrote correlation logic to take care of the corner cases
> that were missed out in the join  window.
>
> val output: DataStream[(OutputData)] = stream1.join(stream2).where(_.key1).equalTo(_.key2).
>   window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.MINUTE))).apply(new SomeJoinFunction)
>
>
> On Thu, Apr 14, 2016 at 10:02 AM, Henry Cai <hcai@pinterest.com> wrote:
>
>> Hi,
>>
>> We are evaluating different streaming platforms.  For a typical join
>> between two streams
>>
>> select a.*, b.*
>> FROM a, b
>> ON a.id == b.id
>>
>> How does flink implement the join?  The matching record from either
>> stream can come late, we consider it's a valid join as long as the event
>> time for record a and b are in the same day.
>>
>> I think some streaming platform (e.g. google data flow) will store the
>> records from both streams in a K/V lookup store and later do the lookup.
>> Is this how flink implement the streaming join?
>>
>> If we need to store all the records in a state store, that's going to be
>> a lots of records for a day.
>>
>>
>

Mime
View raw message