flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Cai <h...@pinterest.com>
Subject Re: join performance
Date Fri, 29 Apr 2016 17:18:36 GMT
So is the window defined as hour-window or second-window?

If I am using hour-window, I guess I need to modify the trigger to fire
early (e.g. every minute)?  But I don't want to repeatedly emit the same
joined records for every minute (i.e. on 2nd minute, I only want to emit
the changes introduced by new coming records between 1st and 2nd minute)

If I am using second-window, I wasn't sure why the record will still be put
into the correct window based on hour gap?

Another question is on which type of window, I need to match record a from
stream a to record b in stream b if abs(a.time - b.time) < 1-hour, so it's
not really a tumbling window on absolute wall clock, is this a session

On Fri, Apr 29, 2016 at 4:36 AM, Aljoscha Krettek <aljoscha@apache.org>

> Hi,
> you are right, everything will be emitted in a huge burst at the end of
> the hour. If you want to experiment a bit you can write a custom Trigger
> based on EventTimeTrigger that will delay firing of windows. You would
> change onEventTime() to not fire but instead register a processing-time
> timer at a random point in the future. Then, in onProcessingTime() you
> would trigger the actual window processing. Elements will still be put into
> the correct windows based on event time, just the firing of the windows
> will change by doing this.
> Cheers,
> Aljoscha
> On Fri, 29 Apr 2016 at 08:53 Henry Cai <hcai@pinterest.com> wrote:
>> But the join requirement is to match the records from two streams
>> occurring within one hour (besides the normal join key condition), if I use
>> the second join window, those records wouldn't be in the same window any
>> more.
>> On Thu, Apr 28, 2016 at 11:47 PM, Ashutosh Kumar <
>> kmr.ashutosh16@gmail.com> wrote:
>>> Time unit can be in seconds as well. Is there specific need to get
>>> bursts hourly?
>>> On Fri, Apr 29, 2016 at 11:48 AM, Henry Cai <hcai@pinterest.com> wrote:
>>>> For the below standard stream/stream join, does flink store the results
>>>> of stream 1 and stream 2 into state store for the current hour and at the
>>>> end of the hour window it will fire the window by iterating through all
>>>> stored elements in the state store to find join matches?
>>>> My concern is during most of the time in the hour, the output (assuming
>>>> the output is going to another stream) will be idle and on each hour mark
>>>> there will be huge outputs of joined records emitted, any way to make it
>>>> more gradual?
>>>> dataStream.join(otherStream)
>>>>     .where(0).equalTo(1)
>>>>     .window(TumblingEventTimeWindows.of(Time.hours(1)))
>>>>     .apply (new JoinFunction () {...});

View raw message