beam-issues mailing list archives

From "Beam JIRA Bot (Jira)" <>
Subject [jira] [Commented] (BEAM-3568) Overlapping sessions with zero allowed lateness due to window expiry rules
Date Tue, 18 Aug 2020 17:07:15 GMT


Beam JIRA Bot commented on BEAM-3568:

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled
"stale-P2". If this issue is still affecting you, we care! Please comment and remove the label.
Otherwise, in 14 days the issue will be moved to P3.

Please see for a detailed explanation
of what these priorities mean.

> Overlapping sessions with zero allowed lateness due to window expiry rules
> --------------------------------------------------------------------------
>                 Key: BEAM-3568
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model, runner-core
>            Reporter: Kenneth Knowles
>            Priority: P2
>              Labels: stale-P2
> Consider this sequence, with session gap durations of 5:
>  - element arrives with timestamp 0, assigned to proto-window [0, 5)
>  - watermark advances to 6, emitting the session and discarding it
>  - element arrives with timestamp 3, assigned to proto-window [3, 8) so it is not dropped
>    as the window is not expired
>  - watermark advances to 8+, emitting that session
> While "technically correct" according to spec, this seems undesirable. It was introduced
> when late data dropping was tied to window expiry. I think either dropping the second element
> or including it and emitting a merged window would be OK.
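
The sequence above can be reproduced with a small toy model. This is only a sketch and does not use Beam's API: the names `simulate` and `merge_sessions`, the event tuples, and the simplified expiry test (a window counts as expired once the watermark reaches its end plus the allowed lateness) are all assumptions made for illustration.

```python
GAP = 5               # session gap duration from the example
ALLOWED_LATENESS = 0  # zero allowed lateness, as in the issue title


def merge_sessions(windows):
    """Merge half-open (start, end) windows that strictly overlap."""
    merged = []
    for start, end in sorted(windows):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def simulate(events):
    """Toy model: events are ("element", ts) or ("watermark", wm) tuples."""
    active = []    # proto-windows still held in state
    emitted = []   # windows that have been output and discarded
    watermark = float("-inf")
    for kind, value in events:
        if kind == "element":
            window = (value, value + GAP)
            # Late data dropping tied to window expiry: drop only if the
            # element's own proto-window is already expired.
            if window[1] + ALLOWED_LATENESS <= watermark:
                continue
            active = merge_sessions(active + [window])
        else:
            watermark = value
            # Emit and discard every window the watermark has passed.
            emitted += [w for w in active if w[1] + ALLOWED_LATENESS <= watermark]
            active = [w for w in active if w[1] + ALLOWED_LATENESS > watermark]
    return emitted


events = [("element", 0), ("watermark", 6), ("element", 3), ("watermark", 8)]
print(simulate(events))  # [(0, 5), (3, 8)]
```

The two emitted sessions overlap on [3, 5). If the same two elements arrive before the watermark advances, the model emits a single merged session [0, 8) instead.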
> In the case of sessions, we could just retain the window until it cannot possibly merge
> with other non-expired data. Even with allowed lateness zero this is double the gap duration.
> The window would be in an interesting state where it would be expired and ineligible for further
> output but could still merge and the greater window could be output.
> The challenge is that sessions are just one kind of merging window - the merging logic
> has to be assumed opaque. So we cannot simply reason about how sessions work. The other, more
> drastic option is to rethink how late data dropping is defined for merging windows, particularly
> in the "proto-window" phase.
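
The session-specific retention idea from the description can be sketched as follows. This is a toy model, not Beam code, and it deliberately relies on knowing how sessions merge, which is exactly what the description says cannot be assumed for merging windows in general. The retention bound used here (keep an emitted window until the watermark reaches its end plus the gap plus the allowed lateness) is an assumption derived from the example, and `simulate_with_retention` and `overlaps` are hypothetical names.

```python
GAP = 5               # session gap duration from the example
ALLOWED_LATENESS = 0  # zero allowed lateness, as in the issue title


def overlaps(a, b):
    """True if half-open windows a and b strictly overlap."""
    return a[0] < b[1] and b[0] < a[1]


def simulate_with_retention(events):
    """Toy model: emitted sessions are retained while they could still merge."""
    open_windows = []    # not yet emitted
    closed_windows = []  # already emitted, kept only so they can still merge
    emitted = []
    watermark = float("-inf")
    for kind, value in events:
        if kind == "element":
            merged = (value, value + GAP)
            # Same dropping rule as before: drop only expired proto-windows.
            if merged[1] + ALLOWED_LATENESS <= watermark:
                continue
            # Absorb every overlapping window, open or retained, until stable.
            changed = True
            while changed:
                changed = False
                for pool in (open_windows, closed_windows):
                    for w in list(pool):
                        if overlaps(w, merged):
                            merged = (min(w[0], merged[0]), max(w[1], merged[1]))
                            pool.remove(w)
                            changed = True
            open_windows.append(merged)
        else:
            watermark = value
            for w in sorted(open_windows):
                if w[1] + ALLOWED_LATENESS <= watermark:
                    open_windows.remove(w)
                    emitted.append(w)
                    closed_windows.append(w)
            # Assumed bound: a non-expired proto-window can no longer overlap w
            # once the watermark reaches w.end + GAP + ALLOWED_LATENESS.
            closed_windows = [w for w in closed_windows
                              if watermark < w[1] + GAP + ALLOWED_LATENESS]
    return emitted


events = [("element", 0), ("watermark", 6), ("element", 3), ("watermark", 8)]
print(simulate_with_retention(events))  # [(0, 5), (0, 8)]
```

Under these assumptions the first session is still output as [0, 5), and the late element produces the greater merged window [0, 8) rather than a separate overlapping session [3, 8).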

This message was sent by Atlassian Jira
