beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1372) OutputTimeFn and Accumulating Mode is Confusing
Date Wed, 01 Feb 2017 22:32:59 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849043#comment-15849043 ]

Kenneth Knowles commented on BEAM-1372:
---------------------------------------

Once OutputTimeFn is removed and replaced with an enum (that is, no fixed interface), it will
be natural to tweak individual bits of behavior. We could then have enum values that express
broader behavior, such as holding to the MIN of the data across all panes, and we could also
reject pipelines that use unreasonable combinations.
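
A hypothetical sketch of the enum idea, in Java. The names OutputTime, EARLIEST_IN_PANE, EARLIEST_ACROSS_PANES, Accumulation and validate are invented for illustration and are not Beam APIs. The combination that validate rejects (per-pane earliest timestamp, accumulating mode, more than one firing) is only an assumption about what an "unreasonable combination" might be, drawn from the issue quoted below.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of replacing an OutputTimeFn-style interface with an enum.
 * None of these names are Beam APIs; they exist only for illustration.
 */
class OutputTimeEnumSketch {

    /** Hypothetical output-time policies. */
    enum OutputTime {
        /** Earliest timestamp among the elements that arrived since the previous firing. */
        EARLIEST_IN_PANE,
        /** Earliest timestamp among all data in the window, across every pane so far. */
        EARLIEST_ACROSS_PANES
    }

    /** Hypothetical stand-in for the windowing strategy's accumulation mode. */
    enum Accumulation { DISCARDING, ACCUMULATING }

    /** Output timestamp of a pane, given the elements new to it and everything accumulated so far. */
    static long paneTimestamp(OutputTime policy, List<Long> newElements, List<Long> allElements) {
        switch (policy) {
            case EARLIEST_IN_PANE:
                return Collections.min(newElements);
            case EARLIEST_ACROSS_PANES:
                return Collections.min(allElements);
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    /** Hypothetical construction-time check that rejects one combination assumed to be unreasonable. */
    static void validate(OutputTime policy, Accumulation mode, boolean mayFireMoreThanOnce) {
        if (policy == OutputTime.EARLIEST_IN_PANE
                && mode == Accumulation.ACCUMULATING
                && mayFireMoreThanOnce) {
            throw new IllegalArgumentException(
                    "EARLIEST_IN_PANE with ACCUMULATING and repeated firings can emit elements"
                            + " earlier than the pane timestamp");
        }
    }

    public static void main(String[] args) {
        List<Long> newElements = List.of(12L);   // elements new to this pane
        List<Long> all = List.of(1L, 3L, 12L);   // accumulated contents of this pane
        System.out.println(paneTimestamp(OutputTime.EARLIEST_IN_PANE, newElements, all));      // 12
        System.out.println(paneTimestamp(OutputTime.EARLIEST_ACROSS_PANES, newElements, all)); // 1
        validate(OutputTime.EARLIEST_ACROSS_PANES, Accumulation.ACCUMULATING, true); // accepted
        try {
            validate(OutputTime.EARLIEST_IN_PANE, Accumulation.ACCUMULATING, true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the policy is a closed enum rather than an open interface, a check like validate can reason about every possible value when the pipeline is constructed.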

> OutputTimeFn and Accumulating Mode is Confusing
> -----------------------------------------------
>
>                 Key: BEAM-1372
>                 URL: https://issues.apache.org/jira/browse/BEAM-1372
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>            Reporter: Thomas Groh
>
> See [here| https://github.com/tgroh/beam/commit/2238df334a368ce1a41e14ee616be954c5430c73]
> for an example pipeline.
> The timestamp used by a pane does not change based on the accumulation mode of the windowing
> strategy. As a result, elements that have associated timestamps cannot be safely reassigned
> to those timestamps after a GroupByKey if more than one pane could have been produced, regardless
> of the {{OutputTimeFn}}. The first example pipeline demonstrates two PCollections where the
> elements within the last PCollection cannot be reassigned to their timestamps, even though
> we are using {{OutputTimeFn#outputAtEarliestInputTimestamp}} and
> When using a more complex windowing strategy like sessions, this is even more confusing.
> A session that spans more than one of the downstream windows but is produced in multiple
> panes will, over time, be assigned to later and later windows as more panes are produced.
> Thus, a pipeline that produces session windows and wishes to group the sessions by the point
> at which they started must only ever produce a single pane per session.
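
A toy model of the two scenarios above, in plain Java with no Beam dependency. It assumes, as the description suggests, that a pane's timestamp is the earliest timestamp among the elements that arrived since the previous firing, regardless of accumulation mode. The class and method names are hypothetical. The same three firings are read first as accumulating panes of one window, and then as panes of a single session that starts at t=1, with downstream fixed windows 10 units wide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical toy model of the behavior described in BEAM-1372; it does not use the Beam SDK.
 * Assumption: a pane's timestamp is the earliest timestamp among the elements that arrived
 * since the previous firing, independent of the accumulation mode.
 */
class PaneTimestampModel {

    /** Timestamp of a pane: earliest input among the elements new to that pane. */
    static long paneTimestamp(List<Long> newElements) {
        return Collections.min(newElements);
    }

    /**
     * Simulates accumulating-mode firings and returns, for each pane, how many emitted
     * (accumulated) elements have a timestamp earlier than the pane's own timestamp.
     * Such elements cannot be moved back to their original timestamps downstream.
     */
    static List<Integer> elementsEarlierThanPane(List<List<Long>> firings) {
        List<Long> accumulated = new ArrayList<>();
        List<Integer> result = new ArrayList<>();
        for (List<Long> newElements : firings) {
            accumulated.addAll(newElements);
            long paneTs = paneTimestamp(newElements);
            int earlier = 0;
            for (long ts : accumulated) {
                if (ts < paneTs) {
                    earlier++;
                }
            }
            result.add(earlier);
        }
        return result;
    }

    /** Start of the fixed downstream window of the given size that contains ts (assumes ts >= 0). */
    static long downstreamWindowStart(long ts, long windowSize) {
        return ts - (ts % windowSize);
    }

    public static void main(String[] args) {
        // Three firings; each inner list holds the elements new to that pane.
        List<List<Long>> firings = List.of(List.of(1L, 3L), List.of(12L), List.of(25L, 27L));
        System.out.println(elementsEarlierThanPane(firings)); // [0, 2, 3]

        // Read as panes of one session that began at t=1, grouped into 10-unit downstream windows.
        for (List<Long> pane : firings) {
            System.out.println(downstreamWindowStart(paneTimestamp(pane), 10)); // 0, then 10, then 20
        }
    }
}
```

In the first reading, the second and third panes carry 2 and 3 elements that are earlier than the pane's own timestamp. In the second reading, the session lands in the downstream windows starting at 0, 10 and 20 instead of staying in the window where it began.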



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
