beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2671) CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark runner
Date Tue, 15 Aug 2017 21:52:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127927#comment-16127927 ]

Kenneth Knowles commented on BEAM-2671:
---------------------------------------

[~staslev] Previously, timers were each processed separately. The state for a window was cleared
when a timer arrived with timestamp == GC time. This was actually wrong relative to the spec,
and led to the wrong number of outputs in some situations where the end-of-window and GC timers
came in the same bundle. You'd get multiple outputs (which is OK but wasteful), but they would
be mislabeled as to whether each was the first or the final output.

Now, timers for a window's lifecycle are processed all at once. In fact, the timers themselves
are irrelevant: when they arrive, the window is "activated", and the state for the window is
cleared if the watermark is far enough along that the window is expired. As a result, PaneInfo
is automatically correct in the corner cases where the end-of-window (EOW) and GC timers arrive
together or are very delayed.
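A minimal sketch of the "activate, then decide by watermark" idea, assuming a simplified model
that is not Beam's API: timestamps are integer milliseconds, a window is a hypothetical
(max_timestamp, allowed_lateness) pair, its GC time is max_timestamp + allowed_lateness, and the
watermark comparisons are inclusive. The function name activate_windows is made up for
illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical simplified model (not Beam's API): a window is described by
# (max_timestamp, allowed_lateness), with timestamps in integer milliseconds.
WindowSpec = Tuple[int, int]


def activate_windows(
    timers: Iterable[Tuple[str, int]],
    input_watermark: int,
    windows: Dict[str, WindowSpec],
) -> Dict[str, Dict[str, bool]]:
    """Process all fired timers for each window at once.

    Timer timestamps are deliberately ignored: a timer only marks its window
    as "activated". What happens to the window depends only on the watermark.
    """
    # Several timers for the same window collapse into a single activation.
    activated = {window_id for window_id, _ in timers}
    decisions = {}
    for window_id in sorted(activated):
        max_timestamp, allowed_lateness = windows[window_id]
        gc_time = max_timestamp + allowed_lateness
        decisions[window_id] = {
            # End-of-window pane is due once the watermark reaches the window end.
            "end_of_window": input_watermark >= max_timestamp,
            # State is cleared only once the watermark reaches the GC time.
            "garbage_collect": input_watermark >= gc_time,
        }
    return decisions
```

For example, if the EOW timer (999) and the GC timer (1999) for window "w1" arrive in the same
bundle with the watermark at 2000, the window is activated once and both flags are true, so a
single pane can be labeled as both the on-time and the final output.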

Technically, the GC timer should only ever be delivered once the watermark is that far along,
so in practice the GC time is unchanged.

If a runner was delivering the GC timer early, it would have worked under the old logic, but
it won't GC under the new logic. If a timer comes in with timestamp == GC time but the watermark
is not actually far enough along to safely GC, it will not cause a GC. Delivering that timer
would be a bug, and under the old logic it could cause other erroneous results due to early
clearing of state: data would come in, would not be dropped, and would then be output. Mostly,
though, that failure would have been rare to see. Now you'd see it all the time.
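A small simulation of the early-GC-timer scenario, under the same hypothetical assumptions
(integer millisecond timestamps, inclusive watermark comparison, not Beam's API). The function
simulate and the element names "on-time" and "more" are made up for illustration: a runner
delivers a timer stamped with the GC time while the watermark is still behind it, and then
another element for the window arrives.

```python
from typing import List, Tuple


def simulate(rule: str, timer_timestamp: int, input_watermark: int,
             gc_time: int) -> Tuple[bool, List[str]]:
    """Return (whether state was cleared, buffered state afterwards)."""
    state = ["on-time"]  # elements already buffered for one window

    if rule == "old":
        # Old logic: a timer stamped with the GC time always clears state.
        cleared = timer_timestamp == gc_time
    else:
        # New logic: the timer only activates the window; clear state only
        # if the watermark has actually reached the GC time.
        cleared = input_watermark >= gc_time
    if cleared:
        state.clear()

    # Another element for this window arrives. The watermark has not reached
    # the GC time, so it is not dropped and is buffered in the window.
    if input_watermark < gc_time:
        state.append("more")
    return cleared, state


# GC timer delivered early: timer says 1999, watermark is only at 1500.
print(simulate("old", 1999, 1500, 1999))  # (True, ['more'])
print(simulate("new", 1999, 1500, 1999))  # (False, ['on-time', 'more'])
```

Under the old rule the window's state is wiped while it can still receive data, so a later
pane is produced from a window that was supposedly garbage-collected; under the new rule the
early timer simply does not GC.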

> CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark runner
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-2671
>                 URL: https://issues.apache.org/jira/browse/BEAM-2671
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Etienne Chauchot
>            Assignee: Jean-Baptiste Onofré
>             Fix For: 2.2.0
>
>
> Error message:
> Flatten.Iterables/FlattenIterables/FlatMap/ParMultiDo(Anonymous).out0: 
> Expected: iterable over [] in any order
>      but: Not matched: "late"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
