beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1674) Flink user state GC depends on order of timer firing
Date Thu, 09 Mar 2017 18:41:38 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903587#comment-15903587
] 

Kenneth Knowles commented on BEAM-1674:
---------------------------------------

The user needs to get a timer callback late enough that they know they are doing a final flush,
so their timer has to be for GC time.

The actual GC does not have to happen immediately, as long as incoming elements and processing
time timers are discarded so the window is inactive. So you can set the actual GC timer slightly
later (1ms) if you have some guarantee that all eligible timers are delivered and they are
delivered in timestamp order.

In my implementations I simply have completely separate system timer internals and user timer
internals in disjoint namespaces, and I process all the user timers first. It looks like this
is compatible with {{StatefulDoFnRunner}} since all that logic would happen in the code that
builds the {{CleanupTimer}} / {{TimerInternalsCleanupTimer}}.

> Flink user state GC depends on order of timer firing
> ----------------------------------------------------
>
>                 Key: BEAM-1674
>                 URL: https://issues.apache.org/jira/browse/BEAM-1674
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 0.5.0
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 0.6.0
>
>
> The newly added {{ParDoTest.testEventTimeTimerMultipleKeys()}} fails because the {{DoFn}}
sets a timer for {{window.maxTimestamp()}} which also happens to be the GC timer for the user
state. The Flink Runner uses timers to schedule GC, the user-set timer and the GC timer have
a different timer id, so they don't clash. However, if the GC timer is being processed before
the user timer then the user doesn't have a chance to access the state anymore because it
will already be cleared out by the time the user timer is being processed. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message