beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-1674) Flink user state GC depends on order of timer firing
Date Fri, 10 Mar 2017 10:12:04 GMT


ASF GitHub Bot commented on BEAM-1674:

GitHub user aljoscha opened a pull request:

    [BEAM-1674] Fix Flink State GC

    This is a proper solution, as discussed in the Jira issue. If we merge this we can drop
#2215. (Thanks for quickly providing that PR, though!)
    R: @kennknowles 

You can merge this pull request into a Git repository by running:

    $ git pull jira-1674-fix-flink-gc

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2217
commit bf6d2748c8876a7415290069163625598928f02f
Author: Aljoscha Krettek <>
Date:   2017-03-10T07:29:27Z

    Move GC timer checking to StatefulDoFnRunner.CleanupTimer

commit 1a8e1f7463cbc7c6b5edfe1dbbc98502e5612511
Author: Aljoscha Krettek <>
Date:   2017-03-10T10:07:00Z

    Introduce Flink-specific state GC implementations
    We now set the GC timer for window.maxTimestamp() + 1 to ensure that a
    user timer set for window.maxTimestamp() still has all state.
    This also adds tests for late data dropping and state GC specifically
    for the Flink DoFnOperator.


> Flink user state GC depends on order of timer firing
> ----------------------------------------------------
>                 Key: BEAM-1674
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 0.5.0
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 0.6.0
> The newly added {{ParDoTest.testEventTimeTimerMultipleKeys()}} fails because the {{DoFn}}
sets a timer for {{window.maxTimestamp()}} which also happens to be the GC timer for the user
state. The Flink Runner uses timers to schedule GC, the user-set timer and the GC timer have
a different timer id, so they don't clash. However, if the GC timer is being processed before
the user timer then the user doesn't have a chance to access the state anymore because it
will already be cleared out by the time the user timer is being processed. 

This message was sent by Atlassian JIRA

View raw message