beam-commits mailing list archives

From "Jingsong Lee (JIRA)" <>
Subject [jira] [Commented] (BEAM-1517) Garbage collect user state in Flink Runner
Date Tue, 21 Feb 2017 13:51:44 GMT


Jingsong Lee commented on BEAM-1517:

Is it appropriate for the user to do the work of GC?
Just like this:
  @ProcessElement
  public void process(
      ProcessContext c,
      BoundedWindow window,
      @StateId(stateId) ValueState<Integer> state,
      @TimerId("GcTimer") Timer timer) {
    Instant maxTimestamp = window.maxTimestamp();
    long allowedLateness = 10 * 1000;
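    // Assumed: window end plus allowed lateness (the expression is missing from the archived message).
    Instant gcTime = maxTimestamp.plus(allowedLateness);
    // Can Timer have a getCurrentTime interface?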
    Instant currentTime = new Instant();
    if (gcTime.isBefore(currentTime)) {
      c.sideOutput(lateDataTag, c.element());
    } else {
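      // user logic
      // ....
    }
  }

  @OnTimer("GcTimer")
  public void gc(
      OnTimerContext context,
      @StateId(stateId) ValueState<Integer> state) {
    // Body missing from the archived message; presumably the state is cleared here.
    state.clear();
  }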

> Garbage collect user state in Flink Runner
> ------------------------------------------
>                 Key: BEAM-1517
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> User facing state/timers in Beam are bound to the key/window of the data. Right now,
> the Flink Runner does not clean up user state when the watermark passes the GC horizon for
> the state associated with a given window.
> Neither {{StateInternals}} nor the Flink state API support discarding state for a whole
> namespace (which is the window in this case) so we might have to manually set a GC timer for
> each window/key combination, as is done in the {{ReduceFnRunner}}. For this we have to know
> all states a user can possibly use, which we can get from the {{DoFn}} signature.
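
The approach described in the issue (one GC timer per window/key combination, clearing every declared state when the watermark passes the GC horizon) can be sketched as a small self-contained model. This is only an illustration in plain Java, not Beam or Flink API: the class WindowStateGc, its methods, the string-keyed store, and the declaredStateIds list (standing in for the state ids read from the DoFn signature) are all hypothetical names chosen for the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of GC timers for per-key/window user state. Not Beam or Flink API. */
final class WindowStateGc {
    private final long allowedLatenessMillis;
    // Hypothetical stand-in for the state ids read from the DoFn signature.
    private final List<String> declaredStateIds;
    // Flat store keyed by "key|windowMaxTimestamp/stateId"; it has no "drop namespace" call.
    private final Map<String, Integer> store = new HashMap<>();
    // Pending GC timers: firing time -> namespaces ("key|windowMaxTimestamp") to clean up.
    private final TreeMap<Long, Set<String>> gcTimers = new TreeMap<>();

    WindowStateGc(long allowedLatenessMillis, List<String> declaredStateIds) {
        this.allowedLatenessMillis = allowedLatenessMillis;
        this.declaredStateIds = declaredStateIds;
    }

    private static String namespace(String key, long windowMaxTimestamp) {
        return key + "|" + windowMaxTimestamp;
    }

    /** Writes one state cell and makes sure a GC timer exists for its key/window. */
    void write(String key, long windowMaxTimestamp, String stateId, int value) {
        String ns = namespace(key, windowMaxTimestamp);
        store.put(ns + "/" + stateId, value);
        long gcTime = windowMaxTimestamp + allowedLatenessMillis;
        gcTimers.computeIfAbsent(gcTime, t -> new HashSet<>()).add(ns);
    }

    /** Returns the stored value, or null if it was never written or already collected. */
    Integer read(String key, long windowMaxTimestamp, String stateId) {
        return store.get(namespace(key, windowMaxTimestamp) + "/" + stateId);
    }

    /** Fires every GC timer strictly before the watermark; returns the number of cells removed. */
    int advanceWatermark(long watermarkMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Set<String>>> due =
            gcTimers.headMap(watermarkMillis, false).entrySet().iterator();
        while (due.hasNext()) {
            for (String ns : due.next().getValue()) {
                // Clear each declared state individually, since the store cannot drop a namespace.
                for (String stateId : declaredStateIds) {
                    if (store.remove(ns + "/" + stateId) != null) {
                        removed++;
                    }
                }
            }
            due.remove();
        }
        return removed;
    }

    int liveCells() {
        return store.size();
    }
}
```

In this sketch a timer fires only once the watermark is strictly past the window's max timestamp plus the allowed lateness; whether a real runner treats the boundary as inclusive is a separate design decision. The point of passing declaredStateIds explicitly is the one made in the issue: without a way to discard a whole namespace, the cleanup has to know every state it may need to clear.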

This message was sent by Atlassian JIRA
