beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <>
Subject [jira] [Commented] (BEAM-1517) Garbage collect user state in Flink Runner
Date Tue, 21 Feb 2017 19:03:44 GMT


Kenneth Knowles commented on BEAM-1517:

In the runners where I have already added this GC (direct and Dataflow), I faced the same
choice that you now face in Flink:

1. Add the ability to clear a whole {{StateNamespace}} OR
2. Scrape the necessary tags off the {{DoFnSignature}}

I went with 2 both times because it was a bit quicker to implement. If your underlying
storage already supports 1, you would use the same sort of timer setup and only have the
cleanup code to write. Since namespace clearing is commonly not "for free" across runners,
I would not make it a required part of {{StateInternals}}. But I do think it would be nice
to add to runners/core-java the logic for "clear all state in this {{DoFnSignature}}".
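
To illustrate option 2 above, here is a minimal sketch of "clear all state declared by the user" for one key/window pair. It deliberately does not use Beam's classes: {{InMemoryStore}}, {{clearAllDeclaredState}} and the list of state ids are hypothetical stand-ins, and the ids are simply assumed to have been scraped from the user's state declarations beforehand.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these types are Beam's real API.
class StateGcSketch {

    /** Toy state store addressed by (key, window namespace, state id). */
    static final class InMemoryStore {
        private final Map<List<String>, Object> cells = new HashMap<>();

        void write(String key, String window, String stateId, Object value) {
            cells.put(Arrays.asList(key, window, stateId), value);
        }

        Object read(String key, String window, String stateId) {
            return cells.get(Arrays.asList(key, window, stateId));
        }

        /** Clears a single state cell; the only deletion this store supports. */
        void clear(String key, String window, String stateId) {
            cells.remove(Arrays.asList(key, window, stateId));
        }

        int size() {
            return cells.size();
        }
    }

    /**
     * Option 2: the store cannot drop a whole namespace, so clear each declared
     * state id individually for the expired key/window pair.
     */
    static void clearAllDeclaredState(
            InMemoryStore store, String key, String window, List<String> declaredStateIds) {
        for (String stateId : declaredStateIds) {
            store.clear(key, window, stateId);
        }
    }

    public static void main(String[] args) {
        // Assumed to have been collected from the user's state declarations.
        List<String> declaredStateIds = Arrays.asList("count", "buffer");

        InMemoryStore store = new InMemoryStore();
        store.write("user-1", "window-A", "count", 3);
        store.write("user-1", "window-A", "buffer", "abc");
        store.write("user-1", "window-B", "count", 7);

        // The GC timer for (user-1, window-A) fires: drop only that window's state.
        clearAllDeclaredState(store, "user-1", "window-A", declaredStateIds);
        System.out.println("cells left: " + store.size());
    }
}
```

With option 1, the loop in {{clearAllDeclaredState}} would collapse into a single "clear this namespace" call on the store, which is why that route only pays off when the backend supports it cheaply.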

> Garbage collect user state in Flink Runner
> ------------------------------------------
>                 Key: BEAM-1517
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
> User-facing state/timers in Beam are bound to the key/window of the data. Right now,
> the Flink Runner does not clean up user state when the watermark passes the GC horizon for
> the state associated with a given window.
> Neither {{StateInternals}} nor the Flink state API supports discarding state for a whole
> namespace (which is the window in this case), so we might have to manually set a GC timer for
> each window/key combination, as is done in the {{ReduceFnRunner}}. For this we have to know
> all states a user can possibly use, which we can get from the {{DoFn}} signature.
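
The per-key/window GC timer described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the Flink Runner's code: {{GcTimers}} and {{gcTime}} are hypothetical names, the GC horizon is assumed to be the window's maximum timestamp plus the allowed lateness, and a timer is assumed to fire once the watermark is strictly past that horizon.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; not Beam's or Flink's timer API.
class GcTimerSketch {

    /** Assumed GC horizon: end of the window plus the allowed lateness. */
    static long gcTime(long windowMaxTimestampMillis, long allowedLatenessMillis) {
        return windowMaxTimestampMillis + allowedLatenessMillis;
    }

    /** One pending GC timer per key/window combination. */
    static final class GcTimers {
        private final Map<String, Long> fireAtByKeyAndWindow = new LinkedHashMap<>();

        void setTimer(String key, String window, long fireAtMillis) {
            fireAtByKeyAndWindow.put(key + "/" + window, fireAtMillis);
        }

        /** Removes and returns the key/window pairs whose GC time the watermark has passed. */
        List<String> advanceWatermark(long watermarkMillis) {
            List<String> expired = new ArrayList<>();
            Iterator<Map.Entry<String, Long>> it = fireAtByKeyAndWindow.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> timer = it.next();
                if (watermarkMillis > timer.getValue()) {
                    expired.add(timer.getKey());
                    it.remove();
                }
            }
            return expired;
        }

        int pending() {
            return fireAtByKeyAndWindow.size();
        }
    }

    public static void main(String[] args) {
        GcTimers timers = new GcTimers();
        timers.setTimer("user-1", "window-A", gcTime(999L, 100L));
        timers.setTimer("user-1", "window-B", gcTime(1999L, 100L));

        // The watermark moves past window-A's horizon only.
        for (String expired : timers.advanceWatermark(1500L)) {
            System.out.println("clean up state for " + expired);
        }
    }
}
```

Each expired key/window pair returned by {{advanceWatermark}} is where the runner would clear every state the user declared for that window.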
