beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kenneth Knowles (JIRA)" <>
Subject [jira] [Commented] (BEAM-241) Not easy for runners to get late-data dropping
Date Fri, 29 Apr 2016 21:45:13 GMT


Kenneth Knowles commented on BEAM-241:

This is getting into the realm of providing a backend/worker programming model as well. Dropping
of too-late messages is essentially an operation interposed at the appropriate place in a
runner. Which is not to vote against this - it falls right in with `ReduceFnRunner` and the
proposed side-input-awaiting utilities as part of the runners/core package, which we are in
the processing of spinning off.

> Not easy for runners to get late-data dropping
> ----------------------------------------------
>                 Key: BEAM-241
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-core
>            Reporter: Mark Shields
>            Assignee: Frances Perry
> Quite by accident realized the Flink runner delegates to GroupAlsoByWindowViaWindowSetDoFn
for GBK, which in turn delegates to ReduceFnRunner. The latter now assumes no messages will
arrive beyond the 'garbage collection' time of their target window(s).
> The Dataflow runner interposes a LateDataDroppingDoFnRunner into the path so as to drop
those too-late messages. That's done (I think) using DoFnRunners.createDefault.
> I don't think the Flink runner does that.
> We need a nice runner-friendly way of dealing with the too-late data.

This message was sent by Atlassian JIRA

View raw message