beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Shields (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-241) Not easy for runners to get late-data dropping
Date Fri, 29 Apr 2016 21:04:12 GMT
Mark Shields created BEAM-241:
---------------------------------

             Summary: Not easy for runners to get late-data dropping
                 Key: BEAM-241
                 URL: https://issues.apache.org/jira/browse/BEAM-241
             Project: Beam
          Issue Type: Bug
          Components: runner-core
            Reporter: Mark Shields
            Assignee: Frances Perry


Quite by accident realized the Flink runner delegates to GroupAlsoByWindowViaWindowSetDoFn
for GBK, which in turn delegates to ReduceFnRunner. The latter now assumes no messages will
arrive beyond the 'garbage collection' time of their target window(s).

The Dataflow runner interposes a LateDataDroppingDoFnRunner into the path so as to drop those
too-late messages. That's done (I think) using DoFnRunners.createDefault.

I don't think the Flink runner does that.

We need a nice runner-friendly way of dealing with the too-late data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message