beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Created] (BEAM-1983) SDF should properly support windowed side inputs
Date Fri, 14 Apr 2017 23:41:42 GMT
Eugene Kirpichov created BEAM-1983:

             Summary: SDF should properly support windowed side inputs
                 Key: BEAM-1983
             Project: Beam
          Issue Type: New Feature
          Components: runner-apex, runner-dataflow, runner-direct, runner-flink, sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov

Currently there is no test coverage for Splittable DoFn + windowed side inputs, especially
when not all of the side input windows are ready.

Moreover, current implementation of SDF in the direct runner is definitely wrong: it uses
a ParDoEvaluator to run the ProcessFn, and this ParDoEvaluator looks at the wrong windows
to decide which windows are ready and which are not:
- the WindowedValue in question is a KeyedWorkItem, and they are always in the global window,
but the important windows are windows of elements inside this KWI's elementsIterable().

This JIRA is to:
1) add test coverage for this case
2) implement proper support in all runners

I believe the easiest way to do 2) is to:
- make SplittableParDo, in case the DoFn has side inputs, pre-explode windows before feeding
them into GroupByKeyIntoKeyedWorkItems , so that the resulting KWI's have elements only in
a single window
- tweak runners to look at the proper window, and assert that there's only one window, while
evaluating ProcessFn, in case the DoFn uses side inputs

This message was sent by Atlassian JIRA

View raw message