beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1983) SDF should properly support windowed side inputs
Date Mon, 17 Apr 2017 22:43:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971767#comment-15971767
] 

ASF GitHub Bot commented on BEAM-1983:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/2556

    [BEAM-1983] Splittable DoFn changes for properly supporting side inputs

    See https://issues.apache.org/jira/browse/BEAM-1983

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam spd-fixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2556.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2556
    
----
commit 30caf69146c4699993469571927ef3809a1e6e2d
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-15T23:38:35Z

    Separates side input test and side output test

commit 670b2325c3e2d97a2c34cd318417f982072646fb
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-15T23:39:51Z

    ProcessFn remembers more info about its application context

commit 89759ea0778c4bc3b0e9b0edebb7283569321baf
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-17T19:25:02Z

    Minor cleanups in ParDoEvaluator

commit cc34d5f3f1dee423c5d6df1845acdf89da80f2e3
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-17T21:41:53Z

    Extracts interface from PushbackSideInputDoFnRunner

commit 37cba1531a5249ecb085776259c345638c6d8623
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-17T21:52:23Z

    Creates ProcessFnRunner and wires it through ParDoEvaluator

commit b26c6ac9d7004e1e5e986ffa6326aa1565335662
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-17T18:28:24Z

    Explodes windows before GBKIKWI
    
    Also
    * Adds a test for windowed side inputs that requires this
      behavior.
    * Adds a test category for SDF with windowed side input.
      Runners should gradually implement this properly. For now
      only direct runner implements this properly.

----


> SDF should properly support windowed side inputs
> ------------------------------------------------
>
>                 Key: BEAM-1983
>                 URL: https://issues.apache.org/jira/browse/BEAM-1983
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-apex, runner-dataflow, runner-direct, runner-flink, sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> Currently there is no test coverage for Splittable DoFn + windowed side inputs, especially
when not all of the side input windows are ready.
> Moreover, current implementation of SDF in the direct runner is definitely wrong: it
uses a ParDoEvaluator to run the ProcessFn, and this ParDoEvaluator looks at the wrong windows
to decide which windows are ready and which are not: https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoEvaluator.java#L134
- the WindowedValue in question is a KeyedWorkItem, and they are always in the global window,
but the important windows are windows of elements inside this KWI's elementsIterable().
> The Flink implementation is also wrong in the same way.
> This JIRA is to:
> 1) add test coverage for this case
> 2) implement proper support in all runners
> I believe the easiest way to do 2) is to:
> - make SplittableParDo, in case the DoFn has side inputs, pre-explode windows before
feeding them into GroupByKeyIntoKeyedWorkItems , so that the resulting KWI's have elements
only in a single window
> - tweak runners to look at the proper window, and assert that there's only one window,
while evaluating ProcessFn, in case the DoFn uses side inputs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message