beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1346) Drop Late Data in ReduceFnRunner
Date Mon, 13 Feb 2017 14:05:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863721#comment-15863721
] 

Aljoscha Krettek commented on BEAM-1346:
----------------------------------------

[~kenn] another thing that crossed my mind is elements being pushed back due to their side
input not being ready. Think {{PushbackSideInputRunner}} and similar implementations for other
runners, if they have it. It's similar to this issue but in the end we probably need a separate
issue.

The problem occurs when you have a special implementation for "combine" that doesn't simply
do {{GroupByKey | ParDo(CombineFn)}} where the first one is {{GroupByKey: KV<K, V> →
KV<K, List<V>>}}. The {{CombineFn}} can access side inputs and the side input
that it can access is determined by the window that the value has after merging (as evident
from the proper definition of combine given above). {{PushbackSideInputRunner}}, however,
only considers the (proto-)window that the value has before merging so the pushing back and
determining when a side input is ready is based on the wrong information.

Do you agree or is that just me getting a little paranoid with the whole merging stuff? ;-)

> Drop Late Data in ReduceFnRunner
> --------------------------------
>
>                 Key: BEAM-1346
>                 URL: https://issues.apache.org/jira/browse/BEAM-1346
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-core
>    Affects Versions: 0.5.0
>            Reporter: Aljoscha Krettek
>
> I think these two commits recently broke late-data dropping for the Flink Runner (and
maybe for other runners as well):
> - https://github.com/apache/beam/commit/2b26ec8
> - https://github.com/apache/beam/commit/8989473
> It boils down to the {{LateDataDroppingDoFnRunner}} not being used anymore  because {{DoFnRunners.lateDataDroppingRunner()}}
is not called anymore when a {{DoFn}} is a {{ReduceFnExecutor}} (because that interface was
removed).
> Maybe we should think about dropping late data in another place, my suggestion is {{ReduceFnRunner}}
but that's open for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message