flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3179) Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
Date Fri, 18 Mar 2016 13:40:33 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201482#comment-15201482 ]

ASF GitHub Bot commented on FLINK-3179:

Github user fhueske commented on the pull request:

    Hi @ramkrish86, I thought about this PR and came to the conclusion that we should not
continue. The optimizer's design does not allow modifying operators in, or injecting operators
into, enumerated subplans. Doing so might cause invalid execution plans and, in the worst case,
wrong results without anybody noticing.
    I would simply log a WARN message that a combiner was not added if the optimizer identifies
a Partition operator in front of a Reduce or combinable GroupReduce operator, and give a hint
that an explicit CombinerFunction can be added with groupCombine in front of the partition.
    Sorry again @ramkrish86 that I led you into a dead end with this PR.
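
The workaround hinted at above (combine explicitly before the partition step) can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it does not use Flink at all, every class and method name in it is hypothetical, and it merely shows why pre-aggregating each source partition reduces the number of records that have to be shuffled. In the DataSet API the corresponding call is presumably `combineGroup` with a `GroupCombineFunction`, but that mapping is an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Flink dependency); all names are hypothetical.
class CombineBeforePartitionSketch {

    /** Local pre-aggregation: sums the counts per word within one source partition. */
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new HashMap<>();
        for (String w : words) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    /** Target partition for a key, mimicking a hash partitioner. */
    static int partitionOf(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Records shuffled when every (word, 1) pair is sent as-is. */
    static int shuffledWithoutCombine(List<List<String>> sourcePartitions) {
        int records = 0;
        for (List<String> p : sourcePartitions) {
            records += p.size();
        }
        return records;
    }

    /** Records shuffled when each source partition is combined first. */
    static int shuffledWithCombine(List<List<String>> sourcePartitions) {
        int records = 0;
        for (List<String> p : sourcePartitions) {
            records += combine(p).size();
        }
        return records;
    }

    /** Combine locally, route partial sums by hash, then reduce per target partition. */
    static Map<String, Integer> count(List<List<String>> sourcePartitions, int parallelism) {
        List<Map<String, Integer>> targets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            targets.add(new HashMap<>());
        }
        for (List<String> p : sourcePartitions) {
            for (Map.Entry<String, Integer> e : combine(p).entrySet()) {
                targets.get(partitionOf(e.getKey(), parallelism))
                       .merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Keys are disjoint across target partitions, so a plain union is the final result.
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> t : targets) {
            result.putAll(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> input = Arrays.asList(
                Arrays.asList("a", "b", "a", "a"),
                Arrays.asList("b", "b", "c"));
        System.out.println(shuffledWithoutCombine(input)); // 7
        System.out.println(shuffledWithCombine(input));    // 4
        System.out.println(count(input, 2));
    }
}
```

The final counts are the same with or without the combine step, because summation is associative and commutative; only the volume of shuffled records changes.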

> Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
> ---------------------------------------------------------------------------------
>                 Key: FLINK-3179
>                 URL: https://issues.apache.org/jira/browse/FLINK-3179
>             Project: Flink
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 0.10.1
>            Reporter: Fabian Hueske
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 1.0.0, 0.10.2
> The optimizer does not inject a combiner if the input of a Reducer or GroupReducer is
explicitly partitioned, as in the following example:
> {code}
> DataSet<Tuple2<String,Integer>> words = ...
> DataSet<Tuple2<String,Integer>> counts = words
>   .partitionByHash(0)
>   .groupBy(0)
>   .sum(1);
> {code}
> Explicit partitioning can be useful to enforce partitioning on a subset of keys or to
use a different partitioning method (custom or range partitioning).
> This issue should be fixed by changing the {{instantiate()}} methods of the {{ReduceProperties}}
and {{GroupReduceWithCombineProperties}} classes such that a combiner is injected in front
of a {{PartitionPlanNode}} if it is the input of a Reduce or GroupReduce operator. This should
only happen if the Reducer is the only successor of the Partition operator.
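
The "only successor" condition in the description above can be sketched as a simple check on a toy plan graph. This is a hypothetical, heavily simplified model: `PlanSketch`, `Node`, `Kind`, and `canInjectCombiner` are invented for illustration and are not Flink's optimizer classes. It shows only the condition itself, not how the optimizer enumerates plans. The reason for the condition is that any other consumer of the partitioned data would otherwise receive pre-aggregated records it did not ask for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy plan model; NOT Flink's optimizer classes.
class PlanSketch {
    enum Kind { SOURCE, PARTITION, REDUCE, MAP }

    static final class Node {
        final Kind kind;
        final List<Node> successors = new ArrayList<>();

        Node(Kind kind) {
            this.kind = kind;
        }

        /** Adds an edge from this node to next and returns next for chaining. */
        Node to(Node next) {
            successors.add(next);
            return next;
        }
    }

    /** True only if reducer is the single successor of the given partition node. */
    static boolean canInjectCombiner(Node partition, Node reducer) {
        return partition.kind == Kind.PARTITION
                && reducer.kind == Kind.REDUCE
                && partition.successors.size() == 1
                && partition.successors.get(0) == reducer;
    }
}
```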

This message was sent by Atlassian JIRA
