flink-issues mailing list archives

From fhueske <...@git.apache.org>
Subject [GitHub] flink pull request: FLINK-3179 Combiner is not injected if Reduce ...
Date Mon, 29 Feb 2016 09:36:12 GMT
Github user fhueske commented on a diff in the pull request:

    --- Diff: flink-optimizer/src/main/java/org/apache/flink/optimizer/operators/GroupReduceWithCombineProperties.java
    @@ -87,19 +92,39 @@ public DriverStrategy getStrategy() {
     		return DriverStrategy.SORTED_GROUP_REDUCE;
    -	@Override
     	public SingleInputPlanNode instantiate(Channel in, SingleInputNode node) {
     		if (in.getShipStrategy() == ShipStrategyType.FORWARD) {
     			// adjust a sort (changes grouping, so it must be for this driver to combining sort
    -			if (in.getLocalStrategy() == LocalStrategy.SORT) {
    -				if (!in.getLocalStrategyKeys().isValidUnorderedPrefix(this.keys)) {
    -					throw new RuntimeException("Bug: Inconsistent sort for group strategy.");
    +			if(in.getSource().getOptimizerNode() instanceof PartitionNode) {
    +				// Inject a combiner before the partition node
    +				Channel toCombiner = new Channel(in.getSource());
    +				toCombiner.setShipStrategy(ShipStrategyType.FORWARD, DataExchangeMode.PIPELINED);
    +				GroupReduceNode combinerNode = ((GroupReduceNode) node).getCombinerUtilityNode();
    +				combinerNode.setParallelism(in.getSource().getParallelism());
    +				if(toCombiner.getSource().getInputs().iterator().hasNext()) {
    +					Channel source = toCombiner.getSource().getInputs().iterator().next();
    +					// A combiner plan node is created with the map as the input
    +					SingleInputPlanNode combiner = new SingleInputPlanNode(combinerNode, "Combine("+node.getOperator()
    +						.getName()+")", source, DriverStrategy.SORTED_GROUP_COMBINE);
    +					addCombinerNodeData(in, toCombiner, combiner);
    +					Channel combinerChannel = new Channel(combiner);
    +					combinerChannel.setShipStrategy(ShipStrategyType.FORWARD, DataExchangeMode.PIPELINED);
    --- End diff --
    If we have:
    `[Some-Op] --(a)-partition--> [Partition-Op] --(b)-fwd--> [Reduce-Op]`
    then `in` is the `--(b)-fwd-->` channel and `node` is the `[Reduce-Op]`.
    The combine operator should be inserted like this:
    `[Some-Op] --(1)-fwd--> [Combine-Op] --(2)-partition--> [Partition-Op] --(3)-fwd--> [Reduce-Op]`
    - The channel (1) must be a new forward/pipelined channel.
    - The channel (2) must be a new channel with the same shipping and exchange strategies as channel (a).
    - The channel (3) should be the original channel (b) which is the `in` parameter.
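
The rewiring described in the three points above can be sketched on a toy model. This is only an illustration, not the optimizer code: `Node`, `Link`, `Ship`, `Exchange`, `connect`, `injectCombiner`, and `describe` are hypothetical stand-ins invented for this example and do not correspond to Flink's `PlanNode`, `Channel`, `ShipStrategyType`, or `DataExchangeMode` APIs. It assumes every node has at most one input.

```java
/** Toy model of the combiner injection. These are NOT Flink's optimizer classes. */
final class CombinerInjectionSketch {

    enum Ship { FORWARD, PARTITION_HASH }   // hypothetical shipping strategies
    enum Exchange { PIPELINED, BATCH }      // hypothetical exchange modes

    /** Hypothetical plan node with at most one input edge. */
    static final class Node {
        final String name;
        Link input; // null for a source
        Node(String name) { this.name = name; }
    }

    /** Hypothetical channel: an immutable edge from source to target. */
    static final class Link {
        final Node source;
        final Node target;
        final Ship ship;
        final Exchange exchange;
        Link(Node source, Node target, Ship ship, Exchange exchange) {
            this.source = source;
            this.target = target;
            this.ship = ship;
            this.exchange = exchange;
        }
    }

    /** Creates an edge and registers it as the target's input. */
    static Link connect(Node source, Node target, Ship ship, Exchange exchange) {
        Link link = new Link(source, target, ship, exchange);
        target.input = link;
        return link;
    }

    /**
     * Given channel (b) from the partition node to the reduce node,
     * inserts a combine node in front of the partition node.
     */
    static Node injectCombiner(Link in) {
        Node partition = in.source;
        Link a = partition.input;          // original channel (a)
        Node someOp = a.source;
        Node combine = new Node("Combine-Op");
        // (1): new forward/pipelined channel into the combiner
        connect(someOp, combine, Ship.FORWARD, Exchange.PIPELINED);
        // (2): new channel that copies the strategies of channel (a)
        connect(combine, partition, a.ship, a.exchange);
        // (3): channel (b), i.e. `in`, is left untouched
        return combine;
    }

    /** Renders the chain that ends in the given sink, from left to right. */
    static String describe(Node sink) {
        StringBuilder sb = new StringBuilder("[" + sink.name + "]");
        for (Link l = sink.input; l != null; l = l.source.input) {
            sb.insert(0, "[" + l.source.name + "] --" + l.ship + "--> ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node someOp = new Node("Some-Op");
        Node partition = new Node("Partition-Op");
        Node reduce = new Node("Reduce-Op");
        connect(someOp, partition, Ship.PARTITION_HASH, Exchange.BATCH);       // (a)
        Link b = connect(partition, reduce, Ship.FORWARD, Exchange.PIPELINED); // (b)

        System.out.println(describe(reduce));
        injectCombiner(b);
        System.out.println(describe(reduce));
    }
}
```

In this sketch the reduce node keeps its original input edge, so anything already recorded on channel (b) survives the rewrite; only the edge feeding the partition node is replaced.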

