flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From fhueske <...@git.apache.org>
Subject [GitHub] incubator-flink pull request: New operator map partition function
Date Wed, 25 Jun 2014 16:12:23 GMT
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/42#discussion_r14196207
  
    --- Diff: stratosphere-compiler/src/main/java/eu/stratosphere/compiler/dag/MapPartitionNode.java
---
    @@ -0,0 +1,56 @@
    +/***********************************************************************************************************************
    + * Copyright (C) 2010-2013 by the Stratosphere project (http://stratosphere.eu)
    + *
    + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under
the License is distributed on
    + * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the
    + * specific language governing permissions and limitations under the License.
    + **********************************************************************************************************************/
    +
    +package eu.stratosphere.compiler.dag;
    +
    +import java.util.Collections;
    +import java.util.List;
    +
    +import eu.stratosphere.api.common.operators.SingleInputOperator;
    +import eu.stratosphere.compiler.DataStatistics;
    +import eu.stratosphere.compiler.operators.MapPartitionDescriptor;
    +import eu.stratosphere.compiler.operators.OperatorDescriptorSingle;
    +
    +/**
    + * The optimizer's internal representation of a <i>MapPartition</i> operator
node.
    + */
    +public class MapPartitionNode extends SingleInputNode {
    +	
    +	/**
    +	 * Creates a new MapNode for the given contract.
    +	 * 
    +	 * @param operator The map partition contract object.
    +	 */
    +	public MapPartitionNode(SingleInputOperator<?, ?, ?> operator) {
    +		super(operator);
    +	}
    +
    +	@Override
    +	public String getName() {
    +		return "MapPartition";
    +	}
    +
    +	@Override
    +	protected List<OperatorDescriptorSingle> getPossibleProperties() {
    +		return Collections.<OperatorDescriptorSingle>singletonList(new MapPartitionDescriptor());
    +	}
    +
    +	/**
    +	 * Computes the estimates for the MapPartition operator.
    +	 * We assume that by default, Map takes one value and transforms it into another value.
    +	 * The cardinality consequently stays the same.
    +	 */
    +	@Override
    +	protected void computeOperatorSpecificDefaultEstimates(DataStatistics statistics) {
    +	}
    --- End diff --
    
    I would guess that a PartitionMap is used when multiple values should be somehow (in a
non-deterministic way) aggregated. Otherwise one would (and should!) use a Map- or FlatMapFunction.
    
    An estimate which is based on the input card is not a good idea, IMHO. 
    I would assume that the output card is more likely to be around 1 * DOP, but this is just
a gut feeling ;-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message