flink-commits mailing list archives

From g...@apache.org
Subject [2/5] flink git commit: [FLINK-7234] [docs] Fix CombineHint documentation
Date Wed, 26 Jul 2017 14:32:07 GMT
[FLINK-7234] [docs] Fix CombineHint documentation

The CombineHint documentation applies to DataSet#reduce not
DataSet#reduceGroup and should also be noted for DataSet#distinct. Also
correct the usage where the CombineHint is set with setCombineHint
rather than alongside the user-defined function parameter.

This closes #4372


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/4a88f658
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/4a88f658
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/4a88f658

Branch: refs/heads/master
Commit: 4a88f6587fdfadd5749188a76e6b38a3585cd31b
Parents: 8695a21
Author: Greg Hogan <code@greghogan.com>
Authored: Wed Jul 19 15:24:20 2017 -0400
Committer: Greg Hogan <code@greghogan.com>
Committed: Wed Jul 26 10:31:20 2017 -0400

----------------------------------------------------------------------
 docs/dev/batch/index.md                         | 24 ++++++++++++--------
 .../operators/base/ReduceOperatorBase.java      |  3 +--
 2 files changed, 16 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/4a88f658/docs/dev/batch/index.md
----------------------------------------------------------------------
diff --git a/docs/dev/batch/index.md b/docs/dev/batch/index.md
index 9a8ce22..7fb84e8 100644
--- a/docs/dev/batch/index.md
+++ b/docs/dev/batch/index.md
@@ -205,12 +205,17 @@ data.filter(new FilterFunction<Integer>() {
       <td><strong>Reduce</strong></td>
       <td>
         <p>Combines a group of elements into a single element by repeatedly combining two elements
-        into one. Reduce may be applied on a full data set, or on a grouped data set.</p>
+        into one. Reduce may be applied on a full data set or on a grouped data set.</p>
 {% highlight java %}
 data.reduce(new ReduceFunction<Integer> {
   public Integer reduce(Integer a, Integer b) { return a + b; }
 });
 {% endhighlight %}
+        <p>If the reduce was applied to a grouped data set then you can specify the way that the
+        runtime executes the combine phase of the reduce by supplying a <code>CombineHint</code> to
+        <code>setCombineHint</code>. The hash-based strategy should be faster in most cases,
+        especially if the number of different keys is small compared to the number of input
+        elements (eg. 1/10).</p>
       </td>
     </tr>
 
@@ -218,7 +223,7 @@ data.reduce(new ReduceFunction<Integer> {
       <td><strong>ReduceGroup</strong></td>
       <td>
         <p>Combines a group of elements into one or more elements. ReduceGroup may be applied on a
-        full data set, or on a grouped data set.</p>
+        full data set or on a grouped data set.</p>
 {% highlight java %}
 data.reduceGroup(new GroupReduceFunction<Integer, Integer> {
   public void reduce(Iterable<Integer> values, Collector<Integer> out) {
@@ -230,10 +235,6 @@ data.reduceGroup(new GroupReduceFunction<Integer, Integer> {
   }
 });
 {% endhighlight %}
-        <p>If the reduce was applied to a grouped data set, you can specify the way that the
-        runtime executes the combine phase of the reduce via supplying a CombineHint as a second
-        parameter. The hash-based strategy should be faster in most cases, especially if the
-        number of different keys is small compared to the number of input elements (eg. 1/10).</p>
       </td>
     </tr>
 
@@ -260,9 +261,14 @@ DataSet<Tuple3<Integer, String, Double>> output = input.sum(0).andMin(2);
       <td>
         <p>Returns the distinct elements of a data set. It removes the duplicate entries
         from the input DataSet, with respect to all fields of the elements, or a subset of fields.</p>
-    {% highlight java %}
-        data.distinct();
-    {% endhighlight %}
+{% highlight java %}
+data.distinct();
+{% endhighlight %}
+        <p>Distinct is implemented using a reduce function. You can specify the way that the
+        runtime executes the combine phase of the reduce by supplying a <code>CombineHint</code> to
+        <code>setCombineHint</code>. The hash-based strategy should be faster in most cases,
+        especially if the number of different keys is small compared to the number of input
+        elements (eg. 1/10).</p>
       </td>
     </tr>
 

http://git-wip-us.apache.org/repos/asf/flink/blob/4a88f658/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java
----------------------------------------------------------------------
diff --git a/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java b/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java
index f97e4d6..f6a7a59 100644
--- a/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java
+++ b/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java
@@ -79,8 +79,7 @@ public class ReduceOperatorBase<T, FT extends ReduceFunction<T>> extends SingleI
 		HASH,
 
 		/**
-		 * Disable the use of a combiner. This can be faster in cases when the number of different keys
-		 * is very small compared to the number of input elements (eg. 1/100).
+		 * Disable the use of a combiner.
 		 */
 		NONE
 	}
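
The documentation text in this commit says the hash-based combine strategy should usually be faster, especially when there are few distinct keys relative to the number of input elements. The sketch below is one way to picture why, using plain Java collections only. It is not Flink code and does not reflect how the Flink runtime is implemented: the class CombineSketch and the methods kv, hashCombine and sortCombine are hypothetical names made up for illustration. The hash variant keeps one running value per distinct key, so its table stays small when keys repeat often; the sort variant has to order every input element before it can fold runs of equal keys.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Illustration only: plain-Java stand-ins for two combine strategies.
// None of these names are Flink classes or methods.
public class CombineSketch {

    // Hypothetical helper that builds a (key, value) pair.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new AbstractMap.SimpleEntry<String, Integer>(key, value);
    }

    // Hash-based combine: one running value per distinct key, no sorting.
    static Map<String, Integer> hashCombine(List<Map.Entry<String, Integer>> input,
                                            BinaryOperator<Integer> reduce) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> e : input) {
            // Insert the value, or fold it into the value already stored for the key.
            table.merge(e.getKey(), e.getValue(), reduce);
        }
        return table;
    }

    // Sort-based combine: sort all elements by key, then fold runs of equal keys.
    static Map<String, Integer> sortCombine(List<Map.Entry<String, Integer>> input,
                                            BinaryOperator<Integer> reduce) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(input);
        sorted.sort(Map.Entry.comparingByKey());
        Map<String, Integer> out = new TreeMap<>();
        String runKey = null;
        Integer acc = null;
        for (Map.Entry<String, Integer> e : sorted) {
            if (runKey != null && !runKey.equals(e.getKey())) {
                out.put(runKey, acc); // the previous run of equal keys has ended
                acc = null;
            }
            runKey = e.getKey();
            acc = (acc == null) ? e.getValue() : reduce.apply(acc, e.getValue());
        }
        if (runKey != null) {
            out.put(runKey, acc); // emit the final run
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
            Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3), kv("b", 4), kv("a", 5));
        System.out.println(hashCombine(input, Integer::sum));
        System.out.println(sortCombine(input, Integer::sum));
    }
}
```

Both methods produce the same per-key result and differ only in how much work and memory they need to get there. In Flink itself, per the commit message above, the strategy is selected by passing a CombineHint to setCombineHint rather than alongside the user-defined function; this sketch does not call any Flink API.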

