From: uce@apache.org
To: commits@flink.apache.org
Reply-To: dev@flink.apache.org
Date: Mon, 11 Apr 2016 12:20:17 -0000
Message-Id: <17588ad8c51c4f05bb82cfd53dc3596d@git.apache.org>
Subject: [1/2] flink git commit: [FLINK-3634] [docs] Fix documentation for DataSetUtils.zipWithUniqueId()

Repository: flink
Updated Branches: refs/heads/master ed1e52a10 -> e16ca8460

[FLINK-3634] [docs] Fix documentation for DataSetUtils.zipWithUniqueId()

This closes #1817.
Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/76f95f76
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/76f95f76
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/76f95f76

Branch: refs/heads/master
Commit: 76f95f76af0368c18d9a69136f5293d6d0da843e
Parents: ed1e52a
Author: Greg Hogan
Authored: Fri Mar 18 10:44:03 2016 -0400
Committer: Ufuk Celebi
Committed: Mon Apr 11 14:19:07 2016 +0200

----------------------------------------------------------------------
 docs/apis/batch/zip_elements_guide.md | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/76f95f76/docs/apis/batch/zip_elements_guide.md
----------------------------------------------------------------------
diff --git a/docs/apis/batch/zip_elements_guide.md b/docs/apis/batch/zip_elements_guide.md
index 8048f1c..59f723a 100644
--- a/docs/apis/batch/zip_elements_guide.md
+++ b/docs/apis/batch/zip_elements_guide.md
@@ -32,15 +32,17 @@ This document shows how {% gh_link /flink-java/src/main/java/org/apache/flink/ap
 {:toc}
 
 ### Zip with a Dense Index
-For assigning consecutive labels to the elements, the `zipWithIndex` method should be called. It receives a data set as input and returns a new data set of unique id, initial value tuples.
+`zipWithIndex` assigns consecutive labels to the elements, receiving a data set as input and returning a new data set of `(unique id, initial value)` 2-tuples.
+This process requires two passes, first counting and then labeling elements, and cannot be pipelined due to the synchronization of counts.
+The alternative `zipWithUniqueId` works in a pipelined fashion and is preferred when a unique labeling is sufficient.
 
 For example, the following code:
{% highlight java %}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-env.setParallelism(1);
-DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F");
+env.setParallelism(2);
+DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H");
 
 DataSet<Tuple2<Long, String>> result = DataSetUtils.zipWithIndex(in);
 
@@ -54,8 +56,8 @@ env.execute();
 import org.apache.flink.api.scala._
 
 val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
-env.setParallelism(1)
-val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F")
+env.setParallelism(2)
+val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H")
 
 val result: DataSet[(Long, String)] = input.zipWithIndex
 
@@ -66,21 +68,21 @@ env.execute()
-will yield the tuples: (0,A), (1,B), (2,C), (3,D), (4,E), (5,F)
+may yield the tuples: (0,G), (1,H), (2,A), (3,B), (4,C), (5,D), (6,E), (7,F)
 
 [Back to top](#top)
 
-### Zip with an Unique Identifier
-In many cases, one may not need to assign consecutive labels.
-`zipWithUniqueId` works in a pipelined fashion, speeding up the label assignment process. This method receives a data set as input and returns a new data set of unique id, initial value tuples.
+### Zip with a Unique Identifier
+In many cases one may not need to assign consecutive labels.
+`zipWithUniqueId` works in a pipelined fashion, speeding up the label assignment process. This method receives a data set as input and returns a new data set of `(unique id, initial value)` 2-tuples.
 
 For example, the following code:
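The two-pass behavior the updated paragraph describes (count per partition first, then label) can be illustrated outside of Flink. The sketch below is a hypothetical stand-in, not the actual `DataSetUtils.zipWithIndex` implementation; it assumes the input is already split into partitions and that partition 0 happens to hold G and H, matching the example output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of dense indexing over partitioned data.
// Pass 1 counts elements per partition (this is the step that forces
// synchronization); pass 2 labels each element consecutively, starting
// from the prefix sum of the preceding partitions' counts.
public class DenseIndexSketch {
    public static List<String> zipWithIndex(List<List<String>> partitions) {
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) { // pass 1: count
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        List<String> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) { // pass 2: label
            long id = offsets[p];
            for (String value : partitions.get(p)) {
                result.add("(" + id++ + "," + value + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions, as with parallelism 2 in the example above.
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("G", "H"),
            Arrays.asList("A", "B", "C", "D", "E", "F"));
        System.out.println(zipWithIndex(parts));
    }
}
```

Because no task can start labeling until every partition's count is known, the labeling stage cannot overlap with the counting stage, which is why the doc change notes that `zipWithIndex` cannot be pipelined.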
{% highlight java %}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-env.setParallelism(1);
-DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F");
+env.setParallelism(2);
+DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H");
 
 DataSet<Tuple2<Long, String>> result = DataSetUtils.zipWithUniqueId(in);
 
@@ -94,8 +96,8 @@ env.execute();
 import org.apache.flink.api.scala._
 
 val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
-env.setParallelism(1)
-val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F")
+env.setParallelism(2)
+val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H")
 
 val result: DataSet[(Long, String)] = input.zipWithUniqueId
 
@@ -106,6 +108,6 @@ env.execute()
-will yield the tuples: (0,A), (2,B), (4,C), (6,D), (8,E), (10,F)
+may yield the tuples: (0,G), (1,A), (2,H), (3,B), (5,C), (7,D), (9,E), (11,F)
 
 [Back to top](#top)
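Why `zipWithUniqueId` can be pipelined: each parallel task can derive globally unique ids from nothing but its own task index and a local element counter, so no counting pass or cross-task synchronization is needed. The round-robin-stride scheme below is a simplified, hypothetical illustration of that idea; Flink's actual id scheme differs, as the sparser ids in the example output above show.

```java
// Hypothetical pipelined labeling: with parallelism n, task i emits ids
// i, i+n, i+2n, ... Ids from different tasks can never collide because
// each task owns a distinct residue class modulo n, and each task needs
// only its own counter, so labels are assigned as elements stream through.
public class UniqueIdSketch {
    public static long nextId(int taskIndex, int parallelism, long localCount) {
        return taskIndex + localCount * (long) parallelism;
    }

    public static void main(String[] args) {
        // Two tasks streaming their elements independently (parallelism 2).
        String[] task0 = {"G", "H"};
        String[] task1 = {"A", "B", "C"};
        for (int i = 0; i < task0.length; i++) {
            System.out.println("(" + nextId(0, 2, i) + "," + task0[i] + ")");
        }
        for (int i = 0; i < task1.length; i++) {
            System.out.println("(" + nextId(1, 2, i) + "," + task1[i] + ")");
        }
    }
}
```

The trade-off, as the doc change states, is that the resulting labels are unique but not consecutive: a task with fewer elements simply leaves its later ids unused.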