From: uce@apache.org
To: commits@flink.apache.org
Reply-To: dev@flink.apache.org
Date: Mon, 11 Apr 2016 12:20:17 -0000
Message-Id: <17588ad8c51c4f05bb82cfd53dc3596d@git.apache.org>
Subject: [1/2] flink git commit: [FLINK-3634] [docs] Fix documentation for DataSetUtils.zipWithUniqueId()

Repository: flink
Updated Branches: refs/heads/master ed1e52a10 -> e16ca8460

[FLINK-3634] [docs] Fix documentation for DataSetUtils.zipWithUniqueId()

This closes #1817.
Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/76f95f76
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/76f95f76
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/76f95f76

Branch: refs/heads/master
Commit: 76f95f76af0368c18d9a69136f5293d6d0da843e
Parents: ed1e52a
Author: Greg Hogan
Authored: Fri Mar 18 10:44:03 2016 -0400
Committer: Ufuk Celebi
Committed: Mon Apr 11 14:19:07 2016 +0200

----------------------------------------------------------------------
 docs/apis/batch/zip_elements_guide.md | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/76f95f76/docs/apis/batch/zip_elements_guide.md
----------------------------------------------------------------------
diff --git a/docs/apis/batch/zip_elements_guide.md b/docs/apis/batch/zip_elements_guide.md
index 8048f1c..59f723a 100644
--- a/docs/apis/batch/zip_elements_guide.md
+++ b/docs/apis/batch/zip_elements_guide.md
@@ -32,15 +32,17 @@ This document shows how {% gh_link /flink-java/src/main/java/org/apache/flink/ap
 {:toc}
 
 ### Zip with a Dense Index
-For assigning consecutive labels to the elements, the `zipWithIndex` method should be called. It receives a data set as input and returns a new data set of unique id, initial value tuples.
+`zipWithIndex` assigns consecutive labels to the elements, receiving a data set as input and returning a new data set of `(unique id, initial value)` 2-tuples.
+This process requires two passes, first counting and then labeling elements, and cannot be pipelined due to the synchronization of counts.
+The alternative `zipWithUniqueId` works in a pipelined fashion and is preferred when a unique labeling is sufficient.
 
 For example, the following code:
{% highlight java %}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-env.setParallelism(1);
-DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F");
+env.setParallelism(2);
+DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H");
 
 DataSet<Tuple2<Long, String>> result = DataSetUtils.zipWithIndex(in);
 
@@ -54,8 +56,8 @@ env.execute();
 import org.apache.flink.api.scala._
 
 val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
-env.setParallelism(1)
-val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F")
+env.setParallelism(2)
+val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H")
 
 val result: DataSet[(Long, String)] = input.zipWithIndex
 
@@ -66,21 +68,21 @@ env.execute()
-will yield the tuples: (0,A), (1,B), (2,C), (3,D), (4,E), (5,F)
+may yield the tuples: (0,G), (1,H), (2,A), (3,B), (4,C), (5,D), (6,E), (7,F)
 
 [Back to top](#top)
 
-### Zip with an Unique Identifier
-In many cases, one may not need to assign consecutive labels.
-`zipWithUniqueId` works in a pipelined fashion, speeding up the label assignment process. This method receives a data set as input and returns a new data set of unique id, initial value tuples.
+### Zip with a Unique Identifier
+In many cases one may not need to assign consecutive labels.
+`zipWithUniqueId` works in a pipelined fashion, speeding up the label assignment process. This method receives a data set as input and returns a new data set of `(unique id, initial value)` 2-tuples.
 
 For example, the following code:
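The two-pass behavior the updated paragraph describes (count per partition first, then label) can be illustrated outside of Flink. The sketch below is a hypothetical stand-in, not the actual `DataSetUtils.zipWithIndex` implementation; it assumes the input is already split into partitions and that partition 0 happens to hold G and H, matching the example output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of dense indexing over partitioned data.
// Pass 1 counts elements per partition (this is the step that forces
// synchronization); pass 2 labels each element consecutively, starting
// from the prefix sum of the preceding partitions' counts.
public class DenseIndexSketch {
    public static List<String> zipWithIndex(List<List<String>> partitions) {
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) { // pass 1: count
            offsets[p] = running;
            running += partitions.get(p).size();
        }
        List<String> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) { // pass 2: label
            long id = offsets[p];
            for (String value : partitions.get(p)) {
                result.add("(" + id++ + "," + value + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions, as with parallelism 2 in the example above.
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("G", "H"),
            Arrays.asList("A", "B", "C", "D", "E", "F"));
        System.out.println(zipWithIndex(parts));
    }
}
```

Because no task can start labeling until every partition's count is known, the labeling stage cannot overlap with the counting stage, which is why the doc change notes that `zipWithIndex` cannot be pipelined.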
{% highlight java %}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-env.setParallelism(1);
-DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F");
+env.setParallelism(2);
+DataSet<String> in = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H");
 
 DataSet<Tuple2<Long, String>> result = DataSetUtils.zipWithUniqueId(in);
 
@@ -94,8 +96,8 @@ env.execute();
 import org.apache.flink.api.scala._
 
 val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
-env.setParallelism(1)
-val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F")
+env.setParallelism(2)
+val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F", "G", "H")
 
 val result: DataSet[(Long, String)] = input.zipWithUniqueId
 
@@ -106,6 +108,6 @@ env.execute()
-will yield the tuples: (0,A), (2,B), (4,C), (6,D), (8,E), (10,F)
+may yield the tuples: (0,G), (1,A), (2,H), (3,B), (5,C), (7,D), (9,E), (11,F)
 
 [Back to top](#top)
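Why `zipWithUniqueId` can be pipelined: each parallel task can derive globally unique ids from nothing but its own task index and a local element counter, so no counting pass or cross-task synchronization is needed. The round-robin-stride scheme below is a simplified, hypothetical illustration of that idea; Flink's actual id scheme differs, as the sparser ids in the example output above show.

```java
// Hypothetical pipelined labeling: with parallelism n, task i emits ids
// i, i+n, i+2n, ... Ids from different tasks can never collide because
// each task owns a distinct residue class modulo n, and each task needs
// only its own counter, so labels are assigned as elements stream through.
public class UniqueIdSketch {
    public static long nextId(int taskIndex, int parallelism, long localCount) {
        return taskIndex + localCount * (long) parallelism;
    }

    public static void main(String[] args) {
        // Two tasks streaming their elements independently (parallelism 2).
        String[] task0 = {"G", "H"};
        String[] task1 = {"A", "B", "C"};
        for (int i = 0; i < task0.length; i++) {
            System.out.println("(" + nextId(0, 2, i) + "," + task0[i] + ")");
        }
        for (int i = 0; i < task1.length; i++) {
            System.out.println("(" + nextId(1, 2, i) + "," + task1[i] + ")");
        }
    }
}
```

The trade-off, as the doc change states, is that the resulting labels are unique but not consecutive: a task with fewer elements simply leaves its later ids unused.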