Mailing-List: contact commits-help@beam.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@beam.incubator.apache.org
Delivered-To: mailing list commits@beam.incubator.apache.org
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: dhalperi@apache.org
To: commits@beam.incubator.apache.org
Date: Tue, 13 Sep 2016 00:41:08 -0000
Message-Id:
In-Reply-To: <5d2051da80084a09aedbe0b48aa93047@git.apache.org>
References: <5d2051da80084a09aedbe0b48aa93047@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [37/50] [abbrv] incubator-beam git commit: Remove the DataflowRunner instructions from examples
archived-at: Tue, 13 Sep 2016 00:40:44 -0000

Remove the DataflowRunner instructions from examples

Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/c92e45dd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/c92e45dd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/c92e45dd

Branch: refs/heads/gearpump-runner
Commit: c92e45dd4019e613a7670e4bb0e1fcc4b7e2c462
Parents: 4bf3a3b
Author: Pei He
Authored: Thu Aug 25 14:20:30 2016 -0700
Committer: Dan Halperin
Committed: Mon Sep 12 17:40:12 2016 -0700

----------------------------------------------------------------------
 .../beam/examples/DebuggingWordCount.java       | 16 +++++++------
 .../apache/beam/examples/MinimalWordCount.java  |  7 +++---
 .../apache/beam/examples/WindowedWordCount.java | 22 +++++++----------
 .../org/apache/beam/examples/WordCount.java     | 22 +++++------------
 .../beam/examples/complete/AutoComplete.java    | 25 +++++++-------------
 .../examples/complete/StreamingWordExtract.java |  4 ++--
 .../apache/beam/examples/complete/TfIdf.java    | 18 +++++---------
 .../examples/complete/TopWikipediaSessions.java | 12 ++++------
 .../examples/complete/TrafficMaxLaneFlow.java   |  4 ++--
 .../beam/examples/complete/TrafficRoutes.java   |  4 ++--
 .../examples/cookbook/BigQueryTornadoes.java    | 18 ++++----------
 .../cookbook/CombinePerKeyExamples.java         | 18 ++++----------
 .../examples/cookbook/DatastoreWordCount.java   | 17 ++++++-------
 .../beam/examples/cookbook/DeDupExample.java    | 16 ++++++-------
 .../beam/examples/cookbook/FilterExamples.java  | 21 ++++------------
 .../beam/examples/cookbook/JoinExamples.java    | 18 ++++----------
 .../examples/cookbook/MaxPerKeyExamples.java    | 19 ++++-----------
 .../beam/examples/cookbook/TriggerExample.java  | 16 ++++++-------
 18 files changed, 92 insertions(+), 185 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java b/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
index be3aa41..eb38227 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
@@ -46,12 +46,12 @@ import org.slf4j.LoggerFactory;
  *
  * <p>Basic concepts, also in the MinimalWordCount and WordCount examples:
  * Reading text files; counting a PCollection; executing a Pipeline both locally
- * and using the Dataflow service; defining DoFns.
+ * and using a selected runner; defining DoFns.
  *
  * <p>New Concepts:
  * <pre>
  *   1. Logging to Cloud Logging
- *   2. Controlling Dataflow worker log levels
+ *   2. Controlling worker log levels
  *   3. Creating a custom aggregator
  *   4. Testing your Pipeline via PAssert
  * </pre>
@@ -62,12 +62,14 @@ import org.slf4j.LoggerFactory;
  * }
  * </pre>
  *
- * <p>To execute this pipeline using the Dataflow service and the additional logging discussed
- * below, specify pipeline configuration:
+ * <p>To change the runner, specify:
+ * <pre>{@code
+ *   --runner=YOUR_SELECTED_RUNNER
+ * }
+ * </pre>
+ *
+ * <p>To use the additional logging discussed below, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
  *   --workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}
  * }
  * </pre>
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java b/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
index f28a20c..f772dd5 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
@@ -66,12 +66,11 @@ public class MinimalWordCount {
 
     // In order to run your pipeline, you need to make following runner specific changes:
     //
-    // CHANGE 1/3: Select a Beam runner, such as BlockingDataflowRunner
-    // or FlinkRunner.
+    // CHANGE 1/3: Select a Beam runner, such as DataflowRunner or FlinkRunner.
     // CHANGE 2/3: Specify runner-required options.
-    //   For BlockingDataflowRunner, set project and temp location as follows:
+    //   For DataflowRunner, set project and temp location as follows:
     //     DataflowPipelineOptions dataflowOptions = options.as(DataflowPipelineOptions.class);
-    //     dataflowOptions.setRunner(BlockingDataflowRunner.class);
+    //     dataflowOptions.setRunner(DataflowRunner.class);
     //     dataflowOptions.setProject("SET_YOUR_PROJECT_ID_HERE");
     //     dataflowOptions.setTempLocation("gs://SET_YOUR_BUCKET_NAME_HERE/AND_TEMP_DIRECTORY");
     //   For FlinkRunner, set the runner as follows. See {@code FlinkPipelineOptions}

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java b/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
index 7af354c..c8bd9d3 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
@@ -54,7 +54,7 @@ import org.joda.time.Instant;
  *
  * <p>Basic concepts, also in the MinimalWordCount, WordCount, and DebuggingWordCount examples:
  * Reading text files; counting a PCollection; writing to GCS; executing a Pipeline both locally
- * and using the Dataflow service; defining DoFns; creating a custom aggregator;
+ * and using a selected runner; defining DoFns; creating a custom aggregator;
  * user-defined PTransforms; defining PipelineOptions.
  *
  * <p>New Concepts:
@@ -66,19 +66,13 @@ import org.joda.time.Instant;
  *   5. Writing to BigQuery
  * </pre>
  *
- * <p>To execute this pipeline locally, specify general pipeline configuration:
+ * <p>By default, the examples will run with the {@code DirectRunner}.
+ * To change the runner, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * </pre>
- *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
- * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
  * </pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>Optionally specify the input file path via:
  * {@code --inputFile=gs://INPUT_PATH},
@@ -86,7 +80,7 @@ import org.joda.time.Instant;
  *
  * <p>Specify an output BigQuery dataset and optionally, a table for the output. If you don't
  * specify the table, one will be created for you using the job name. If you don't specify the
- * dataset, a dataset called {@code dataflow-examples} must already exist in your project.
+ * dataset, a dataset called {@code beam_examples} must already exist in your project.
  * {@code --bigQueryDataset=YOUR-DATASET --bigQueryTable=YOUR-NEW-TABLE-NAME}.
  *
  * <p>By default, the pipeline will do fixed windowing, on 1-minute windows. You can
@@ -190,7 +184,7 @@ public class WindowedWordCount {
     Pipeline pipeline = Pipeline.create(options);
 
     /**
-     * Concept #1: the Dataflow SDK lets us run the same pipeline with either a bounded or
+     * Concept #1: the Beam SDK lets us run the same pipeline with either a bounded or
      * unbounded input source.
      */
     PCollection<String> input = pipeline
@@ -229,7 +223,7 @@ public class WindowedWordCount {
 
     PipelineResult result = pipeline.run();
 
-    // dataflowUtils will try to cancel the pipeline before the program exists.
+    // ExampleUtils will try to cancel the pipeline before the program exists.
     exampleUtils.waitToFinish(result);
   }
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/WordCount.java b/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
index 793ee4b..498b069 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
@@ -48,8 +48,8 @@ import org.apache.beam.sdk.values.PCollection;
  * pipeline, for introduction of additional concepts.
  *
  * <p>For a detailed walkthrough of this example, see
- *   <a href="https://cloud.google.com/dataflow/java-sdk/wordcount-example">
- *   https://cloud.google.com/dataflow/java-sdk/wordcount-example
+ *   <a href="http://beam.incubator.apache.org/use/walkthroughs/">
+ *   http://beam.incubator.apache.org/use/walkthroughs/
  *   </a>
  *
  * <p>Basic concepts, also in the MinimalWordCount example:
@@ -66,27 +66,17 @@ import org.apache.beam.sdk.values.PCollection;
  *
  * <p>Concept #1: you can execute this pipeline either locally or using the selected runner.
  * These are now command-line options and not hard-coded as they were in the MinimalWordCount
  * example.
- * To execute this pipeline locally, specify general pipeline configuration:
- * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * </pre>
- * and a local output file or output prefix on GCS:
+ * To execute this pipeline locally, specify a local output file or output prefix on GCS:
  * <pre>{@code
  *   --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
  * }</pre>
  *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
+ * <p>To change the runner, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
  * </pre>
- * and an output prefix on GCS:
- * <pre>{@code
- *   --output=gs://YOUR_OUTPUT_PREFIX
- * }</pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>The input file defaults to {@code gs://apache-beam-samples/shakespeare/kinglear.txt}
  * and can be overridden with {@code --inputFile}.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java b/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
index 2182e6d..c3ac614 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
@@ -77,24 +77,17 @@
  * <p>Concepts: Using the same pipeline in both streaming and batch, combiners,
  * composite transforms.
  *
- * <p>To execute this pipeline using the Dataflow service in batch mode,
- * specify pipeline configuration:
+ * <p>To execute this pipeline in streaming mode, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=DataflowRunner
- *   --inputFile=gs://path/to/input*.txt
+ *   --streaming
  * }</pre>
  *
- * <p>To execute this pipeline using the Dataflow service in streaming mode,
- * specify pipeline configuration:
+ * <p>To change the runner, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=DataflowRunner
- *   --inputFile=gs://YOUR_INPUT_DIRECTORY/*.txt
- *   --streaming
- * }</pre>
+ *   --runner=YOUR_SELECTED_RUNNER
+ * }</pre>
+ *
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>This will update the Cloud Datastore every 10 seconds based on the last
  * 30 minutes of data received.
@@ -417,7 +410,7 @@ public class AutoComplete {
   /**
    * Options supported by this class.
    *
-   * <p>Inherits standard Dataflow configuration options.
+   * <p>Inherits standard Beam example configuration options.
    */
   private static interface Options extends
       ExampleOptions, ExampleBigQueryTableOptions, StreamingOptions {
@@ -510,7 +503,7 @@ public class AutoComplete {
     // Run the pipeline.
     PipelineResult result = p.run();
 
-    // dataflowUtils will try to cancel the pipeline and the injector before the program exists.
+    // ExampleUtils will try to cancel the pipeline and the injector before the program exists.
     exampleUtils.waitToFinish(result);
   }
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/complete/StreamingWordExtract.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/StreamingWordExtract.java b/examples/java/src/main/java/org/apache/beam/examples/complete/StreamingWordExtract.java
index 869ea69..e8d8950 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/StreamingWordExtract.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/StreamingWordExtract.java
@@ -44,7 +44,7 @@ import org.apache.beam.sdk.transforms.ParDo;
  * a BigQuery table.
  *
  * <p>The example is configured to use the default BigQuery table from the example common package
- * (there are no defaults for a general Dataflow pipeline).
+ * (there are no defaults for a general Beam pipeline).
  * You can override them by using the {@literal --bigQueryDataset}, and {@literal --bigQueryTable}
  * options. If the BigQuery table do not exist, the example will try to create them.
 
@@ -141,7 +141,7 @@ public class StreamingWordExtract {
 
     PipelineResult result = pipeline.run();
 
-    // dataflowUtils will try to cancel the pipeline before the program exists.
+    // ExampleUtils will try to cancel the pipeline before the program exists.
     exampleUtils.waitToFinish(result);
   }
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
index 6684553..59bbd49 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TfIdf.java
@@ -65,23 +65,17 @@ import org.slf4j.LoggerFactory;
  * <p>Concepts: joining data; side inputs; logging
  *
- * <p>To execute this pipeline locally, specify general pipeline configuration:
- * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * and a local output file or output prefix on GCS:
+ * <p>To execute this pipeline locally, specify a local output file or output prefix on GCS:
  * <pre>{@code
  *   --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
  * }</pre>
  *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
+ * <p>To change the runner, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
- * and an output prefix on GCS:
- *   --output=gs://YOUR_OUTPUT_PREFIX
- * }
+ *   --runner=YOUR_SELECTED_RUNNER
+ * }
+ *
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>The default input is {@code gs://apache-beam-samples/shakespeare/} and can be overridden with
  * {@code --input}.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
index d597258..0f594d7 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
@@ -52,17 +52,13 @@ import org.joda.time.Instant;
  * <p>It is not recommended to execute this pipeline locally, given the size of the default input
  * data.
  *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
+ * <p>To execute this pipeline using a selected runner and an output prefix on GCS, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
+ *   --output=gs://YOUR_OUTPUT_PREFIX
  * }
  * </pre>
- * and an output prefix on GCS:
- * <pre>{@code
- *   --output=gs://YOUR_OUTPUT_PREFIX
- * }</pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>The default input is {@code gs://apache-beam-samples/wikipedia_edits/*.json} and can be
  * overridden with {@code --input}.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficMaxLaneFlow.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficMaxLaneFlow.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficMaxLaneFlow.java
index e456960..0c367d4 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficMaxLaneFlow.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficMaxLaneFlow.java
@@ -66,7 +66,7 @@ import org.joda.time.format.DateTimeFormatter;
  * <p>The pipeline reads traffic sensor data from {@literal --inputFile}.
  *
  * <p>The example is configured to use the default BigQuery table from the example common package
- * (there are no defaults for a general Dataflow pipeline).
+ * (there are no defaults for a general Beam pipeline).
  * You can override them by using the {@literal --bigQueryDataset}, and {@literal --bigQueryTable}
  * options. If the BigQuery table do not exist, the example will try to create them.
 
@@ -354,7 +354,7 @@ public class TrafficMaxLaneFlow {
     // Run the pipeline.
     PipelineResult result = pipeline.run();
 
-    // dataflowUtils will try to cancel the pipeline and the injector before the program exists.
+    // ExampleUtils will try to cancel the pipeline and the injector before the program exists.
     exampleUtils.waitToFinish(result);
   }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficRoutes.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficRoutes.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficRoutes.java
index 95336c6..14cee4d 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficRoutes.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TrafficRoutes.java
@@ -69,7 +69,7 @@ import org.joda.time.format.DateTimeFormatter;
  * <p>The pipeline reads traffic sensor data from {@literal --inputFile}.
  *
  * <p>The example is configured to use the default BigQuery table from the example common package
- * (there are no defaults for a general Dataflow pipeline).
+ * (there are no defaults for a general Beam pipeline).
  * You can override them by using the {@literal --bigQueryDataset}, and {@literal --bigQueryTable}
  * options. If the BigQuery table do not exist, the example will try to create them.
 
@@ -365,7 +365,7 @@ public class TrafficRoutes {
     // Run the pipeline.
     PipelineResult result = pipeline.run();
 
-    // dataflowUtils will try to cancel the pipeline and the injector before the program exists.
+    // ExampleUtils will try to cancel the pipeline and the injector before the program exists.
     exampleUtils.waitToFinish(result);
   }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java
index 439cf02..1e4918d 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java
@@ -45,27 +45,17 @@ import org.apache.beam.sdk.values.PCollection;
  * <p>Note: Before running this example, you must create a BigQuery dataset to contain your output
  * table.
  *
- * <p>To execute this pipeline locally, specify general pipeline configuration:
- * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * </pre>
- * and the BigQuery table for the output, with the form
+ * <p>To execute this pipeline locally, specify the BigQuery table for the output with the form:
  * <pre>{@code
  *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
  * }</pre>
  *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
+ * <p>To change the runner, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
  * </pre>
- * and the BigQuery table for the output:
- * <pre>{@code
- *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
- * }</pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>The BigQuery input table defaults to {@code clouddataflow-readonly:samples.weather_stations}
  * and can be overridden with {@code --input}.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/CombinePerKeyExamples.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/CombinePerKeyExamples.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/CombinePerKeyExamples.java
index 1d280a6..fc11ac9 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/CombinePerKeyExamples.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/CombinePerKeyExamples.java
@@ -52,27 +52,17 @@ import org.apache.beam.sdk.values.PCollection;
  * <p>Note: Before running this example, you must create a BigQuery dataset to contain your output
  * table.
  *
- * <p>To execute this pipeline locally, specify general pipeline configuration:
- * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * </pre>
- * and the BigQuery table for the output:
+ * <p>To execute this pipeline locally, specify the BigQuery table for the output:
  * <pre>{@code
  *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
  * }</pre>
  *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
+ * <p>To change the runner, specify:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
  * </pre>
- * and the BigQuery table for the output:
- * <pre>{@code
- *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
- * }</pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>The BigQuery input table defaults to {@code publicdata:samples.shakespeare} and can
  * be overridden with {@code --input}.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/DatastoreWordCount.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/DatastoreWordCount.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/DatastoreWordCount.java
index 434e9fb..c0066e6 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/DatastoreWordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/DatastoreWordCount.java
@@ -58,14 +58,15 @@ import org.apache.beam.sdk.transforms.ParDo;
  *
  * <p>To run this pipeline locally, the following options must be provided:
  * <pre>{@code
- *   --project=YOUR_PROJECT_ID
  *   --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PATH]
  * }</pre>
  *
- * <p>To run this example using Dataflow service, you must additionally
- * provide either {@literal --tempLocation} or {@literal --tempLocation}, and
- * select one of the Dataflow pipeline runners, eg
- * {@literal --runner=BlockingDataflowRunner}.
+ * <p>To change the runner, specify:
+ * <pre>{@code
+ *   --runner=YOUR_SELECTED_RUNNER
+ * }
+ * </pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>Note: this example creates entities with Ancestor keys to ensure that all
  * entities created are in the same entity group. Similarly, the query used to read from the Cloud
@@ -239,13 +240,9 @@ public class DatastoreWordCount {
   }
 
   /**
-   * An example to demo how to use {@link DatastoreIO}.  The runner here is
-   * customizable, which means users could pass either {@code DirectRunner}
-   * or {@code DataflowRunner} in the pipeline options.
+   * An example to demo how to use {@link DatastoreIO}.
    */
   public static void main(String args[]) {
-    // The options are used in two places, for Dataflow service, and
-    // building DatastoreIO.Read object
     Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
 
     if (!options.isReadOnly()) {

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/DeDupExample.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/DeDupExample.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/DeDupExample.java
index 5791710..594d52d 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/DeDupExample.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/DeDupExample.java
@@ -35,17 +35,15 @@ import org.apache.beam.sdk.util.gcsfs.GcsPath;
  * Demonstrates {@link org.apache.beam.sdk.io.TextIO.Read}/
  * {@link RemoveDuplicates}/{@link org.apache.beam.sdk.io.TextIO.Write}.
  *
- * <p>To execute this pipeline locally, specify general pipeline configuration:
- *   --project=YOUR_PROJECT_ID
- * and a local output file or output prefix on GCS:
+ * <p>To execute this pipeline locally, specify a local output file or output prefix on GCS:
  *   --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
  *
- * <p>To execute this pipeline using the Dataflow service, specify pipeline configuration:
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
- * and an output prefix on GCS:
- *   --output=gs://YOUR_OUTPUT_PREFIX
+ * <p>To change the runner, specify:
+ * <pre>{@code
+ *   --runner=YOUR_SELECTED_RUNNER
+ * }
+ * </pre>
+ * See examples/java/README.md for instructions about how to configure different runners.
  *
  * <p>The input defaults to {@code gs://apache-beam-samples/shakespeare/*} and can be
  * overridden with {@code --input}.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/FilterExamples.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/FilterExamples.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/FilterExamples.java
index 6c42520..01d668b 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/FilterExamples.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/FilterExamples.java
@@ -54,12 +54,7 @@ import org.apache.beam.sdk.values.PCollectionView;
Note: Before running this example, you must create a BigQuery dataset to contain your output * table. * - *

To execute this pipeline locally, specify general pipeline configuration: - *

{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * 
- * and the BigQuery table for the output: + *

To execute this pipeline locally, specify the BigQuery table for the output: *

{@code
  *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
  *   [--monthFilter=]
@@ -67,20 +62,12 @@ import org.apache.beam.sdk.values.PCollectionView;
  * 
* where optional parameter {@code --monthFilter} is set to a number 1-12. * - *

To execute this pipeline using the Dataflow service, specify pipeline configuration: + *

To change the runner, specify: *

{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
  * 
- * and the BigQuery table for the output: - *
{@code
- *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
- *   [--monthFilter=]
- * }
- * 
- * where optional parameter {@code --monthFilter} is set to a number 1-12. + * See examples/java/README.md for instructions about how to configure different runners. * *
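The optional {@code --monthFilter} parameter described above must be a number 1-12. The real example reads it through Beam's PipelineOptions; as a plain-Java sketch of that validation (the {@code parseMonthFilter} helper is hypothetical, for illustration only):

```java
import java.util.OptionalInt;

public class MonthFilterSketch {
    // Hypothetical helper: find an optional --monthFilter=N argument and
    // validate that N is a month number in [1, 12].
    static OptionalInt parseMonthFilter(String[] args) {
        for (String arg : args) {
            if (arg.startsWith("--monthFilter=")) {
                int month = Integer.parseInt(arg.substring("--monthFilter=".length()));
                if (month < 1 || month > 12) {
                    throw new IllegalArgumentException("--monthFilter must be 1-12, got " + month);
                }
                return OptionalInt.of(month);
            }
        }
        return OptionalInt.empty(); // the flag is optional
    }
}
```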

The BigQuery input table defaults to {@code clouddataflow-readonly:samples.weather_stations} * and can be overridden with {@code --input}. http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/JoinExamples.java ---------------------------------------------------------------------- diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/JoinExamples.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/JoinExamples.java index 1b91bf1..799cad3 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/JoinExamples.java +++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/JoinExamples.java @@ -41,27 +41,17 @@ import org.apache.beam.sdk.values.TupleTag; * *

Concepts: Join operation; multiple input sources. * - *
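JoinExamples demonstrates a join over multiple input sources keyed by country code (the real example uses CoGroupByKey over two BigQuery tables). A minimal in-memory sketch of the same key-based join; the map types and output format here are illustrative, not the Beam API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {
    // Join event rows (countryCode -> event descriptions) with a small
    // lookup table (countryCode -> country name), one output line per event.
    static List<String> join(Map<String, List<String>> events, Map<String, String> countryNames) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : events.entrySet()) {
            String name = countryNames.getOrDefault(e.getKey(), "none");
            for (String info : e.getValue()) {
                out.add("Country code: " + e.getKey()
                        + ", Country name: " + name
                        + ", Event info: " + info);
            }
        }
        return out;
    }
}
```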

To execute this pipeline locally, specify general pipeline configuration: - *

{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * 
- * and a local output file or output prefix on GCS: + *

To execute this pipeline locally, specify a local output file or output prefix on GCS: *

{@code
  *   --output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
  * }
* - *

To execute this pipeline using the Dataflow service, specify pipeline configuration: + *

To change the runner, specify: *

{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
  * 
- * and an output prefix on GCS: - *
{@code
- *   --output=gs://YOUR_OUTPUT_PREFIX
- * }
+ * See examples/java/README.md for instructions about how to configure different runners. */ public class JoinExamples { http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/MaxPerKeyExamples.java ---------------------------------------------------------------------- diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/MaxPerKeyExamples.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/MaxPerKeyExamples.java index 3772a7b..3a4fa26 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/MaxPerKeyExamples.java +++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/MaxPerKeyExamples.java @@ -46,27 +46,16 @@ import org.apache.beam.sdk.values.PCollection; *

Note: Before running this example, you must create a BigQuery dataset to contain your output * table. * - *

To execute this pipeline locally, specify general pipeline configuration: - *

{@code
- *   --project=YOUR_PROJECT_ID
- * }
- * 
- * and the BigQuery table for the output, with the form + *

To execute this pipeline locally, specify the BigQuery table for the output with the form: *

{@code
  *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
  * }
* - *

To execute this pipeline using the Dataflow service, specify pipeline configuration: - *

{@code
- *   --project=YOUR_PROJECT_ID
- *   --tempLocation=gs://YOUR_TEMP_DIRECTORY
- *   --runner=BlockingDataflowRunner
- * }
- * 
- * and the BigQuery table for the output: + *

To change the runner, specify: *

{@code
- *   --output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
+ *   --runner=YOUR_SELECTED_RUNNER
  * }
+ * See examples/java/README.md for instructions about how to configure different runners. * *
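MaxPerKeyExamples computes, per key (the month), the maximum of the values (mean temperatures) via Beam's Max-per-key transform. The same aggregation in plain Java, as an illustrative sketch rather than the Beam transform:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxPerKeySketch {
    // For each key (e.g. month number), keep the maximum value seen for it
    // (e.g. mean temperature), analogous to a max-per-key combine.
    static Map<Integer, Double> maxPerKey(List<? extends Map.Entry<Integer, Double>> readings) {
        Map<Integer, Double> maxes = new HashMap<>();
        for (Map.Entry<Integer, Double> kv : readings) {
            maxes.merge(kv.getKey(), kv.getValue(), Math::max);
        }
        return maxes;
    }
}
```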

The BigQuery input table defaults to {@code clouddataflow-readonly:samples.weather_stations } * and can be overridden with {@code --input}. http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c92e45dd/examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java ---------------------------------------------------------------------- diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java b/examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java index 2630541..68d4d32 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java +++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java @@ -73,15 +73,13 @@ import org.joda.time.Instant; * 4. Combining late data and speculative estimates * * - *

Before running this example, it will be useful to familiarize yourself with Dataflow triggers + *

Before running this example, it will be useful to familiarize yourself with Beam triggers * and understand the concept of 'late data'. - * See: - * https://cloud.google.com/dataflow/model/triggers and - * - * https://cloud.google.com/dataflow/model/windowing#Advanced + * See: + * http://beam.incubator.apache.org/use/walkthroughs/ * *

The example is configured to use the default BigQuery table from the example common package - * (there are no defaults for a general Dataflow pipeline). + * (there are no defaults for a general Beam pipeline). * You can override them by using the {@code --bigQueryDataset} and {@code --bigQueryTable} * options. If the BigQuery tables do not exist, the example will try to create them. * @@ -155,7 +153,7 @@ public class TriggerExample { * 5 | 60 | 10:27:20 | 10:27:25 * 5 | 60 | 10:29:00 | 11:11:00 * - *

Dataflow tracks a watermark which records up to what point in event time the data is + *

Beam tracks a watermark which records up to what point in event time the data is * complete. For the purposes of the example, we'll assume the watermark is approximately 15m * behind the current processing time. In practice, the actual value would vary over time based * on the system's knowledge of the current delay and contents of the backlog (data * @@ -176,7 +174,7 @@ public class TriggerExample { public PCollectionList<TableRow> apply(PCollection<KV<String, Integer>> flowInfo) { // Concept #1: The default triggering behavior - // By default Dataflow uses a trigger which fires when the watermark has passed the end of the + // By default Beam uses a trigger which fires when the watermark has passed the end of the // window. This would be written {@code Repeatedly.forever(AfterWatermark.pastEndOfWindow())}. // The system also defaults to dropping late data -- data which arrives after the watermark @@ -459,7 +457,7 @@ public class TriggerExample { PipelineResult result = pipeline.run(); - // dataflowUtils will try to cancel the pipeline and the injector before the program exits. + // ExampleUtils will try to cancel the pipeline and the injector before the program exits. exampleUtils.waitToFinish(result); }
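The watermark behavior described in the TriggerExample javadoc above (completeness tracked in event time, with the watermark assumed roughly 15 minutes behind processing time, and late data dropped by the default trigger) can be sketched in plain Java. The classifier below is illustrative only, not Beam's trigger machinery:

```java
import java.time.Duration;
import java.time.Instant;

public class WatermarkSketch {
    // The example assumes the watermark lags processing time by about 15 minutes.
    static final Duration WATERMARK_LAG = Duration.ofMinutes(15);

    // An element is "late" if its event timestamp falls behind the watermark;
    // under the default trigger configuration it would be dropped.
    static boolean isLate(Instant eventTime, Instant processingTime) {
        Instant watermark = processingTime.minus(WATERMARK_LAG);
        return eventTime.isBefore(watermark);
    }
}
```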