beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From k...@apache.org
Subject [1/2] beam git commit: [BEAM-1771] Clean up dataflow/google references/URLs in examples
Date Thu, 23 Mar 2017 19:25:34 GMT
Repository: beam
Updated Branches:
  refs/heads/master ddc75958c -> 9ca6511a7


[BEAM-1771] Clean up dataflow/google references/URLs in examples


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/81bcbb4d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/81bcbb4d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/81bcbb4d

Branch: refs/heads/master
Commit: 81bcbb4d19843a9f404ed9c8bffb64f66e41fb6d
Parents: 4ffd43e
Author: melissa <melissapa@google.com>
Authored: Mon Mar 20 19:35:22 2017 -0700
Committer: melissa <melissapa@google.com>
Committed: Mon Mar 20 19:49:45 2017 -0700

----------------------------------------------------------------------
 examples/java/README.md                         | 61 +++++++++++---------
 .../beam/examples/DebuggingWordCount.java       |  2 +-
 .../org/apache/beam/examples/cookbook/README.md |  2 +-
 .../beam/examples/complete/game/README.md       |  6 +-
 .../complete/game/injector/Injector.java        |  3 +-
 .../examples/complete/game/GameStatsTest.java   |  2 +-
 .../complete/game/HourlyTeamScoreTest.java      |  2 +-
 7 files changed, 42 insertions(+), 36 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java/README.md
----------------------------------------------------------------------
diff --git a/examples/java/README.md b/examples/java/README.md
index b58153e..d891fb8 100644
--- a/examples/java/README.md
+++ b/examples/java/README.md
@@ -20,25 +20,23 @@
 # Example Pipelines
 
 The examples included in this module serve to demonstrate the basic
-functionality of Google Cloud Dataflow, and act as starting points for
+functionality of Apache Beam, and act as starting points for
 the development of more complex pipelines.
 
 ## Word Count
 
 A good starting point for new users is our set of
-[word count](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples)
examples, which computes word frequencies.  This series of four successively more detailed
pipelines is described in detail in the accompanying [walkthrough](https://cloud.google.com/dataflow/examples/wordcount-example).
+[word count](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples)
examples, which computes word frequencies.  This series of four successively more detailed
pipelines is described in detail in the accompanying [walkthrough](https://beam.apache.org/get-started/wordcount-example/).
 
-1. [`MinimalWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java)
is the simplest word count pipeline and introduces basic concepts like [Pipelines](https://cloud.google.com/dataflow/model/pipelines),
-[PCollections](https://cloud.google.com/dataflow/model/pcollection),
-[ParDo](https://cloud.google.com/dataflow/model/par-do),
-and [reading and writing data](https://cloud.google.com/dataflow/model/reading-and-writing-data)
from external storage.
+1. [`MinimalWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java)
is the simplest word count pipeline and introduces basic concepts like [Pipelines](https://beam.apache.org/documentation/programming-guide/#pipeline),
+[PCollections](https://beam.apache.org/documentation/programming-guide/#pcollection),
+[ParDo](https://beam.apache.org/documentation/programming-guide/#transforms-pardo),
+and [reading and writing data](https://beam.apache.org/documentation/programming-guide/#io)
from external storage.
 
-1. [`WordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java)
introduces Dataflow best practices like [PipelineOptions](https://cloud.google.com/dataflow/pipelines/constructing-your-pipeline#Creating)
and custom [PTransforms](https://cloud.google.com/dataflow/model/composite-transforms).
+1. [`WordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java)
introduces best practices like [PipelineOptions](https://beam.apache.org/documentation/programming-guide/#pipeline)
and custom [PTransforms](https://beam.apache.org/documentation/programming-guide/#transforms-composite).
 
 1. [`DebuggingWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java)
-shows how to view live aggregators in the [Dataflow Monitoring Interface](https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf),
get the most out of
-[Cloud Logging](https://cloud.google.com/dataflow/pipelines/logging) integration, and start
writing
-[good tests](https://cloud.google.com/dataflow/pipelines/testing-your-pipeline).
+demonstrates some best practices for instrumenting your pipeline code.
 
 1. [`WindowedWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java)
shows how to run the same pipeline over either unbounded PCollections in streaming mode or
bounded PCollections in batch mode.
 
@@ -50,46 +48,55 @@ Change directory into `examples/java` and run the examples:
     -Dexec.mainClass=<MAIN CLASS> \
     -Dexec.args="<EXAMPLE-SPECIFIC ARGUMENTS>"
 
-For example, you can execute the `WordCount` pipeline on your local machine as follows:
+Alternatively, you may choose to bundle all dependencies into a single JAR and
+execute it outside of the Maven environment.
+
+### Direct Runner
+
+You can execute the `WordCount` pipeline on your local machine as follows:
 
     mvn compile exec:java \
     -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--inputFile=<LOCAL INPUT FILE> --output=<LOCAL OUTPUT FILE>"
 
-Once you have followed the general Cloud Dataflow
-[Getting Started](https://cloud.google.com/dataflow/getting-started) instructions, you can
execute
-the same pipeline on fully managed resources in Google Cloud Platform:
+To create the bundled JAR of the examples and execute it locally:
+
+    mvn package
+
+    java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
+    org.apache.beam.examples.WordCount \
+    --inputFile=<INPUT FILE PATTERN> --output=<OUTPUT FILE>
+
+### Google Cloud Dataflow Runner
+
+After you have followed the general Cloud Dataflow
+[prerequisites and setup](https://beam.apache.org/documentation/runners/dataflow/), you can
execute
+the pipeline on fully managed resources in Google Cloud Platform:
 
     mvn compile exec:java \
     -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--project=<YOUR CLOUD PLATFORM PROJECT ID> \
     --tempLocation=<YOUR CLOUD STORAGE LOCATION> \
-    --runner=BlockingDataflowRunner"
+    --runner=DataflowRunner"
 
 Make sure to use your project id, not the project number or the descriptive name.
-The Cloud Storage location should be entered in the form of
+The Google Cloud Storage location should be entered in the form of
 `gs://bucket/path/to/staging/directory`.
 
-Alternatively, you may choose to bundle all dependencies into a single JAR and
-execute it outside of the Maven environment. For example, you can execute the
-following commands to create the
-bundled JAR of the examples and execute it both locally and in Cloud
-Platform:
+To create the bundled JAR of the examples and execute it in Google Cloud Platform:
 
     mvn package
 
     java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
     org.apache.beam.examples.WordCount \
-    --inputFile=<INPUT FILE PATTERN> --output=<OUTPUT FILE>
-
-    java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
-    org.apache.beam.examples.WordCount \
     --project=<YOUR CLOUD PLATFORM PROJECT ID> \
     --tempLocation=<YOUR CLOUD STORAGE LOCATION> \
-    --runner=BlockingDataflowRunner
+    --runner=DataflowRunner
+
+## Other Examples
 
 Other examples can be run similarly by replacing the `WordCount` class path with the example
classpath, e.g.
-`org.apache.beam.examples.cookbook.BigQueryTornadoes`,
+`org.apache.beam.examples.cookbook.CombinePerKeyExamples`,
 and adjusting runtime options under the `Dexec.args` parameter, as specified in
 the example itself.
 

http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
b/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
index f7493f3..031f317 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
@@ -151,7 +151,7 @@ public class DebuggingWordCount {
      * <p>Below we verify that the set of filtered words matches our expected counts.
Note
      * that PAssert does not provide any output and that successful completion of the
      * Pipeline implies that the expectations were met. Learn more at
-     * https://cloud.google.com/dataflow/pipelines/testing-your-pipeline on how to test
+     * https://beam.apache.org/documentation/pipelines/test-your-pipeline/ on how to test
      * your Pipeline and see {@link DebuggingWordCountTest} for an example unit test.
      */
     List<KV<String, Long>> expectedResults = Arrays.asList(

http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java/src/main/java/org/apache/beam/examples/cookbook/README.md
----------------------------------------------------------------------
diff --git a/examples/java/src/main/java/org/apache/beam/examples/cookbook/README.md b/examples/java/src/main/java/org/apache/beam/examples/cookbook/README.md
index dab4e73..b167cd7 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/cookbook/README.md
+++ b/examples/java/src/main/java/org/apache/beam/examples/cookbook/README.md
@@ -21,7 +21,7 @@
 
 This directory holds simple "cookbook" examples, which show how to define
 commonly-used data analysis patterns that you would likely incorporate into a
-larger Dataflow pipeline. They include:
+larger Apache Beam pipeline. They include:
 
  <ul>
   <li><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java">BigQueryTornadoes</a>

http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
----------------------------------------------------------------------
diff --git a/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
b/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
index 25e31f5..fdce05c 100644
--- a/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
+++ b/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
@@ -20,10 +20,10 @@
 # 'Gaming' examples
 
 
-This directory holds a series of example Dataflow pipelines in a simple 'mobile
+This directory holds a series of example Apache Beam pipelines in a simple 'mobile
 gaming' domain. They all require Java 8.  Each pipeline successively introduces
 new concepts, and gives some examples of using Java 8 syntax in constructing
-Dataflow pipelines. Other than usage of Java 8 lambda expressions, the concepts
+Beam pipelines. Other than usage of Java 8 lambda expressions, the concepts
 that are used apply equally well in Java 7.
 
 In the gaming scenario, many users play, as members of different teams, over
@@ -58,7 +58,7 @@ the day's cutoff point.
 
 The next pipeline in the series is `HourlyTeamScore`. This pipeline also
 processes data collected from gaming events in batch. It builds on `UserScore`,
-but uses [fixed windows](https://cloud.google.com/dataflow/model/windowing), by
+but uses [fixed windows](https://beam.apache.org/documentation/programming-guide/#windowing),
by
 default an hour in duration. It calculates the sum of scores per team, for each
 window, optionally allowing specification of two timestamps before and after
 which data is filtered out. This allows a model where late data collected after

http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java8/src/main/java/org/apache/beam/examples/complete/game/injector/Injector.java
----------------------------------------------------------------------
diff --git a/examples/java8/src/main/java/org/apache/beam/examples/complete/game/injector/Injector.java
b/examples/java8/src/main/java/org/apache/beam/examples/complete/game/injector/Injector.java
index 8c23cd7..b9a3ff2 100644
--- a/examples/java8/src/main/java/org/apache/beam/examples/complete/game/injector/Injector.java
+++ b/examples/java8/src/main/java/org/apache/beam/examples/complete/game/injector/Injector.java
@@ -260,8 +260,7 @@ class Injector {
       user = team.getRandomUser();
     }
     String event = user + "," + teamName + "," + random.nextInt(MAX_SCORE);
-    // Randomly introduce occasional parse errors. You can see a custom counter tracking
the number
-    // of such errors in the Dataflow Monitoring UI, as the example pipeline runs.
+    // Randomly introduce occasional parse errors.
     if (random.nextInt(parseErrorRate) == 0) {
       System.out.println("Introducing a parse error.");
       event = "THIS LINE REPRESENTS CORRUPT DATA AND WILL CAUSE A PARSE ERROR";

http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java8/src/test/java/org/apache/beam/examples/complete/game/GameStatsTest.java
----------------------------------------------------------------------
diff --git a/examples/java8/src/test/java/org/apache/beam/examples/complete/game/GameStatsTest.java
b/examples/java8/src/test/java/org/apache/beam/examples/complete/game/GameStatsTest.java
index da2bb91..36cf9bc 100644
--- a/examples/java8/src/test/java/org/apache/beam/examples/complete/game/GameStatsTest.java
+++ b/examples/java8/src/test/java/org/apache/beam/examples/complete/game/GameStatsTest.java
@@ -38,7 +38,7 @@ import org.junit.runners.JUnit4;
  * Tests of GameStats.
  * Because the pipeline was designed for easy readability and explanations, it lacks good
  * modularity for testing. See our testing documentation for better ideas:
- * https://cloud.google.com/dataflow/pipelines/testing-your-pipeline.
+ * https://beam.apache.org/documentation/pipelines/test-your-pipeline/
  */
 @RunWith(JUnit4.class)
 public class GameStatsTest implements Serializable {

http://git-wip-us.apache.org/repos/asf/beam/blob/81bcbb4d/examples/java8/src/test/java/org/apache/beam/examples/complete/game/HourlyTeamScoreTest.java
----------------------------------------------------------------------
diff --git a/examples/java8/src/test/java/org/apache/beam/examples/complete/game/HourlyTeamScoreTest.java
b/examples/java8/src/test/java/org/apache/beam/examples/complete/game/HourlyTeamScoreTest.java
index 34a0744..5fc94a5 100644
--- a/examples/java8/src/test/java/org/apache/beam/examples/complete/game/HourlyTeamScoreTest.java
+++ b/examples/java8/src/test/java/org/apache/beam/examples/complete/game/HourlyTeamScoreTest.java
@@ -45,7 +45,7 @@ import org.junit.runners.JUnit4;
  * Tests of HourlyTeamScore.
  * Because the pipeline was designed for easy readability and explanations, it lacks good
  * modularity for testing. See our testing documentation for better ideas:
- * https://cloud.google.com/dataflow/pipelines/testing-your-pipeline.
+ * https://beam.apache.org/documentation/pipelines/test-your-pipeline/
  */
 @RunWith(JUnit4.class)
 public class HourlyTeamScoreTest implements Serializable {


Mime
View raw message