beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From al...@apache.org
Subject [1/2] beam git commit: Remove sdk & example README.md that are covered in web site
Date Tue, 09 May 2017 16:59:57 GMT
Repository: beam
Updated Branches:
  refs/heads/master 9575694ca -> 5e4fd1b95


Remove sdk & example README.md that are covered in web site


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/672a2c40
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/672a2c40
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/672a2c40

Branch: refs/heads/master
Commit: 672a2c400badd8cd33edb9585d2325b9f265c963
Parents: 9575694
Author: Ahmet Altay <altay@google.com>
Authored: Mon May 8 15:26:50 2017 -0700
Committer: Ahmet Altay <altay@google.com>
Committed: Tue May 9 09:59:15 2017 -0700

----------------------------------------------------------------------
 .gitignore                                      |   1 +
 examples/java/README.md                         |  64 +---
 .../beam/examples/complete/game/README.md       | 131 --------
 pom.xml                                         |   2 +
 sdks/python/README.md                           | 298 -------------------
 .../examples/complete/game/README.md            |  69 -----
 6 files changed, 5 insertions(+), 560 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam/blob/672a2c40/.gitignore
----------------------------------------------------------------------
diff --git a/.gitignore b/.gitignore
index 9cfae09..1ecb993 100644
--- a/.gitignore
+++ b/.gitignore
@@ -24,6 +24,7 @@ sdks/python/**/*.so
 sdks/python/**/*.egg
 sdks/python/LICENSE
 sdks/python/NOTICE
+sdks/python/README.md
 
 # Ignore IntelliJ files.
 .idea/

http://git-wip-us.apache.org/repos/asf/beam/blob/672a2c40/examples/java/README.md
----------------------------------------------------------------------
diff --git a/examples/java/README.md b/examples/java/README.md
index d891fb8..75b70dd 100644
--- a/examples/java/README.md
+++ b/examples/java/README.md
@@ -40,69 +40,9 @@ demonstrates some best practices for instrumenting your pipeline code.
 
 1. [`WindowedWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java)
shows how to run the same pipeline over either unbounded PCollections in streaming mode or
bounded PCollections in batch mode.
 
-## Building and Running
+## Running Examples
 
-Change directory into `examples/java` and run the examples:
-
-    mvn compile exec:java \
-    -Dexec.mainClass=<MAIN CLASS> \
-    -Dexec.args="<EXAMPLE-SPECIFIC ARGUMENTS>"
-
-Alternatively, you may choose to bundle all dependencies into a single JAR and
-execute it outside of the Maven environment.
-
-### Direct Runner
-
-You can execute the `WordCount` pipeline on your local machine as follows:
-
-    mvn compile exec:java \
-    -Dexec.mainClass=org.apache.beam.examples.WordCount \
-    -Dexec.args="--inputFile=<LOCAL INPUT FILE> --output=<LOCAL OUTPUT FILE>"
-
-To create the bundled JAR of the examples and execute it locally:
-
-    mvn package
-
-    java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
-    org.apache.beam.examples.WordCount \
-    --inputFile=<INPUT FILE PATTERN> --output=<OUTPUT FILE>
-
-### Google Cloud Dataflow Runner
-
-After you have followed the general Cloud Dataflow
-[prerequisites and setup](https://beam.apache.org/documentation/runners/dataflow/), you can
execute
-the pipeline on fully managed resources in Google Cloud Platform:
-
-    mvn compile exec:java \
-    -Dexec.mainClass=org.apache.beam.examples.WordCount \
-    -Dexec.args="--project=<YOUR CLOUD PLATFORM PROJECT ID> \
-    --tempLocation=<YOUR CLOUD STORAGE LOCATION> \
-    --runner=DataflowRunner"
-
-Make sure to use your project id, not the project number or the descriptive name.
-The Google Cloud Storage location should be entered in the form of
-`gs://bucket/path/to/staging/directory`.
-
-To create the bundled JAR of the examples and execute it in Google Cloud Platform:
-
-    mvn package
-
-    java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
-    org.apache.beam.examples.WordCount \
-    --project=<YOUR CLOUD PLATFORM PROJECT ID> \
-    --tempLocation=<YOUR CLOUD STORAGE LOCATION> \
-    --runner=DataflowRunner
-
-## Other Examples
-
-Other examples can be run similarly by replacing the `WordCount` class path with the example
classpath, e.g.
-`org.apache.beam.examples.cookbook.CombinePerKeyExamples`,
-and adjusting runtime options under the `Dexec.args` parameter, as specified in
-the example itself.
-
-Note that when running Maven on Microsoft Windows platform, backslashes (`\`)
-under the `Dexec.args` parameter should be escaped with another backslash. For
-example, input file pattern of `c:\*.txt` should be entered as `c:\\*.txt`.
+See [Apache Beam WordCount Example](https://beam.apache.org/get-started/wordcount-example/)
for information on running these examples.
 
 ## Beyond Word Count
 

http://git-wip-us.apache.org/repos/asf/beam/blob/672a2c40/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
----------------------------------------------------------------------
diff --git a/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
b/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
deleted file mode 100644
index fdce05c..0000000
--- a/examples/java8/src/main/java/org/apache/beam/examples/complete/game/README.md
+++ /dev/null
@@ -1,131 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-# 'Gaming' examples
-
-
-This directory holds a series of example Apache Beam pipelines in a simple 'mobile
-gaming' domain. They all require Java 8.  Each pipeline successively introduces
-new concepts, and gives some examples of using Java 8 syntax in constructing
-Beam pipelines. Other than usage of Java 8 lambda expressions, the concepts
-that are used apply equally well in Java 7.
-
-In the gaming scenario, many users play, as members of different teams, over
-the course of a day, and their actions are logged for processing. Some of the
-logged game events may be late-arriving, if users play on mobile devices and go
-transiently offline for a period.
-
-The scenario includes not only "regular" users, but "robot users", which have a
-higher click rate than the regular users, and may move from team to team.
-
-The first two pipelines in the series use pre-generated batch data samples. The
-second two pipelines read from a [PubSub](https://cloud.google.com/pubsub/)
-topic input.  For these examples, you will also need to run the
-`injector.Injector` program, which generates and publishes the gaming data to
-PubSub. The javadocs for each pipeline have more detailed information on how to
-run that pipeline.
-
-All of these pipelines write their results to BigQuery table(s).
-
-
-## The pipelines in the 'gaming' series
-
-### UserScore
-
-The first pipeline in the series is `UserScore`. This pipeline does batch
-processing of data collected from gaming events. It calculates the sum of
-scores per user, over an entire batch of gaming data (collected, say, for each
-day). The batch processing will not include any late data that arrives after
-the day's cutoff point.
-
-### HourlyTeamScore
-
-The next pipeline in the series is `HourlyTeamScore`. This pipeline also
-processes data collected from gaming events in batch. It builds on `UserScore`,
-but uses [fixed windows](https://beam.apache.org/documentation/programming-guide/#windowing),
by
-default an hour in duration. It calculates the sum of scores per team, for each
-window, optionally allowing specification of two timestamps before and after
-which data is filtered out. This allows a model where late data collected after
-the intended analysis window can be included in the analysis, and any late-
-arriving data prior to the beginning of the analysis window can be removed as
-well.
-
-By using windowing and adding element timestamps, we can do finer-grained
-analysis than with the `UserScore` pipeline — we're now tracking scores for
-each hour rather than over the course of a whole day. However, our batch
-processing is high-latency, in that we don't get results from plays at the
-beginning of the batch's time period until the complete batch is processed.
-
-### LeaderBoard
-
-The third pipeline in the series is `LeaderBoard`. This pipeline processes an
-unbounded stream of 'game events' from a PubSub topic. The calculation of the
-team scores uses fixed windowing based on event time (the time of the game play
-event), not processing time (the time that an event is processed by the
-pipeline). The pipeline calculates the sum of scores per team, for each window.
-By default, the team scores are calculated using one-hour windows.
-
-In contrast — to demo another windowing option — the user scores are calculated
-using a global window, which periodically (every ten minutes) emits cumulative
-user score sums.
-
-In contrast to the previous pipelines in the series, which used static, finite
-input data, here we're using an unbounded data source, which lets us provide
-_speculative_ results, and allows handling of late data, at much lower latency.
-E.g., we could use the early/speculative results to keep a 'leaderboard'
-updated in near-realtime. Our handling of late data lets us generate correct
-results, e.g. for 'team prizes'. We're now outputing window results as they're
-calculated, giving us much lower latency than with the previous batch examples.
-
-### GameStats
-
-The fourth pipeline in the series is `GameStats`. This pipeline builds
-on the `LeaderBoard` functionality — supporting output of speculative and late
-data — and adds some "business intelligence" analysis: identifying abuse
-detection. The pipeline derives the Mean user score sum for a window, and uses
-that information to identify likely spammers/robots. (The injector is designed
-so that the "robots" have a higher click rate than the "real" users). The robot
-users are then filtered out when calculating the team scores.
-
-Additionally, user sessions are tracked: that is, we find bursts of user
-activity using session windows. Then, the mean session duration information is
-recorded in the context of subsequent fixed windowing. (This could be used to
-tell us what games are giving us greater user retention).
-
-### Running the PubSub Injector
-
-The `LeaderBoard` and `GameStats` example pipelines read unbounded data
-from a PubSub topic.
-
-Use the `injector.Injector` program to generate this data and publish to a
-PubSub topic. See the `Injector`javadocs for more information on how to run the
-injector. Set up the injector before you start one of these pipelines. Then,
-when you start the pipeline, pass as an argument the name of that PubSub topic.
-See the pipeline javadocs for the details.
-
-## Viewing the results in BigQuery
-
-All of the pipelines write their results to BigQuery.  `UserScore` and
-`HourlyTeamScore` each write one table, and `LeaderBoard` and
-`GameStats` each write two. The pipelines have default table names that
-you can override when you start up the pipeline if those tables already exist.
-
-Depending on the windowing intervals defined in a given pipeline, you may have
-to wait for a while (more than an hour) before you start to see results written
-to the BigQuery tables.

http://git-wip-us.apache.org/repos/asf/beam/blob/672a2c40/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index e3fa7d8..9ca876e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1543,6 +1543,7 @@
                   <include>**/*.egg-info/</include>
                   <include>**/sdks/python/LICENSE</include>
                   <include>**/sdks/python/NOTICE</include>
+                  <include>**/sdks/python/README.md</include>
                 </includes>
                 <followSymlinks>false</followSymlinks>
               </fileset>
@@ -1680,6 +1681,7 @@
                   <includes>
                     <include>LICENSE</include>
                     <include>NOTICE</include>
+                    <include>README.md</include>
                   </includes>
                 </resource>
               </resources>              

http://git-wip-us.apache.org/repos/asf/beam/blob/672a2c40/sdks/python/README.md
----------------------------------------------------------------------
diff --git a/sdks/python/README.md b/sdks/python/README.md
deleted file mode 100644
index 7836a04..0000000
--- a/sdks/python/README.md
+++ /dev/null
@@ -1,298 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-# Apache Beam - Python SDK
-
-[Apache Beam](http://beam.apache.org) is a unified model for defining both batch and streaming
data-parallel processing pipelines. Beam provides a set of language-specific SDKs for constructing
pipelines. These pipelines can be executed on distributed processing backends like [Apache
Spark](http://spark.apache.org/), [Apache Flink](http://flink.apache.org), and [Google Cloud
Dataflow](http://cloud.google.com/dataflow).
-
-Apache Beam for Python provides access to Beam capabilities from the Python programming language.
-
-## Table of Contents
-  * [Overview of the Beam Programming Model](#overview-of-the-programming-model)
-  * [Getting Started](#getting-started)
-  * [A Quick Tour of the Source Code](#a-quick-tour-of-the-source-code)
-  * [Simple Examples](#simple-examples)
-      * [Basic pipeline](#basic-pipeline)
-      * [Basic pipeline (with Map)](#basic-pipeline-with-map)
-      * [Basic pipeline (with FlatMap)](#basic-pipeline-with-flatmap)
-      * [Basic pipeline (with FlatMap and yield)](#basic-pipeline-with-flatmap-and-yield)
-      * [Counting words](#counting-words)
-      * [Counting words with GroupByKey](#counting-words-with-groupbykey)
-      * [Type hints](#type-hints)
-      * [BigQuery](#bigquery)
-      * [Combiner examples](#combiner-examples)
-  * [Organizing Your Code](#organizing-your-code)
-  * [Contact Us](#contact-us)
-
-## Overview of the Programming Model
-
-The key concepts of the programming model are:
-
-* PCollection - represents a collection of data, which could be bounded or unbounded in size.
-* PTransform - represents a computation that transforms input PCollections into output
-PCollections.
-* Pipeline - manages a directed acyclic graph of PTransforms and PCollections that is ready
-for execution.
-* Runner - specifies where and how the Pipeline should execute.
-
-For a further, detailed introduction, please read the
-[Beam Programming Model](http://beam.apache.org/documentation/programming-guide).
-
-## Getting Started
-
-See [Apache Beam Python SDK Quickstart](https://beam.apache.org/get-started/quickstart-py/).
-
-## A Quick Tour of the Source Code
-
-With your virtual environment active, you can follow along this tour by running a `pydoc`
server on a local port of your choosing (this example uses port 8888):
-
-```
-pydoc -p 8888
-```
-
-Open your browser and go to
-http://localhost:8888/apache_beam.html
-
-Some interesting classes to navigate to:
-
-* `PCollection`, in file
-[`apache_beam/pvalue.py`](http://localhost:8888/apache_beam.pvalue.html)
-* `PTransform`, in file
-[`apache_beam/transforms/ptransform.py`](http://localhost:8888/apache_beamtransforms.ptransform.html)
-* `FlatMap`, `GroupByKey`, and `Map`, in file
-[`apache_beam/transforms/core.py`](http://localhost:8888/apache_beam.transforms.core.html)
-* combiners, in file
-[`apache_beam/transforms/combiners.py`](http://localhost:8888/apache_beam.transforms.combiners.html)
-
-Make sure you installed the package first. If not, run `python setup.py install`, then run
pydoc with `pydoc -p 8888`.
-
-## Simple Examples
-
-The following examples demonstrate some basic, fundamental concepts for using Apache Beam's
Python SDK. For more detailed examples, Beam provides a [directory of examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples)
for Python.
-
-### Basic pipeline
-
-A basic pipeline will take as input an iterable, apply the
-beam.Create `PTransform`, and produce a `PCollection` that can
-be written to a file or modified by further `PTransform`s.
-The `>>` operator is used to label `PTransform`s and
-the `|` operator is used to chain them.
-
-```python
-# Standard imports
-import apache_beam as beam
-# Create a pipeline executing on a direct runner (local, non-cloud).
-p = beam.Pipeline('DirectRunner')
-# Create a PCollection with names and write it to a file.
-(p
- | 'add names' >> beam.Create(['Ann', 'Joe'])
- | 'save' >> beam.io.WriteToText('./names'))
-# Execute the pipeline.
-p.run()
-```
-
-### Basic pipeline (with Map)
-
-The `Map` `PTransform` returns one output per input. It takes a callable that is applied
to each element of the input `PCollection` and returns an element to the output `PCollection`.
-
-```python
-import apache_beam as beam
-p = beam.Pipeline('DirectRunner')
-# Read a file containing names, add a greeting to each name, and write to a file.
-(p
- | 'load names' >> beam.io.ReadFromText('./names')
- | 'add greeting' >> beam.Map(lambda name, msg: '%s, %s!' % (msg, name), 'Hello')
- | 'save' >> beam.io.WriteToText('./greetings'))
-p.run()
-```
-
-### Basic pipeline (with FlatMap)
-
-A `FlatMap` is like a `Map` except its callable returns a (possibly
-empty) iterable of elements for the output `PCollection`.
-
-The `FlatMap` transform returns zero to many output per input. It accepts a callable that
is applied to each element of the input `PCollection` and returns an iterable with zero or
more elements to the output `PCollection`.
-
-```python
-import apache_beam as beam
-p = beam.Pipeline('DirectRunner')
-# Read a file containing names, add two greetings to each name, and write to a file.
-(p
- | 'load names' >> beam.io.ReadFromText('./names')
- | 'add greetings' >> beam.FlatMap(
-    lambda name, messages: ['%s %s!' % (msg, name) for msg in messages],
-    ['Hello', 'Hola'])
- | 'save' >> beam.io.WriteToText('./greetings'))
-p.run()
-```
-
-### Basic pipeline (with FlatMap and yield)
-
-The callable of a `FlatMap` can be a generator, that is,
-a function using `yield`.
-
-```python
-import apache_beam as beam
-p = beam.Pipeline('DirectRunner')
-# Read a file containing names, add two greetings to each name
-# (with FlatMap using a yield generator), and write to a file.
-def add_greetings(name, messages):
-  for msg in messages:
-    yield '%s %s!' % (msg, name)
-
-(p
- | 'load names' >> beam.io.ReadFromText('./names')
- | 'add greetings' >> beam.FlatMap(add_greetings, ['Hello', 'Hola'])
- | 'save' >> beam.io.WriteToText('./greetings'))
-p.run()
-```
-
-### Counting words
-
-This example shows how to read a text file from [Google Cloud Storage](https://cloud.google.com/storage/)
and count its words.
-
-```python
-import re
-import apache_beam as beam
-p = beam.Pipeline('DirectRunner')
-(p
- | 'read' >> beam.io.ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
- | 'split' >> beam.FlatMap(lambda x: re.findall(r'\w+', x))
- | 'count words' >> beam.combiners.Count.PerElement()
- | 'save' >> beam.io.WriteToText('./word_count'))
-p.run()
-```
-
-### Counting words with GroupByKey
-
-This is a somewhat forced example of `GroupByKey` to count words as the previous example
did, but without using `beam.combiners.Count.PerElement`. As shown in the example, you can
use a wildcard to specify the text file source.
-
-```python
-import re
-import apache_beam as beam
-p = beam.Pipeline('DirectRunner')
-class MyCountTransform(beam.PTransform):
-  def expand(self, pcoll):
-    return (pcoll
-            | 'one word' >> beam.Map(lambda word: (word, 1))
-            # GroupByKey accepts a PCollection of (word, 1) elements and
-            # outputs a PCollection of (word, [1, 1, ...])
-            | 'group words' >> beam.GroupByKey()
-            | 'count words' >> beam.Map(lambda (word, counts): (word, len(counts))))
-
-(p
- | 'read' >> beam.io.ReadFromText('./names*')
- | 'split' >> beam.FlatMap(lambda x: re.findall(r'\w+', x))
- | MyCountTransform()
- | 'write' >> beam.io.WriteToText('./word_count'))
-p.run()
-```
-
-### Type hints
-
-In some cases, providing type hints can improve the efficiency
-of the data encoding.
-
-```python
-import apache_beam as beam
-from apache_beam.typehints import typehints
-p = beam.Pipeline('DirectRunner')
-(p
- | 'read' >> beam.io.ReadFromText('./names')
- | 'add types' >> beam.Map(lambda x: (x, 1)).with_output_types(typehints.KV[str, int])
- | 'group words' >> beam.GroupByKey()
- | 'save' >> beam.io.WriteToText('./typed_names'))
-p.run()
-```
-
-### BigQuery
-
-This example reads weather data from a BigQuery table, calculates the number of tornadoes
per month, and writes the results to a table you specify.
-
-```python
-import apache_beam as beam
-project = 'DESTINATION-PROJECT-ID'
-input_table = 'clouddataflow-readonly:samples.weather_stations'
-output_table = 'DESTINATION-DATASET.DESTINATION-TABLE'
-
-p = beam.Pipeline(argv=['--project', project])
-(p
- | 'read' >> beam.Read(beam.io.BigQuerySource(input_table))
- | 'months with tornadoes' >> beam.FlatMap(
-    lambda row: [(int(row['month']), 1)] if row['tornado'] else [])
- | 'monthly count' >> beam.CombinePerKey(sum)
- | 'format' >> beam.Map(lambda (k, v): {'month': k, 'tornado_count': v})
- | 'save' >> beam.Write(
-    beam.io.BigQuerySink(
-        output_table,
-        schema='month:INTEGER, tornado_count:INTEGER',
-        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
-        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
-p.run()
-```
-
-This pipeline, like the one above, calculates the number of tornadoes per month, but it uses
a query to filter out the input instead of using the whole table.
-
-```python
-import apache_beam as beam
-project = 'DESTINATION-PROJECT-ID'
-output_table = 'DESTINATION-DATASET.DESTINATION-TABLE'
-input_query = 'SELECT month, COUNT(month) AS tornado_count ' \
-        'FROM [clouddataflow-readonly:samples.weather_stations] ' \
-        'WHERE tornado=true GROUP BY month'
-p = beam.Pipeline(argv=['--project', project])
-(p
- | 'read' >> beam.Read(beam.io.BigQuerySource(query=input_query))
- | 'save' >> beam.Write(beam.io.BigQuerySink(
-    output_table,
-    schema='month:INTEGER, tornado_count:INTEGER',
-    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
-    write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
-p.run()
-```
-
-### Combiner Examples
-
-Combiner transforms use "reducing" functions, such as sum, min, or max, to combine multiple
values of a `PCollection` into a single value.
-
-```python
-import apache_beam as beam
-p = beam.Pipeline('DirectRunner')
-
-SAMPLE_DATA = [('a', 1), ('b', 10), ('a', 2), ('a', 3), ('b', 20)]
-
-(p
- | beam.Create(SAMPLE_DATA)
- | beam.CombinePerKey(sum)
- | beam.io.WriteToText('./sums'))
-p.run()
-```
-
-The [combiners_test.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/combiners_test.py)
file contains more combiner examples.
-
-## Organizing Your Code
-
-Many projects will grow to multiple source code files. It is recommended that you organize
your project so that all code involved in running your pipeline can be built as a Python package.
This way, the package can easily be installed in the VM workers executing the job.
-
-Follow the [Juliaset example](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset).
If the code is organized in this fashion, you can use the `--setup_file` command line option
to create a source distribution out of the project files, stage the resulting tarball, and
later install it in the workers executing the job.
-
-## More Information
-
-Please report any issues on [JIRA](https://issues.apache.org/jira/browse/BEAM/component/12328910).
-
-If you’re interested in contributing to the Beam SDK, start by reading the [Contribute](http://beam.apache.org/contribute/)
guide.

http://git-wip-us.apache.org/repos/asf/beam/blob/672a2c40/sdks/python/apache_beam/examples/complete/game/README.md
----------------------------------------------------------------------
diff --git a/sdks/python/apache_beam/examples/complete/game/README.md b/sdks/python/apache_beam/examples/complete/game/README.md
deleted file mode 100644
index 39677e4..0000000
--- a/sdks/python/apache_beam/examples/complete/game/README.md
+++ /dev/null
@@ -1,69 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-# 'Gaming' examples
-
-This directory holds a series of example Dataflow pipelines in a simple 'mobile
-gaming' domain. Each pipeline successively introduces new concepts.
-
-In the gaming scenario, many users play, as members of different teams, over
-the course of a day, and their actions are logged for processing. Some of the
-logged game events may be late-arriving, if users play on mobile devices and go
-transiently offline for a period.
-
-The scenario includes not only "regular" users, but "robot users", which have a
-higher click rate than the regular users, and may move from team to team.
-
-The first two pipelines in the series use pre-generated batch data samples.
-
-All of these pipelines write their results to Google BigQuery table(s).
-
-## The pipelines in the 'gaming' series
-
-### user_score
-
-The first pipeline in the series is `user_score`. This pipeline does batch
-processing of data collected from gaming events. It calculates the sum of
-scores per user, over an entire batch of gaming data (collected, say, for each
-day). The batch processing will not include any late data that arrives after
-the day's cutoff point.
-
-### hourly_team_score
-
-The next pipeline in the series is `hourly_team_score`. This pipeline also
-processes data collected from gaming events in batch. It builds on `user_score`,
-but uses [fixed windows](https://beam.apache.org/documentation/programming-guide/#windowing),
-by default an hour in duration. It calculates the sum of scores per team, for
-each window, optionally allowing specification of two timestamps before and
-after which data is filtered out. This allows a model where late data collected
-after the intended analysis window can be included in the analysis, and any
-late-arriving data prior to the beginning of the analysis window can be removed
-as well.
-
-By using windowing and adding element timestamps, we can do finer-grained
-analysis than with the `UserScore` pipeline — we're now tracking scores for
-each hour rather than over the course of a whole day. However, our batch
-processing is high-latency, in that we don't get results from plays at the
-beginning of the batch's time period until the complete batch is processed.
-
-## Viewing the results in BigQuery
-
-All of the pipelines write their results to BigQuery. `user_score` and
-`hourly_team_score` each write one table. The pipelines have default table names
-that you can override when you start up the pipeline if those tables already
-exist.


Mime
View raw message