beam-commits mailing list archives

From: da...@apache.org
Subject: [1/3] beam-site git commit: Update documentation to remove python-sdk branch references
Date: Tue, 31 Jan 2017 07:09:21 GMT
Repository: beam-site
Updated Branches:
  refs/heads/asf-site b81afa390 -> 689c36863


Update documentation to remove python-sdk branch references


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f9eb9fc3
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f9eb9fc3
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f9eb9fc3

Branch: refs/heads/asf-site
Commit: f9eb9fc3cc6ac62fd8989f1addf7612213e8dbe6
Parents: b81afa3
Author: Ahmet Altay <altay@google.com>
Authored: Mon Jan 30 19:22:50 2017 -0800
Committer: Davor Bonaci <davor@google.com>
Committed: Mon Jan 30 23:06:36 2017 -0800

----------------------------------------------------------------------
 .../2016-10-12-strata-hadoop-world-and-beam.md  |  2 +-
 src/contribute/work-in-progress.md              |  1 -
 src/documentation/programming-guide.md          | 38 ++++++++++----------
 src/documentation/runners/dataflow.md           |  2 +-
 src/documentation/runners/direct.md             |  4 +--
 src/documentation/runners/flink.md              |  2 +-
 src/get-started/quickstart-py.md                |  4 +--
 src/get-started/wordcount-example.md            | 24 ++++++-------
 8 files changed, 38 insertions(+), 39 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/_posts/2016-10-12-strata-hadoop-world-and-beam.md
----------------------------------------------------------------------
diff --git a/src/_posts/2016-10-12-strata-hadoop-world-and-beam.md b/src/_posts/2016-10-12-strata-hadoop-world-and-beam.md
index b78fa4a..4cc9fb4 100644
--- a/src/_posts/2016-10-12-strata-hadoop-world-and-beam.md
+++ b/src/_posts/2016-10-12-strata-hadoop-world-and-beam.md
@@ -18,7 +18,7 @@ I want to share some of takeaways I had about Beam during the conference.
 
 The Data Engineers are looking to Beam as a way to [future-proof](https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code),
meaning that code is portable between the various Big Data frameworks. In fact, many of the
attendees were still on Hadoop MapReduce and looking to transition to a new framework. They're
realizing that continually rewriting code isn't the most productive approach.
 
-Data Scientists are really interested in using Beam. They're interested in having a single API
for doing analysis instead of several different APIs. We talked about Beam's progress on
the Python API. If you want to take a peek, it's being actively developed on a [feature
branch](https://github.com/apache/beam/tree/python-sdk). As Beam matures, we're looking
to add other supported languages.
+Data Scientists are really interested in using Beam. They're interested in having a single API
for doing analysis instead of several different APIs. We talked about Beam's progress on
the Python API. If you want to take a peek, it's being actively developed on a [feature
branch](https://github.com/apache/beam/tree/master/sdks/python). As Beam matures, we're
looking to add other supported languages.
 
 We heard [loud and clear](https://twitter.com/jessetanderson/status/781124173108305920) from
Beam users that great runner support is crucial to adoption. We have great Apache Flink support.
During the conference we had some more volunteers offer their help on the Spark runner.
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/contribute/work-in-progress.md
----------------------------------------------------------------------
diff --git a/src/contribute/work-in-progress.md b/src/contribute/work-in-progress.md
index 258f87c..c3a4d17 100644
--- a/src/contribute/work-in-progress.md
+++ b/src/contribute/work-in-progress.md
@@ -25,7 +25,6 @@ Current branches include:
 | Feature | Branch | JIRA Component | More Info |
 | ---- | ---- | ---- | ---- |
 | Apache Gearpump Runner | [gearpump-runner](https://github.com/apache/beam/tree/gearpump-runner)
| [runner-gearpump](https://issues.apache.org/jira/browse/BEAM/component/12330829) | [README](https://github.com/apache/beam/blob/gearpump-runner/runners/gearpump/README.md)
|
-| Python SDK | [python-sdk](https://github.com/apache/beam/tree/python-sdk) | [sdk-py](https://issues.apache.org/jira/browse/BEAM/component/12328910)
| [README](https://github.com/apache/beam/blob/python-sdk/sdks/python/README.md) |
 | Apache Spark 2.0 Runner | [runners-spark2](https://github.com/apache/beam/tree/runners-spark2)
| - | [thread](https://lists.apache.org/thread.html/e38ac4e4914a6cb1b865b1f32a6ca06c2be28ea4aa0f6b18393de66f@%3Cdev.beam.apache.org%3E)
|
 {:.table}
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 71ee487..9846929 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -71,13 +71,13 @@ When you run your Beam driver program, the Pipeline Runner that you designate
co
 
 ## <a name="pipeline"></a>Creating the pipeline
 
-The `Pipeline` abstraction encapsulates all the data and steps in your data processing task.
Your Beam driver program typically starts by constructing a <span class="language-java">[Pipeline]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/Pipeline.html)</span><span
class="language-py">[Pipeline](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/pipeline.py)</span>
object, and then using that object as the basis for creating the pipeline's data sets as `PCollection`s
and its operations as `Transform`s.
+The `Pipeline` abstraction encapsulates all the data and steps in your data processing task.
Your Beam driver program typically starts by constructing a <span class="language-java">[Pipeline]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/Pipeline.html)</span><span
class="language-py">[Pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py)</span>
object, and then using that object as the basis for creating the pipeline's data sets as `PCollection`s
and its operations as `Transform`s.
 
 To use Beam, your driver program must first create an instance of the Beam SDK class `Pipeline`
(typically in the `main()` function). When you create your `Pipeline`, you'll also need to
set some **configuration options**. You can set your pipeline's configuration options programmatically,
but it's often easier to set the options ahead of time (or read them from the command line)
and pass them to the `Pipeline` object when you create the object.
 
 The pipeline configuration options determine, among other things, the `PipelineRunner` that
determines where the pipeline gets executed: locally, or using a distributed back-end of your
choice. Depending on where your pipeline gets executed and what your specified Runner requires,
the options can also help you specify other aspects of execution.
 
-To set your pipeline's configuration options and create the pipeline, create an object of
type <span class="language-java">[PipelineOptions]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/options/PipelineOptions.html)</span><span
class="language-py">[PipelineOptions](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py)</span>
and pass it to `Pipeline.Create()`. The most common way to do this is by parsing arguments
from the command-line:
+To set your pipeline's configuration options and create the pipeline, create an object of
type <span class="language-java">[PipelineOptions]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/options/PipelineOptions.html)</span><span
class="language-py">[PipelineOptions](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py)</span>
and pass it to `Pipeline.Create()`. The most common way to do this is by parsing arguments
from the command-line:
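
For comparison with the Java example that follows, here is a minimal Python sketch of the same flow; the `PipelineOptions` import path matches the module linked above and may differ in other SDK versions:

```py
import apache_beam as beam
from apache_beam.utils.pipeline_options import PipelineOptions

# PipelineOptions parses recognized pipeline flags from the given argument
# list; unknown flags are kept for later, runner-specific interpretation.
options = PipelineOptions(['--runner=DirectRunner'])

# The Pipeline object is the root of the transform graph.
p = beam.Pipeline(options=options)
```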
 
 ```java
 public static void main(String[] args) {
@@ -333,7 +333,7 @@ class ComputeWordLengthFn(beam.DoFn):
     # Use return to emit the output element.
     return [len(word)]
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_apply
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_apply
 %}```
 
 In the example, our input `PCollection` contains `String` values. We apply a `ParDo` transform
that specifies a function (`ComputeWordLengthFn`) to compute the length of each string, and
outputs the result to a new `PCollection` of `Integer` values that stores the length of each
word.
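
The Python snippet referenced above is not expanded in this diff. A sketch of the same idea, assuming the SDK version documented here (where `DoFn.process` receives a context object rather than the element directly):

```py
import apache_beam as beam

class ComputeWordLengthFn(beam.DoFn):
  # process is called once per element; returning a list (or yielding)
  # emits the output elements.
  def process(self, context):
    return [len(context.element)]

p = beam.Pipeline()
words = p | beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])
word_lengths = words | beam.ParDo(ComputeWordLengthFn())
```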
@@ -418,7 +418,7 @@ words = ...
 
 # Apply a lambda function to the PCollection words.
 # Save the result as the PCollection word_lengths.
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_using_flatmap
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_using_flatmap
 %}```
 
 If your `ParDo` performs a one-to-one mapping of input elements to output elements--that
is, for each input element, it applies a function that produces *exactly one* output element,
you can use the higher-level <span class="language-java">`MapElements`</span><span
class="language-py">`Map`</span> transform. <span class="language-java">`MapElements`
can accept an anonymous Java 8 lambda function for additional brevity.</span>
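
A sketch of both forms in Python, continuing the word-length example (names here are illustrative):

```py
import apache_beam as beam

p = beam.Pipeline()
words = p | beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])

# FlatMap: the callable may return zero or more outputs per input element.
word_lengths = words | beam.FlatMap(lambda word: [len(word)])

# Map: exactly one output per input, so the callable returns a single value.
word_lengths_again = words | beam.Map(len)
```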
@@ -442,7 +442,7 @@ words = ...
 
 # Apply a Map with a lambda function to the PCollection words.
 # Save the result as the PCollection word_lengths.
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_using_map
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_using_map
 %}```
 
 {:.language-java}
@@ -490,7 +490,7 @@ Thus, `GroupByKey` represents a transform from a multimap (multiple keys
to indi
 
 #### <a name="transforms-combine"></a>Using Combine
 
-<span class="language-java">[`Combine`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Combine.html)</span><span
class="language-py">[`Combine`](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py)</span>
is a Beam transform for combining collections of elements or values in your data. `Combine`
has variants that work on entire `PCollection`s, and some that combine the values for each
key in `PCollection`s of key/value pairs.
+<span class="language-java">[`Combine`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Combine.html)</span><span
class="language-py">[`Combine`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py)</span>
is a Beam transform for combining collections of elements or values in your data. `Combine`
has variants that work on entire `PCollection`s, and some that combine the values for each
key in `PCollection`s of key/value pairs.
 
 When you apply a `Combine` transform, you must provide the function that contains the logic
for combining the elements or values. The combining function should be commutative and associative,
as the function is not necessarily invoked exactly once on all values with a given key. Because
the input data (including the value collection) may be distributed across multiple workers,
the combining function might be called multiple times to perform partial combining on subsets
of the value collection. The Beam SDK also provides some pre-built combine functions for common
numeric combination operations such as sum, min, and max.
 
@@ -515,7 +515,7 @@ public static class SumInts implements SerializableFunction<Iterable<Integer>,
I
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:combine_bounded_sum
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:combine_bounded_sum
 %}```
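
The referenced `combine_bounded_sum` snippet is not expanded in the diff; a minimal sketch of a globally combined, bounded sum (the cap of 100 is arbitrary):

```py
import apache_beam as beam

p = beam.Pipeline()
pc = p | beam.Create([1, 10, 100, 1000])

# The combining callable must be commutative and associative, because it may
# be applied to arbitrary subsets of the values and then to partial results.
bounded_sum = pc | beam.CombineGlobally(lambda values: min(sum(values), 100))
```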
 
 ##### **Advanced combinations using CombineFn**
@@ -570,7 +570,7 @@ public class AverageFn extends CombineFn<Integer, AverageFn.Accum,
Double> {
 
 ```py
 pc = ...
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:combine_custom_average
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:combine_custom_average
 %}```
 
 If you are combining a `PCollection` of key-value pairs, [per-key combining](#transforms-combine-per-key)
is often enough. If you need the combining strategy to change based on the key (for example,
MIN for some users and MAX for other users), you can define a `KeyedCombineFn` to access the
key within the combining strategy.
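
A sketch of a custom `CombineFn` computing an average, with method names as in the current Python SDK:

```py
import apache_beam as beam

class AverageFn(beam.CombineFn):
  # The accumulator is a (sum, count) pair built up across elements.
  def create_accumulator(self):
    return (0.0, 0)

  def add_input(self, accumulator, element):
    total, count = accumulator
    return total + element, count + 1

  def merge_accumulators(self, accumulators):
    totals, counts = zip(*accumulators)
    return sum(totals), sum(counts)

  def extract_output(self, accumulator):
    total, count = accumulator
    return total / count if count else float('NaN')

p = beam.Pipeline()
pc = p | beam.Create([1, 2, 3, 4])
average = pc | beam.CombineGlobally(AverageFn())
```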
@@ -661,7 +661,7 @@ avg_accuracy_per_player = (player_accuracies
 
 #### <a name="transforms-flatten-partition"></a>Using Flatten and Partition
 
-<span class="language-java">[`Flatten`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Flatten.html)</span><span
class="language-py">[`Flatten`](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py)</span>
and <span class="language-java">[`Partition`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Partition.html)</span><span
class="language-py">[`Partition`](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py)</span>
are Beam transforms for `PCollection` objects that store the same data type. `Flatten` merges
multiple `PCollection` objects into a single logical `PCollection`, and `Partition` splits
a single `PCollection` into a fixed number of smaller collections.
+<span class="language-java">[`Flatten`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Flatten.html)</span><span
class="language-py">[`Flatten`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py)</span>
and <span class="language-java">[`Partition`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Partition.html)</span><span
class="language-py">[`Partition`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py)</span>
are Beam transforms for `PCollection` objects that store the same data type. `Flatten` merges
multiple `PCollection` objects into a single logical `PCollection`, and `Partition` splits
a single `PCollection` into a fixed number of smaller collections.
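
A short Python sketch of both transforms (the partitioning rule is illustrative):

```py
import apache_beam as beam

p = beam.Pipeline()
pcoll1 = p | 'First' >> beam.Create(['a', 'b'])
pcoll2 = p | 'Second' >> beam.Create(['c', 'd'])

# Flatten merges PCollections of the same element type into one.
merged = (pcoll1, pcoll2) | beam.Flatten()

# Partition splits one PCollection into a fixed number of PCollections; the
# function receives each element plus the number of partitions and returns
# the index of the partition the element belongs to.
def by_first_letter(element, num_partitions):
  return 0 if element < 'c' else 1

partitions = merged | beam.Partition(by_first_letter, 2)
early_letters, late_letters = partitions[0], partitions[1]
```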
 
 ##### **Flatten**
 
@@ -811,13 +811,13 @@ Side inputs are useful if your `ParDo` needs to inject additional data
when proc
 # For example, using pvalue.AsIter(pcoll) at pipeline construction time results in an iterable
of the actual elements of pcoll being passed into each process invocation.
 # In this example, side inputs are passed to a FlatMap transform as extra arguments and consumed
by filter_using_length.
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_side_input
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_side_input
 %}
 
 # We can also pass side inputs to a ParDo transform, which will get passed to its process
method.
 # The only change is that the first arguments are self and a context, rather than the PCollection
element itself.
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_side_input_dofn
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_side_input_dofn
 %}
 ...
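
Pieced together outside the diff, a side-input sketch (the length limit and names are illustrative):

```py
import apache_beam as beam
from apache_beam import pvalue

p = beam.Pipeline()
words = p | 'Words' >> beam.Create(['small', 'medium', 'enormous'])
max_len = p | 'Limit' >> beam.Create([6])

# AsSingleton materializes a one-element PCollection as a side input that is
# handed to the callable as an ordinary extra argument.
filtered = words | beam.FlatMap(
    lambda word, limit: [word] if len(word) <= limit else [],
    pvalue.AsSingleton(max_len))
```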
 
@@ -893,12 +893,12 @@ While `ParDo` always produces a main output `PCollection` (as the return
value f
 # with_outputs() returns a DoOutputsTuple object. Tags specified in with_outputs are attributes
on the returned DoOutputsTuple object.
 # The tags give access to the corresponding output PCollections.
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_with_side_outputs
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_with_side_outputs
 %}
 
 # The result is also iterable, ordered in the same order that the tags were passed to with_outputs(),
the main tag (if specified) first.
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_with_side_outputs_iter
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_with_side_outputs_iter
 %}```
 
 ##### Emitting to side outputs in your DoFn:
@@ -932,13 +932,13 @@ While `ParDo` always produces a main output `PCollection` (as the return
value f
 # using the pvalue.SideOutputValue wrapper class.
 # Based on the previous example, this shows the DoFn emitting to the main and side outputs.
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_emitting_values_on_side_outputs
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_emitting_values_on_side_outputs
 %}
 
 # Side outputs are also available in Map and FlatMap.
 # Here is an example that uses FlatMap and shows that the tags do not need to be specified
ahead of time.
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_with_side_outputs_undeclared
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py
tag:model_pardo_with_side_outputs_undeclared
 %}```
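
A compact sketch of the tagging flow described above, using the `pvalue.SideOutputValue` wrapper named in this version of the guide (later SDKs renamed it `TaggedOutput`); the cutoff and tag names are illustrative:

```py
import apache_beam as beam
from apache_beam import pvalue

def split_words(word, cutoff):
  # Short words go to the main output; long ones are wrapped with a tag so
  # the runner can route them to the side output.
  if len(word) <= cutoff:
    yield word
  else:
    yield pvalue.SideOutputValue('above_cutoff', word)

p = beam.Pipeline()
words = p | beam.Create(['a', 'an', 'elephant'])
results = words | beam.FlatMap(split_words, 3).with_outputs(
    'above_cutoff', main='below_cutoff')

below = results.below_cutoff  # main output, accessed by its tag
above = results.above_cutoff  # side output
```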
 
 ## <a name="io"></a>Pipeline I/O
@@ -1047,14 +1047,14 @@ See the language specific source code directories for the Beam supported
I/O API
 <tr>
   <td>Python</td>
   <td>
-    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py">avroio</a></p>
-    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/textio.py">textio</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">avroio</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">textio</a></p>
   </td>
   <td>
   </td>
   <td>
-    <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/bigquery.py">Google
BigQuery</a></p>
-    <p><a href="https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/io/datastore">Google
Cloud Datastore</a></p>
+    <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py">Google
BigQuery</a></p>
+    <p><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/datastore">Google
Cloud Datastore</a></p>
   </td>
 
 </tr>
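
For the Python sources listed above, reading and writing text files looks like the following sketch (paths are placeholders):

```py
import apache_beam as beam

p = beam.Pipeline()

# textio: read a text file into a PCollection of lines, then write it back out.
lines = p | beam.io.ReadFromText('/tmp/input.txt')
lines | beam.io.WriteToText('/tmp/output', file_name_suffix='.txt')
```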

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/documentation/runners/dataflow.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/dataflow.md b/src/documentation/runners/dataflow.md
index f707d47..f2037a2 100644
--- a/src/documentation/runners/dataflow.md
+++ b/src/documentation/runners/dataflow.md
@@ -101,7 +101,7 @@ When executing your pipeline with the Cloud Dataflow Runner, set these
pipeline
 </tr>
 </table>
 
-See the reference documentation for the  <span class="language-java">[DataflowPipelineOptions]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html)</span><span
class="language-python">[PipelineOptions](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py)</span>
interface (and its subinterfaces) for the complete list of pipeline configuration options.
+See the reference documentation for the  <span class="language-java">[DataflowPipelineOptions]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html)</span><span
class="language-python">[PipelineOptions](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py)</span>
interface (and its subinterfaces) for the complete list of pipeline configuration options.
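
In Python the same options are typically supplied as flags to `PipelineOptions`; a sketch with placeholder project and bucket names (the import path matches the module linked above):

```py
import apache_beam as beam
from apache_beam.utils.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-gcp-project',              # placeholder project id
    '--staging_location=gs://my-bucket/staging',
    '--temp_location=gs://my-bucket/temp',
    '--job_name=my-wordcount-job',
])

p = beam.Pipeline(options=options)
```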
 
 ## Additional information and caveats
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/documentation/runners/direct.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/direct.md b/src/documentation/runners/direct.md
index c96e7b8..babe4cb 100644
--- a/src/documentation/runners/direct.md
+++ b/src/documentation/runners/direct.md
@@ -37,9 +37,9 @@ You must specify your dependency on the Direct Runner.
 
 When executing your pipeline from the command-line, set `runner` to `direct`. The default
values for the other pipeline options are generally sufficient.
 
-See the reference documentation for the  <span class="language-java">[`DirectOptions`]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/direct/DirectOptions.html)</span><span
class="language-python">[`PipelineOptions`](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py)</span>
interface (and its subinterfaces) for defaults and the complete list of pipeline configuration
options.
+See the reference documentation for the  <span class="language-java">[`DirectOptions`]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/direct/DirectOptions.html)</span><span
class="language-python">[`PipelineOptions`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py)</span>
interface (and its subinterfaces) for defaults and the complete list of pipeline configuration
options.
 
 ## Additional information and caveats
 
-Local execution is limited by the memory available in your local environment. It is highly
recommended that you run your pipeline with data sets small enough to fit in local memory.
You can create a small in-memory data set using a <span class="language-java">[`Create`]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Create.html)</span><span
class="language-python">[`Create`](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py)</span>
transform, or you can use a <span class="language-java">[`Read`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/io/Read.html)</span><span class="language-python">[`Read`](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py)</span>
transform to work with small local or remote files.
+Local execution is limited by the memory available in your local environment. It is highly
recommended that you run your pipeline with data sets small enough to fit in local memory.
You can create a small in-memory data set using a <span class="language-java">[`Create`]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/sdk/transforms/Create.html)</span><span
class="language-python">[`Create`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py)</span>
transform, or you can use a <span class="language-java">[`Read`]({{ site.baseurl }}/documentation/sdks/javadoc/{{
site.release_latest }}/index.html?org/apache/beam/sdk/io/Read.html)</span><span class="language-python">[`Read`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py)</span>
transform to work with small local or remote files.
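
A minimal local-execution sketch following this advice; with no runner specified, the Python SDK falls back to direct (in-process) execution:

```py
import apache_beam as beam

p = beam.Pipeline()  # no runner specified: executes locally with the direct runner

# Create builds a small in-memory PCollection, keeping the data set well
# within local memory as recommended above.
numbers = p | beam.Create([1, 2, 3])
squares = numbers | beam.Map(lambda x: x * x)

p.run()
```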
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/documentation/runners/flink.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/flink.md b/src/documentation/runners/flink.md
index f2e59b5..ed52689 100644
--- a/src/documentation/runners/flink.md
+++ b/src/documentation/runners/flink.md
@@ -129,7 +129,7 @@ When executing your pipeline with the Flink Runner, you can set these
pipeline o
 </tr>
 </table>
 
-See the reference documentation for the  <span class="language-java">[FlinkPipelineOptions]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html)</span><span
class="language-python">[PipelineOptions](https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py)</span>
interface (and its subinterfaces) for the complete list of pipeline configuration options.
+See the reference documentation for the  <span class="language-java">[FlinkPipelineOptions]({{
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html)</span><span
class="language-python">[PipelineOptions](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py)</span>
interface (and its subinterfaces) for the complete list of pipeline configuration options.
 
 ## Additional information and caveats
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/get-started/quickstart-py.md
----------------------------------------------------------------------
diff --git a/src/get-started/quickstart-py.md b/src/get-started/quickstart-py.md
index a198eba..57bdefc 100644
--- a/src/get-started/quickstart-py.md
+++ b/src/get-started/quickstart-py.md
@@ -63,7 +63,7 @@ For instructions using other shells, see the [virtualenv documentation](https://
 ### Download and install
 
 1. Clone the Apache Beam repo from GitHub: 
-  `git clone https://github.com/apache/beam.git --branch python-sdk`
+  `git clone https://github.com/apache/beam.git`
 
 2. Navigate to the `python` directory: 
   `cd beam/sdks/python/`
@@ -79,7 +79,7 @@ For instructions using other shells, see the [virtualenv documentation](https://
 
 ## Execute a pipeline locally
 
-The Apache Beam [examples](https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/examples)
directory has many examples. All examples can be run locally by passing the required arguments
described in the example script.
+The Apache Beam [examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples)
directory has many examples. All examples can be run locally by passing the required arguments
described in the example script.
 
 For example, to run `wordcount.py`, run:
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f9eb9fc3/src/get-started/wordcount-example.md
----------------------------------------------------------------------
diff --git a/src/get-started/wordcount-example.md b/src/get-started/wordcount-example.md
index bf484b2..b6e1985 100644
--- a/src/get-started/wordcount-example.md
+++ b/src/get-started/wordcount-example.md
@@ -69,7 +69,7 @@ You can specify a runner for executing your pipeline, such as the `DataflowRunne
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_options
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_options
 %}```
 
 The next step is to create a Pipeline object with the options we've just constructed. The
Pipeline object builds up the graph of transformations to be executed, associated with that
particular pipeline.
@@ -79,7 +79,7 @@ Pipeline p = Pipeline.create(options);
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_create
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_create
 %}```
 
 ### Applying Pipeline Transforms
@@ -100,7 +100,7 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_read
+    {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_read
     %}```
 
 2.  A [ParDo]({{ site.baseurl }}/documentation/programming-guide/#transforms-pardo) transform
that invokes a `DoFn` (defined in-line as an anonymous class) on each element that tokenizes
the text lines into individual words. The input for this transform is the `PCollection` of
text lines generated by the previous `TextIO.Read` transform. The `ParDo` transform outputs
a new `PCollection`, where each element represents an individual word in the text.
@@ -120,7 +120,7 @@ The Minimal WordCount pipeline contains five transforms:
 
     ```py
     # The Flatmap transform is a simplified version of ParDo.
-    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_pardo
+    {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_pardo
     %}```
 
 3.  The SDK-provided `Count` transform is a generic transform that takes a `PCollection`
of any type, and returns a `PCollection` of key/value pairs. Each key represents a unique
element from the input collection, and each value represents the number of times that key
appeared in the input collection.
@@ -132,7 +132,7 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_count
+    {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_count
     %}```
 
 4.  The next transform formats each of the key/value pairs of unique words and occurrence
counts into a printable string suitable for writing to an output file.
@@ -149,7 +149,7 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_map
+    {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_map
     %}```
 
 5.  A text file write transform. This transform takes the final `PCollection` of formatted
Strings as input and writes each element to an output text file. Each element in the input
`PCollection` represents one line of text in the resulting output file.
@@ -159,7 +159,7 @@ The Minimal WordCount pipeline contains five transforms:
     ```
 
     ```py
-    {% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_write
+    {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_write
     %}```
     
 Note that the `Write` transform produces a trivial result value of type `PDone`, which in
this case is ignored.
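
Pieced together, the five transforms amount to a pipeline along these lines; the file paths are placeholders and the snippets referenced above remain the authoritative versions:

```py
import re
import apache_beam as beam
from apache_beam.transforms.combiners import Count

p = beam.Pipeline()

(p
 | 'Read' >> beam.io.ReadFromText('/tmp/kinglear.txt')      # 1. text file read
 | 'ExtractWords' >> beam.FlatMap(                          # 2. ParDo/FlatMap
     lambda line: re.findall(r"[A-Za-z']+", line))
 | 'CountWords' >> Count.PerElement()                       # 3. Count per element
 | 'Format' >> beam.Map(lambda pair: '%s: %s' % pair)       # 4. format as text
 | 'Write' >> beam.io.WriteToText('/tmp/counts'))           # 5. text file write

p.run()
```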
@@ -173,7 +173,7 @@ p.run().waitUntilFinish();
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_run
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_minimal_run
 %}```
 
 Note that the `run` method is asynchronous. For a blocking execution instead, run your pipeline
appending the `waitUntilFinish` method.
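
In Python, the corresponding blocking pattern (assuming an SDK version whose `PipelineResult` exposes `wait_until_finish()`) is:

```py
import apache_beam as beam

p = beam.Pipeline()
p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x + 1)

result = p.run()
# Block until the pipeline finishes rather than returning immediately.
result.wait_until_finish()
```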
@@ -214,7 +214,7 @@ static class ExtractWordsFn extends DoFn<String, String> {
 ```py
 # In this example, the DoFns are defined as classes:
 
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_wordcount_dofn
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_wordcount_dofn
 %}```
 
 ### Creating Composite Transforms
@@ -253,7 +253,7 @@ public static void main(String[] args) throws IOException {
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_wordcount_composite
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_wordcount_composite
 %}```
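
A Python sketch of a composite transform, assuming an SDK version where the override point is `expand()` (earlier Python SDKs used `apply()` for this):

```py
import re
import apache_beam as beam

class CountWords(beam.PTransform):
  # A composite transform: expand() wires existing transforms into a sub-graph.
  def expand(self, lines):
    return (lines
            | 'ExtractWords' >> beam.FlatMap(
                lambda line: re.findall(r"[A-Za-z']+", line))
            | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
            | 'GroupAndSum' >> beam.CombinePerKey(sum))

p = beam.Pipeline()
counts = (p
          | beam.Create(['to be or not to be'])
          | CountWords())
```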
 
 ### Using Parameterizable PipelineOptions
@@ -280,7 +280,7 @@ public static void main(String[] args) {
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_wordcount_options
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:examples_wordcount_wordcount_options
 %}```
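
In Python, parameterizable options are usually added by subclassing `PipelineOptions` and registering extra argparse flags; a sketch with illustrative flag names (import path as linked above):

```py
import apache_beam as beam
from apache_beam.utils.pipeline_options import PipelineOptions

class WordCountOptions(PipelineOptions):
  # Custom flags are registered on the same parser that handles the
  # standard pipeline options.
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument('--input', default='/tmp/input.txt',
                        help='Path of the file to read from')
    parser.add_argument('--output', default='/tmp/output',
                        help='Prefix for the output files')

options = PipelineOptions(['--input=/tmp/kinglear.txt'])
wordcount_options = options.view_as(WordCountOptions)
p = beam.Pipeline(options=options)
```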
 
 ## Debugging WordCount Example
@@ -330,7 +330,7 @@ public class DebuggingWordCount {
 ```
 
 ```py
-{% github_sample /apache/beam/blob/python-sdk/sdks/python/apache_beam/examples/snippets/snippets.py
tag:example_wordcount_debugging_logging
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py
tag:example_wordcount_debugging_logging
 %}```
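
A sketch of the logging approach in a Python DoFn, using the same filter idea as DebuggingWordCount (the match terms are illustrative; the `process(self, context)` signature follows this version of the guide):

```py
import logging
import apache_beam as beam

class FilterTextFn(beam.DoFn):
  # Messages logged with the standard logging module end up in the
  # runner's worker logs at the corresponding level.
  def process(self, context):
    if 'Flourish' in context.element or 'stomach' in context.element:
      logging.info('Matched: %s', context.element)
      yield context.element
    else:
      logging.debug('Did not match: %s', context.element)
```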
 
 If you execute your pipeline using `DataflowRunner`, you can control the worker log levels.
Dataflow workers that execute user code are configured to log to Cloud Logging by default
at "INFO" log level and higher. You can override log levels for specific logging namespaces
by specifying: `--workerLogLevelOverrides={"Name1":"Level1","Name2":"Level2",...}`. For example,
by specifying `--workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}` when executing
this pipeline using the Dataflow service, Cloud Logging would contain only "DEBUG" or higher
level logs for the package in addition to the default "INFO" or higher level logs. 

