beam-commits mailing list archives

Subject [1/3] beam-site git commit: Blog post for 0.6.0 release with python sdk
Date Thu, 16 Mar 2017 23:21:57 GMT
Repository: beam-site
Updated Branches:
  refs/heads/asf-site 2ccd76628 -> 3917f6e3d

Blog post for 0.6.0 release with python sdk


Branch: refs/heads/asf-site
Commit: be9e207ddd881cf6beda26b811b20ba878def648
Parents: 2ccd766
Author: Ahmet Altay <>
Authored: Thu Mar 16 15:43:27 2017 -0700
Committer: Davor Bonaci <>
Committed: Thu Mar 16 16:20:42 2017 -0700

 src/_data/authors.yml                       | 18 +++---
 src/_posts/ | 72 ++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 7 deletions(-)
diff --git a/src/_data/authors.yml b/src/_data/authors.yml
index e4aa332..f66a69a 100644
--- a/src/_data/authors.yml
+++ b/src/_data/authors.yml
@@ -2,6 +2,9 @@ aljoscha:
     name: Aljoscha Krettek
     twitter: aljoscha
+    name: Ahmet Altay
+    email:
     name: Davor Bonaci
@@ -18,6 +21,13 @@ jamesmalone:
     name: James Malone
     twitter: chimerasaurus
+    name: Jesse Anderson
+    twitter: jessetanderson
+    name: Kenneth Knowles
+    email:
+    twitter: KennKnowles
     name: Robert Bradshaw
@@ -29,14 +39,8 @@ takidau:
     name: Thomas Groh
-    name: Jesse Anderson
-    twitter: jessetanderson
     name: Thomas Weise
     twitter: thweise
-    name: Kenneth Knowles
-    email:
-    twitter: KennKnowles
diff --git a/src/_posts/ b/src/_posts/
new file mode 100644
index 0000000..72f5209
--- /dev/null
+++ b/src/_posts/
@@ -0,0 +1,72 @@
+layout: post
+title:  "Python SDK released in Apache Beam 0.6.0"
+date:   2017-03-16 00:00:01 -0800
+excerpt_separator: <!--more-->
+categories: blog
+  - altay
+Apache Beam’s latest release, version [0.6.0]({{ site.baseurl }}/get-started/downloads/),
introduces a new SDK -- this time, for the Python programming language. The Python SDK joins
the Java SDK as the second implementation of the Beam programming model.
+The Python SDK incorporates all of the main concepts of the Beam model, including ParDo,
GroupByKey, Windowing, and others. It features extensible IO APIs for writing bounded sources
and sinks, and provides built-in implementations for reading and writing Text, Avro, and TensorFlow
record files, as well as connectors to Google BigQuery and Google Cloud Datastore.
+There are two runners capable of executing pipelines written with the Python SDK today: [Direct
Runner]({{ site.baseurl }}/documentation/runners/direct/) and [Dataflow Runner]({{ site.baseurl
}}/documentation/runners/dataflow/), both of which are currently limited to batch execution
only. Upcoming features will shortly bring the benefits of the Python SDK to additional runners.
+#### Try the Apache Beam Python SDK
+If you would like to try out the Python SDK, a good place to start is the [Quickstart]({{
site.baseurl }}/get-started/quickstart-py/). After that, you can take a look at additional
[examples](, and
deep dive into the [API reference]({{ site.baseurl }}/documentation/sdks/pydoc/).
+Let’s take a look at a quick example together. First, install the `apache-beam` package
from PyPI and start your Python interpreter.
+$ pip install apache-beam
+$ python
+We will harness the power of Apache Beam to estimate Pi in honor of the recently passed Pi Day.
+import random
+import apache_beam as beam
+def run_trials(count):
+  """Throw darts into unit square and count how many fall into unit circle."""
+  inside = 0
+  for _ in range(count):
+    x, y = random.uniform(0, 1), random.uniform(0, 1)
+    inside += 1 if x*x + y*y <= 1.0 else 0
+  return count, inside
+def combine_results(results):
+  """Given all the trial results, estimate pi."""
+  total, inside = sum(r[0] for r in results), sum(r[1] for r in results)
+  return total, inside, 4 * float(inside) / total if total > 0 else 0
+p = beam.Pipeline()
+(p | beam.Create([500] * 10)  # Create 10 experiments with 500 samples each.
+   | beam.Map(run_trials)     # Run experiments in parallel.
+   | beam.CombineGlobally(combine_results)      # Combine the results.
+   | beam.io.WriteToText('./pi_estimate.txt'))  # Write PI estimate to a file.
+p.run()  # Execute the pipeline.
+This example estimates Pi by throwing random darts into the unit square and keeping track
of the fraction of those darts that fell into the unit circle (see the full [example](
for details). If you are curious, you can check the result of our estimation by looking at
the output file.
+$ cat pi_estimate.txt*
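The arithmetic behind the estimate can be checked without Beam. A dart thrown uniformly into the unit square lands inside the quarter of the unit circle with probability pi/4, so four times the observed hit fraction approximates pi. The sketch below is a plain-Python illustration only, not part of the commit: `estimate_pi`, the `rng` parameter, and the `seed` argument are hypothetical additions for reproducibility, and the loop over experiments stands in for what `beam.Map` and `beam.CombineGlobally` do in the pipeline.

```python
import math
import random


def run_trials(count, rng):
    """Throw `count` darts into the unit square; count hits inside the quarter circle."""
    inside = 0
    for _ in range(count):
        x, y = rng.uniform(0, 1), rng.uniform(0, 1)
        if x * x + y * y <= 1.0:
            inside += 1
    return count, inside


def combine_results(results):
    """Sum the per-experiment counts and convert the hit fraction to a pi estimate."""
    results = list(results)  # Allow any iterable, including generators.
    total = sum(r[0] for r in results)
    inside = sum(r[1] for r in results)
    estimate = 4.0 * inside / total if total > 0 else 0.0
    return total, inside, estimate


def estimate_pi(experiments=10, samples=500, seed=0):
    """Hypothetical driver: run the experiments sequentially with a seeded RNG."""
    rng = random.Random(seed)
    return combine_results(run_trials(samples, rng) for _ in range(experiments))


total, inside, pi_estimate = estimate_pi()
print(total, inside, pi_estimate)
```

With 10 experiments of 500 samples each, the standard error of the estimate is roughly 0.02, so the result should usually agree with pi to about one decimal place; more samples tighten it slowly, in proportion to the square root of the sample count.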
+#### Roadmap
+The first thing on the Python SDK’s roadmap is to address two of its limitations. First,
the existing runners are currently limited to bounded PCollections, and we are looking forward
to extending the SDK to support unbounded PCollections (“streaming”). Additionally, we
are working on extending support to more Apache Beam runners, and the upcoming Fn API will
do the heavy lifting.
+Both of these improvements will enable the Python SDK to fulfill the mission of Apache Beam:
a unified programming model for batch and streaming data processing that can run on any execution engine.
+#### Join us!
+Please consider joining us, whether as a user or a contributor, as we work towards our first
release with API stability. If you’d like to try out Apache Beam today, check out the latest
[0.6.0]({{ site.baseurl }}/get-started/downloads/) release. We welcome contributions and participation
from anyone through our mailing lists, issue tracker, pull requests, and events.
