beam-commits mailing list archives

Subject [10/50] [abbrv] incubator-beam git commit: [readme] update to reflect the current state
Date Fri, 04 Mar 2016 18:11:05 GMT
[readme] update to reflect the current state


Branch: refs/heads/master
Commit: 70ae13c7497907cd7ba81481dc7eafff1615adfb
Parents: 8434c3c
Author: Max <>
Authored: Thu Feb 11 12:36:02 2016 +0100
Committer: Davor Bonaci <>
Committed: Fri Mar 4 10:04:23 2016 -0800

 runners/flink/ | 82 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 15 deletions(-)
diff --git a/runners/flink/ b/runners/flink/
index 54d248c..499ed6d 100644
--- a/runners/flink/
+++ b/runners/flink/
@@ -1,13 +1,72 @@
-Flink-Dataflow is a Google Dataflow Runner for Apache Flink. It enables you to
-run Dataflow programs with Flink as an execution engine.
+Flink-Dataflow is a Runner for Google Dataflow (aka Apache Beam) which enables you to
+run Dataflow programs with Flink. It integrates seamlessly with the Dataflow
+API, allowing you to execute Dataflow programs in streaming or batch mode.
+## Streaming
+### Full Dataflow Windowing and Triggering Semantics
+The Flink Dataflow Runner supports *Event Time*, allowing you to analyze data with respect to its
+associated timestamp. It handles out-of-order and late-arriving elements. You may leverage the full
+power of the Dataflow windowing semantics like *time-based*, *sliding*, *tumbling*, or *count*
+windows. You may build *session* windows which allow you to keep track of events associated with
+each other.
+### Fault-Tolerance
+The program's state is persisted by Apache Flink. You may re-run and resume your program upon
+failure or if you decide to continue computation at a later time.
+### Sources and Sinks
+Build your own data ingestion or digestion using the source/sink interface. Re-use Flink's sources
+and sinks or use the provided support for Apache Kafka.
+### Seamless integration
+To execute a Dataflow program in streaming mode, just enable streaming in the `PipelineOptions`:
+    options.setStreaming(true);
+That's it. If you prefer batched execution, simply disable streaming mode.
+## Batch
+### Batch optimization
+Flink gives you out-of-core algorithms which operate on its managed memory to perform sorting,
+caching, and hash table operations. We have optimized operations like CoGroup to use Flink's
+optimized out-of-core implementation.
+### Fault-Tolerance
+We guarantee job-level fault-tolerance which gracefully restarts failed batch jobs.
+### Sources and Sinks
+Build your own data ingestion or digestion using the source/sink interface or re-use Flink's sources
+and sinks.
+## Features
+The Flink Dataflow Runner maintains as much compatibility with the Dataflow API as possible. We
+support transformations on data like:
+- Grouping
+- Windowing
+- ParDo
+- CoGroup
+- Flatten
+- Combine
+- Side inputs/outputs
+- Encoding
 # Getting Started
-To get started using Google Dataflow on top of Apache Flink, we need to install the
-latest version of Flink-Dataflow.
+To get started using Flink-Dataflow, we first need to install the latest version.
 ## Install Flink-Dataflow ##
@@ -46,7 +105,6 @@ p.apply(TextIO.Read.named("ReadLines").from(options.getInput()));
 To execute the example, let's first get some sample data:
     curl > kinglear.txt
@@ -58,7 +116,7 @@ Then let's run the included WordCount locally on your machine:
 Congratulations, you have run your first Google Dataflow program on top of Apache Flink!
-# Running Dataflow on Flink on a cluster
+# Running Dataflow programs on a Flink cluster
 You can run your Dataflow program on an Apache Flink cluster. Please start off by creating a new
 Maven project.
@@ -137,14 +195,8 @@ folder to the Flink cluster using the command-line utility like so:
     ./bin/flink run /path/to/fat.jar
-For more information, please visit the [Apache Flink Website]( or
-the [Mailinglists](
-# Streaming
-Streaming support has been added. It is currently in alpha stage. Please give it a try. To use
-streaming, just enable streaming mode in the `PipelineOptions`:
+# More
-    options.setStreaming(true);
-That's all.
\ No newline at end of file
+For more information, please visit the [Apache Flink Website]( or
+the [Mailinglists](
\ No newline at end of file
