beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject [beam-site] 01/02: Add page for the portability framework
Date Mon, 06 Nov 2017 22:43:36 GMT
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository

commit 89a1cb330c13a1871f8b58bb08236d19e67c4554
Author: Henning Rohde <>
AuthorDate: Wed Nov 1 10:02:45 2017 -0700

    Add page for the portability framework
 src/contribute/ | 166 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 166 insertions(+)

diff --git a/src/contribute/ b/src/contribute/
new file mode 100644
index 0000000..932a0ac
--- /dev/null
+++ b/src/contribute/
@@ -0,0 +1,166 @@
+layout: section
+title: "Portability Framework"
+permalink: /contribute/portability/
+section_menu: section-menu/contribute.html
+# Portability Framework
+* TOC
+## Overview
+Interoperability between SDKs and runners is a key aspect of Apache
+Beam. So far, however, the reality is that most runners support the
+Java SDK only, because each SDK-runner combination requires non-trivial
+work on both sides. All runners are also currently written in Java,
+which makes support of non-Java SDKs far more expensive. The
+_portability framework_ aims to rectify this situation and provide
+full interoperability across the Beam ecosystem.
+The portability framework introduces well-defined, language-neutral
+data structures and protocols between the SDK and runner. This interop
+layer -- called the _portability API_ -- ensures that SDKs and runners
+can work with each other uniformly, reducing the interoperability
+burden for both SDKs and runners to a constant effort.  It notably
+ensures that _new_ SDKs automatically work with existing runners and
+vice versa.  The framework introduces a new runner, the _Universal
+Local Runner (ULR)_, as a practical reference implementation that
+complements the direct runners. Finally, it enables cross-language
+pipelines (sharing I/O or transformations across SDKs) and
+user-customized execution environments ("custom containers").
+The portability API consists of a set of smaller contracts that
+isolate SDKs and runners for job submission, management and
+execution. These contracts use protobufs and gRPC for broad language
+ * **Job submission and management**: The _Runner API_ defines a
+   language-neutral pipeline representation with transformations
+   specifying the execution environment as a docker container
+   image. The latter both allows the execution side to set up the
+   right environment as well as opens the door for custom containers
+   and cross-environment pipelines. The _Job API_ allows pipeline
+   execution and configuration to be managed uniformly.
+ * **Job execution**: The _SDK harness_ is a SDK-provided
+   program responsible for executing user code and is run separately
+   from the runner.  The _Fn API_ defines an execution-time binary
+   contract between the SDK harness and the runner that describes how
+   execution tasks are managed and how data is transferred. In
+   addition, the runner needs to handle progress and monitoring in an
+   efficient and language-neutral way. SDK harness initialization
+   relies on the _Provision_ and _Artifact APIs_ for obtaining staged
+   files, pipeline options and environment information. Docker
+   provides isolation between the runner and SDK/user environments to
+   the benefit of both as defined by the _container contract_. The
+   containerization of the SDK gives it (and the user, unless the SDK
+   is closed) full control over its own environment without risk of
+   dependency conflicts. The runner has significant freedom regarding
+   how it manages the SDK harness containers.
+The goal is that all (non-direct) runners and SDKs eventually support
+the portability API, perhaps exclusively.
+## Design
+The [model protos](
+contain all aspects of the portability API and is the truth on the
+ground. The proto definitions supercede any design documents. The main
+design documents are the following:
+ * [Runner API]( Pipeline
+   representation and discussion on primitive/composite transforms and
+   optimizations.
+ * [Job API]( Job submission and
+   management protocol.
+ * [Fn API]( Execution-side control
+   and data protocols and overview.
+ * [Container
+   contract](
+   Execution-side docker container invocation and provisioning
+   protocols. See
+   [](
+   for how to build container images.
+In discussion:
+ * [Cross
+   language]( Options
+   and tradeoffs for how to handle various kinds of
+   multi-language/multi-SDK pipelines.
+## Development
+The portability framework is a substantial effort that touches every
+Beam component. In addition to the sheer magnitude, a major challenge
+is engineering an interop layer that does not significantly compromise
+performance due to the additional serialization overhead of a
+language-neutral protocol.
+### Roadmap
+The proposed project phases are roughly as follows and are not
+strictly sequential, as various components will likely move at
+different speeds. Additionally, there have been (and continues to be)
+supporting refactorings that are not always tracked as part of the
+portability effort. Work already done is not tracked here either.
+ * **P1 [MVP]**: Implement the fundamental plumbing for portable SDKs
+   and runners for batch and streaming, including containers and the
+   ULR
+   [[BEAM-2899](]. Each
+   SDK and runner should use the portability framework at least to the
+   extent that wordcount
+   [[BEAM-2896](] and
+   windowed wordcount
+   [[BEAM-2941](] run
+   portably.
+ * **P2 [Feature complete]**: Design and implement portability support
+   for remaining execution-side features, so that any pipeline from
+   any SDK can run portably on any runner. These features include side
+   inputs
+   [[BEAM-2863](], User
+   timers
+   [[BEAM-2925](],
+   Splittable DoFn
+   [[BEAM-2896](] and
+   more.  Each SDK and runner should use the portability framework at
+   least to the extent that the mobile gaming examples
+   [[BEAM-2940](] run
+   portably.
+ * **P3 [Performance]**: Measure and tune performance of portable
+   pipelines using benchmarks such as Nexmark. Features such as
+   progress reporting
+   [[BEAM-2940](],
+   combiner lifting
+   [[BEAM-2937](] and
+   fusion are expected to be needed.
+ * **P4 [Cross language]**: Design and implement cross-language
+   pipeline support, including how the ecosystem of shared transforms
+   should work.
+### Issues
+The portability effort touches every component, so the "portability"
+label is used to identify all portability-related issues. Pure
+design or proto definitions should use the "beam-model" component. A
+common pattern for new portability features is that the overall
+feature is in "beam-model" with subtasks for each SDK and runner in
+their respective components.
+**JIRA:** [query](
+### Status
+MVP in progress. No SDK or runner supports the full portability API
+yet, but once that happens a more detailed progress table will be
+added here.

To stop receiving notification emails like this one, please contact
"" <>.

View raw message