From: frances@apache.org
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Date: Wed, 19 Oct 2016 04:06:33 -0000
Mailing-List: contact commits-help@beam.incubator.apache.org; run by ezmlm
Subject: [4/8] incubator-beam-site git commit: Add Design Principles (take from the original Beam technical vision document).
archived-at: Wed, 19 Oct 2016 04:06:39 -0000

Add Design Principles (take from the original Beam technical vision document).

Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/99783418
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/99783418
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/99783418

Branch: refs/heads/asf-site
Commit: 997834188ecf29b307e195c9c7e8d31fa60b34ff
Parents: 7f234a5
Author: Frances Perry
Authored: Mon Oct 3 19:00:03 2016 -0700
Committer: Frances Perry
Committed: Tue Oct 18 20:56:39 2016 -0700

----------------------------------------------------------------------
 _includes/header.html           |  5 ++--
 contribute/design-principles.md | 53 ++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/99783418/_includes/header.html
----------------------------------------------------------------------
diff --git a/_includes/header.html b/_includes/header.html
index 182b30a..67631a9 100644
--- a/_includes/header.html
+++ b/_includes/header.html
@@ -63,12 +63,13 @@
  • Contribution Guide
  • -
  • Testing
  • Mailing Lists
  • Source Repository
  • Issue Tracking
  • - +
  • Testing
  • +
  • Design Principles
  • Technical Vision
  •

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/99783418/contribute/design-principles.md
----------------------------------------------------------------------
diff --git a/contribute/design-principles.md b/contribute/design-principles.md
new file mode 100644
index 0000000..87ddd24
--- /dev/null
+++ b/contribute/design-principles.md
@@ -0,0 +1,53 @@
+---
+layout: default
+title: 'Design Principles in Beam'
+permalink: /contribute/design-principles/
+---
+
+# Design Principles in the Apache Beam Project
+
+Joshua Bloch’s [API Design Bumper Stickers](https://www.infoq.com/articles/API-Design-Joshua-Bloch) are a great list of what makes for good API design. In addition, we have specific design principles we follow in Beam.
+
+* TOC
+{:toc}
+
+## Use cases
+
+### Unify the model
+Provide one model that works over both bounded (a.k.a. batch) and unbounded (a.k.a. streaming) datasets. Pay special attention to windows / triggers / state / timers, which often trip up folks used to a batch world. Provide users with the right abstractions to adjust latency and completeness guarantees to cover both traditional batch and streaming use cases.
+
+### Separate data shapes and runtime requirements
+The model should focus on letting users describe their data and processing, without exposing any details of a specific runtime system. For example, bounded and unbounded describe the shape of data, but batch and streaming describe the behavior of specific runtime systems. Good test cases are to imagine a mythical micro-batching runner that sits somewhere between batch and streaming, or an engine that dynamically switches between streaming and batch depending on the backlog.
+
+### Make efficient things easy, rather than make easy things efficient
+Don’t prevent efficiency for ease of use. Design APIs that provide the information necessary for efficiently executing at scale. Provide class hierarchies and wrappers to make the common cases simpler.
+
+## Usability
+
+### Validate Early
+Validate constraints on graph shape, runner requirements, etc. as early in the compile time - construction time - submission time - execution time spectrum as reasonably possible in order to provide a smoother user experience.
+
+### Public APIs, like diamonds, are forever (at least until the next major version)
+Backwards-incompatible changes can only be made in the next major version. Because of the burden major versions place on users (code has to be modified, conflicting dependency nightmares, etc.), we aim to do this infrequently. Clearly mark APIs that are considered experimental (may change at any point) and deprecated (will be removed in the next major version). Consider what APIs are more amenable to future changes (abstract classes vs. interfaces, etc.).
+
+### Examples should be pedagogical
+Canonical examples help people ingrain the principles. Design examples that teach complex concepts in modular chunks. If you can’t explain the concept easily, then the API isn’t right. Examples should withstand random copy-pasting.
+
+## Extensibility
+
+### Use PTransforms for modularity
+Composite transformations (transformations formed by a subgraph of other transformations) are treated as first-class objects. They can be named and applied directly in any pipeline to nicely encapsulate concepts. This removes the artificial separation between transformations built into PCollection and those provided by users. In addition, PTransforms can be used as a clear concept in graphical monitoring and provide a way to scope metadata like aggregators, logging, and resources. Use these when building pipelines.
+
+### Keep Beam SDKs consistent
+Beam SDKs should expose the complete set of concepts in the programming model. They should all use the same set of abstractions and be able to share conceptual documentation.
+
+### When in ~~Rome~~ Python, do as the ~~Romans~~ Pythonians do
+Each SDK must feel right to those who live and breathe that language. Adapt the general Beam concepts into language-dependent styles when the benefits clearly outweigh the drawbacks.
+
+### Encourage DSLs
+Many use cases or user communities can be served by providing ‘wrapper’ SDKs that offer a simpler or domain-specific set of abstractions that then build on a Beam SDK and take advantage of Beam Runners.
+
+### Design for the model, not specific runners
+
+The Beam APIs should serve all runners. Behind every runner-specific hook, there is a general principle in the model. Design APIs that generalize across multiple runners.
+
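
Editorial note (not part of the commit): the "Validate Early" and "Use PTransforms for modularity" principles above may be easier to picture with a small sketch. The Python below does not use the Beam SDK; `Transform`, `Map`, `Composite`, and `Pipeline` are hypothetical stand-ins invented for this illustration, and real Beam APIs differ. It only shows one way a composite step could be named and applied as a unit, with constraints checked when the pipeline is constructed rather than when data flows through it.

```python
"""Toy illustration only: hypothetical classes, NOT the Apache Beam SDK."""
from typing import Callable, Iterable, List


class Transform:
    """Hypothetical named step that maps an iterable to an iterable."""

    def __init__(self, name: str) -> None:
        self.name = name

    def validate(self) -> None:
        """Check constraints at construction time; subclasses override."""

    def expand(self, data: Iterable) -> Iterable:
        raise NotImplementedError


class Map(Transform):
    """Applies a function to every element."""

    def __init__(self, name: str, fn: Callable) -> None:
        super().__init__(name)
        self.fn = fn

    def validate(self) -> None:
        if not callable(self.fn):
            raise TypeError(f"{self.name}: fn must be callable")

    def expand(self, data: Iterable) -> Iterable:
        return (self.fn(x) for x in data)


class Composite(Transform):
    """A transform formed from a chain of other transforms."""

    def __init__(self, name: str, steps: List[Transform]) -> None:
        super().__init__(name)
        self.steps = list(steps)

    def validate(self) -> None:
        if not self.steps:
            raise ValueError(f"{self.name}: composite needs at least one step")
        for step in self.steps:
            step.validate()

    def expand(self, data: Iterable) -> Iterable:
        for step in self.steps:
            data = step.expand(data)
        return data


class Pipeline:
    """Collects transforms, validating each one as it is applied."""

    def __init__(self) -> None:
        self.steps: List[Transform] = []

    def apply(self, transform: Transform) -> "Pipeline":
        transform.validate()  # fails here, before any data is processed
        self.steps.append(transform)
        return self

    def run(self, data: Iterable) -> list:
        for step in self.steps:
            data = step.expand(data)
        return list(data)


# A user-defined composite is applied exactly like a built-in step.
normalize = Composite("Normalize", [Map("Strip", str.strip), Map("Lower", str.lower)])
result = Pipeline().apply(normalize).run(["  Beam ", "MODEL"])
print(result)
```

In this sketch a misconfigured step such as `Map("Bad", None)` is rejected inside `apply`, which corresponds to the construction-time end of the spectrum described under "Validate Early".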