beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mergebot-r...@apache.org
Subject [beam-site] 01/01: Prepare repository for deployment.
Date Thu, 17 Aug 2017 05:02:52 GMT
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit b548e9ba51b29c1447bf4c68326d2602933d75d1
Author: Mergebot <mergebot@apache.org>
AuthorDate: Thu Aug 17 05:02:44 2017 +0000

    Prepare repository for deployment.
---
 content/blog/index.html |  22 ++
 content/feed.xml        | 599 ++++++++++++++++++++++++++++++++++++++++++++----
 content/index.html      |  10 +-
 3 files changed, 583 insertions(+), 48 deletions(-)

diff --git a/content/blog/index.html b/content/blog/index.html
index 22e9b93..59f355b 100644
--- a/content/blog/index.html
+++ b/content/blog/index.html
@@ -142,6 +142,28 @@
 <p>This is the blog for the Apache Beam project. This blog contains news and updates
 for the project.</p>
 
+<h3 id="a-classpost-link-hrefblog20170816splittable-do-fnhtmlpowerful-and-modular-io-connectors-with-splittable-dofn-in-apache-beama"><a class="post-link" href="/blog/2017/08/16/splittable-do-fn.html">Powerful and modular IO connectors with Splittable DoFn in Apache Beam</a></h3>
+<p><i>Aug 16, 2017 •  Eugene Kirpichov 
+</i></p>
+
+<p>One of the most important parts of the Apache Beam ecosystem is its quickly
+growing set of connectors that allow Beam pipelines to read and write data to
+various data storage systems (“IOs”). Currently, Beam ships <a href="/documentation/io/built-in/">over 20 IO
+connectors</a> with many more in
+active development. As user demands for IO connectors grew, our work on
+improving the related Beam APIs (in particular, the Source API) produced an
+unexpected result: a generalization of Beam’s most basic primitive, <code class="highlighter-rouge">DoFn</code>.</p>
+
+<!-- Render a "read more" button if the post is longer than the excerpt -->
+
+<p>
+<a class="btn btn-default btn-sm" href="/blog/2017/08/16/splittable-do-fn.html" role="button">
+Read more&nbsp;<span class="glyphicon glyphicon-menu-right" aria-hidden="true"></span>
+</a>
+</p>
+
+<hr />
+
 <h3 id="a-classpost-link-hrefblog20170517beam-first-stable-releasehtmlapache-beam-publishes-the-first-stable-releasea"><a class="post-link" href="/blog/2017/05/17/beam-first-stable-release.html">Apache Beam publishes the first stable release</a></h3>
 <p><i>May 17, 2017 •  Davor Bonaci [<a href="https://twitter.com/BonaciDavor">@BonaciDavor</a>] &amp; Dan Halperin 
 </i></p>
diff --git a/content/feed.xml b/content/feed.xml
index 021ea40..c405891 100644
--- a/content/feed.xml
+++ b/content/feed.xml
@@ -9,6 +9,562 @@
     <generator>Jekyll v3.2.0</generator>
     
       <item>
+        <title>Powerful and modular IO connectors with Splittable DoFn in Apache Beam</title>
+        <description>&lt;p&gt;One of the most important parts of the Apache Beam ecosystem is its quickly
+growing set of connectors that allow Beam pipelines to read and write data to
+various data storage systems (“IOs”). Currently, Beam ships &lt;a href=&quot;/documentation/io/built-in/&quot;&gt;over 20 IO
+connectors&lt;/a&gt; with many more in
+active development. As user demands for IO connectors grew, our work on
+improving the related Beam APIs (in particular, the Source API) produced an
+unexpected result: a generalization of Beam’s most basic primitive, &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;.&lt;/p&gt;
+
+&lt;!--more--&gt;
+
+&lt;h2 id=&quot;connectors-as-mini-pipelines&quot;&gt;Connectors as mini-pipelines&lt;/h2&gt;
+
+&lt;p&gt;One of the main reasons for this vibrant IO connector ecosystem is that
+developing a basic IO is relatively straightforward: many connector
+implementations are simply mini-pipelines (composite &lt;code class=&quot;highlighter-rouge&quot;&gt;PTransform&lt;/code&gt;s) made of the
+basic Beam &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;GroupByKey&lt;/code&gt; primitives. For example,
+&lt;code class=&quot;highlighter-rouge&quot;&gt;ElasticsearchIO.write()&lt;/code&gt;
+&lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L783&quot;&gt;expands&lt;/a&gt;
+into a single &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo&lt;/code&gt; with some batching for performance; &lt;code class=&quot;highlighter-rouge&quot;&gt;JdcbIO.read()&lt;/code&gt;
+&lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L329&quot;&gt;expands&lt;/a&gt;
+into &lt;code class=&quot;highlighter-rouge&quot;&gt;Create.of(query)&lt;/code&gt;, a reshuffle to &lt;a href=&quot;https://cloud.google.com/dataflow/service/dataflow-service-desc#preventing-fusion&quot;&gt;prevent
+fusion&lt;/a&gt;,
+and &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo(execute sub-query)&lt;/code&gt;.  Some IOs
+&lt;a href=&quot;https://github.com/apache/beam/blob/8503adbbc3a590cd0dc2939f6a45d335682a9442/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1139&quot;&gt;construct&lt;/a&gt;
+considerably more complicated pipelines.&lt;/p&gt;
+
+&lt;p&gt;&lt;img class=&quot;center-block&quot; src=&quot;/images/blog/splittable-do-fn/jdbcio-expansion.png&quot; alt=&quot;Expansion of the JdbcIO.read() composite transform&quot; width=&quot;600&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;This “mini-pipeline” approach is flexible, modular, and generalizes to data
+sources that read from a dynamically computed &lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&lt;/code&gt; of locations, such
+as
+&lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L222&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;SpannerIO.readAll()&lt;/code&gt;&lt;/a&gt;
+which reads the results of a &lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&lt;/code&gt; of queries from Cloud Spanner,
+compared to
+&lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L318&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;SpannerIO.read()&lt;/code&gt;&lt;/a&gt;
+which executes a single query. We believe such dynamic data sources are a very
+useful capability, often overlooked by other data processing frameworks.&lt;/p&gt;
+
+&lt;h2 id=&quot;when-pardo-and-groupbykey-are-not-enough&quot;&gt;When ParDo and GroupByKey are not enough&lt;/h2&gt;
+
+&lt;p&gt;Despite the flexibility of &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;GroupByKey&lt;/code&gt; and their derivatives, in some
+cases building an efficient IO connector requires extra capabilities.&lt;/p&gt;
+
+&lt;p&gt;For example, imagine reading files using the sequence &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo(filepattern →
+expand into files)&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo(filename → read records)&lt;/code&gt;, or reading a Kafka topic
+using &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo(topic → list partitions)&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo(topic, partition → read
+records)&lt;/code&gt;. This approach has two big issues:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;In the file example, some files might be much larger than others, so the
+second &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo&lt;/code&gt; may have very long individual &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; calls. As a
+result, the pipeline can suffer from poor performance due to stragglers.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;In the Kafka example, implementing the second &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo&lt;/code&gt; is &lt;em&gt;simply impossible&lt;/em&gt;
+with a regular &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;, because it would need to output an infinite number of
+records per each input element &lt;code class=&quot;highlighter-rouge&quot;&gt;topic, partition&lt;/code&gt; &lt;em&gt;(&lt;a href=&quot;/blog/2017/02/13/stateful-processing.html&quot;&gt;stateful processing&lt;/a&gt; comes close, but it
+has other limitations that make it insufficient for this task&lt;/em&gt;).&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2 id=&quot;beam-source-api&quot;&gt;Beam Source API&lt;/h2&gt;
+
+&lt;p&gt;Apache Beam historically provides a Source API
+(&lt;a href=&quot;/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/BoundedSource.html&quot;&gt;BoundedSource&lt;/a&gt;
+and
+&lt;a href=&quot;/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/UnboundedSource.html&quot;&gt;UnboundedSource&lt;/a&gt;) which does
+not have these limitations and allows development of efficient data sources for
+batch and streaming systems. Pipelines use this API via the
+&lt;a href=&quot;/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/Read.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;Read.from(Source)&lt;/code&gt;&lt;/a&gt; built-in &lt;code class=&quot;highlighter-rouge&quot;&gt;PTransform&lt;/code&gt;.&lt;/p&gt;
+
+&lt;p&gt;The Source API is largely similar to that of most other data processing
+frameworks, and allows the system to read data in parallel using multiple
+workers, as well as checkpoint and resume reading from an unbounded data source.
+Additionally, the Beam
+&lt;a href=&quot;/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/BoundedSource.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;BoundedSource&lt;/code&gt;&lt;/a&gt;
+API provides advanced features such as progress reporting and &lt;a href=&quot;/blog/2016/05/18/splitAtFraction-method.html&quot;&gt;dynamic
+rebalancing&lt;/a&gt;
+(which together enable autoscaling), and
+&lt;a href=&quot;/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/UnboundedSource.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;UnboundedSource&lt;/code&gt;&lt;/a&gt; supports
+reporting the source’s watermark and backlog &lt;em&gt;(until SDF, we believed that
+“batch” and “streaming” data sources are fundamentally different and thus
+require fundamentally different APIs)&lt;/em&gt;.&lt;/p&gt;
+
+&lt;p&gt;Unfortunately, these features come at a price. Coding against the Source API
+involves a lot of boilerplate and is error-prone, and it does not compose well
+with the rest of the Beam model because a &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; can appear only at the root
+of a pipeline. For example:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;Using the Source API, it is not possible to read a &lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&lt;/code&gt; of
+filepatterns.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;A &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; can not read a side input, or wait on another pipeline step to
+produce the data.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;A &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; can not emit an additional output (for example, records that failed to
+parse) and so on.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;The Source API is not composable even with itself. For example, suppose Alice
+implements an unbounded &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; that watches a directory for new matching
+files, and Bob implements an unbounded &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; that tails a file. The Source
+API does not let them simply chain the sources together and obtain a &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt;
+that returns new records in new log files in a directory (a very common user
+request). Instead, such a source would have to be developed mostly from
+scratch, and our experience shows that a full-featured monolithic
+implementation of such a &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; is incredibly difficult and error-prone.&lt;/p&gt;
+
+&lt;p&gt;Another class of issues with the &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; API comes from its strict
+bounded/unbounded dichotomy:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;It is difficult or impossible to reuse code between seemingly very similar
+bounded and unbounded sources, for example, the &lt;code class=&quot;highlighter-rouge&quot;&gt;BoundedSource&lt;/code&gt; that generates
+a sequence &lt;code class=&quot;highlighter-rouge&quot;&gt;[a, b)&lt;/code&gt; and the &lt;code class=&quot;highlighter-rouge&quot;&gt;UnboundedSource&lt;/code&gt; that generates a sequence &lt;code class=&quot;highlighter-rouge&quot;&gt;[a,
+inf)&lt;/code&gt; &lt;a href=&quot;https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CountingSource.java&quot;&gt;don’t share any
+code&lt;/a&gt;
+in the Beam Java SDK.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;It is not clear how to classify the ingestion of a very large and
+continuously growing dataset. Ingesting its “already available” part seems to
+require a &lt;code class=&quot;highlighter-rouge&quot;&gt;BoundedSource&lt;/code&gt;: the runner could benefit from knowing its size, and
+could perform dynamic rebalancing. However, ingesting the continuously arriving
+new data seems to require an &lt;code class=&quot;highlighter-rouge&quot;&gt;UnboundedSource&lt;/code&gt; for providing watermarks. From
+this angle, the &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; API has &lt;a href=&quot;https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101&quot;&gt;the same issues as Lambda
+Architecture&lt;/a&gt;.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;About two years ago we began thinking about how to address the limitations of
+the Source API, and ended up, surprisingly, addressing the limitations of
+&lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; instead.&lt;/p&gt;
+
+&lt;h2 id=&quot;enter-splittable-dofn&quot;&gt;Enter Splittable DoFn&lt;/h2&gt;
+
+&lt;p&gt;&lt;a href=&quot;http://s.apache.org/splittable-do-fn&quot;&gt;Splittable DoFn&lt;/a&gt; (SDF) is a
+generalization of &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; that gives it the core capabilities of &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; while
+retaining &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;’s syntax, flexibility, modularity, and ease of coding.  As a
+result, it becomes possible to develop more powerful IO connectors than before,
+with shorter, simpler, more reusable code.&lt;/p&gt;
+
+&lt;p&gt;Note that, unlike &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt;, SDF &lt;em&gt;does not&lt;/em&gt; have distinct bounded/unbounded APIs,
+just as regular &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;s don’t: there is only one API, which covers both of these
+use cases and anything in between. Thus, SDF closes the final gap in the unified
+batch/streaming programming model of Apache Beam.&lt;/p&gt;
+
+&lt;p&gt;When reading the explanation of SDF below, keep in mind the running example of a
+&lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; that takes a filename as input and outputs the records in that file.
+People familiar with the &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; API may find it useful to think of SDF as a
+way to read a &lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&lt;/code&gt; of sources, treating the source itself as just
+another piece of data in the pipeline &lt;em&gt;(this, in fact, was one of the early
+design iterations among the work that led to creation of SDF)&lt;/em&gt;.&lt;/p&gt;
+
+&lt;p&gt;The two aspects where &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; has an advantage over a regular &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; are:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;Splittability:&lt;/strong&gt; applying a &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; to a single element is &lt;em&gt;monolithic&lt;/em&gt;, but
+reading from a &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; is &lt;em&gt;non-monolithic&lt;/em&gt;. The whole &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; doesn’t have to
+be read at once; rather, it is read in parts, called &lt;em&gt;bundles&lt;/em&gt;. For example, a
+large file is usually read in several bundles, each reading some sub-range of
+offsets within the file. Likewise, a Kafka topic (which, of course, can never
+be read “fully”) is read over an infinite number of bundles, each reading some
+finite number of elements.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;Interaction with the runner:&lt;/strong&gt; runners apply a &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; to a single element as
+a “black box”, but interact quite richly with &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt;. &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; provides the
+runner with information such as its estimated size (or its generalization,
+“backlog”), progress through reading the bundle, watermarks etc. The runner
+uses this information to tune the execution and control the breakdown of the
+&lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; into bundles. For example, a slowly progressing large bundle of a file
+may be &lt;a href=&quot;https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow&quot;&gt;dynamically
+split&lt;/a&gt;
+by a batch-focused runner before it becomes a straggler, and a latency-focused
+streaming runner may control how many elements it reads from a source in each
+bundle to optimize for latency vs. per-bundle overhead.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h3 id=&quot;non-monolithic-element-processing-with-restrictions&quot;&gt;Non-monolithic element processing with restrictions&lt;/h3&gt;
+
+&lt;p&gt;Splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; supports &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt;-like features by allowing the processing of
+a single element to be non-monolithic.&lt;/p&gt;
+
+&lt;p&gt;The processing of one element by an SDF is decomposed into a (potentially
+infinite) number of &lt;em&gt;restrictions&lt;/em&gt;, each describing some part of the work to be
+done for the whole element. The input to an SDF’s &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call is a
+pair of an element and a restriction (compared to a regular &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;, which takes
+just the element).&lt;/p&gt;
+
+&lt;p&gt;Processing of every element starts by creating an &lt;em&gt;initial restriction&lt;/em&gt; that
+describes the entire work, and the initial restriction is then split further
+into sub-restrictions which must logically add up to the original. For example,
+for a splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; called &lt;code class=&quot;highlighter-rouge&quot;&gt;ReadFn&lt;/code&gt; that takes a filename and outputs
+records in the file, the restriction may be a pair of starting and ending byte
+offset, and &lt;code class=&quot;highlighter-rouge&quot;&gt;ReadFn&lt;/code&gt; may interpret it as &lt;em&gt;read records whose starting offsets
+are in the given range&lt;/em&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;img class=&quot;center-block&quot; src=&quot;/images/blog/splittable-do-fn/restrictions.png&quot; alt=&quot;Specifying parts of work for an element using restrictions&quot; width=&quot;600&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;The idea of restrictions provides non-monolithic execution - the first
+ingredient for parity with &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt;. The other ingredient is &lt;em&gt;interaction with
+the runner&lt;/em&gt;: the runner has access to the restriction of each active
+&lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call of an SDF, can inquire about the progress of the call,
+and most importantly, can &lt;em&gt;split&lt;/em&gt; the restriction while it is being processed
+(hence the name &lt;em&gt;Splittable DoFn&lt;/em&gt;).&lt;/p&gt;
+
+&lt;p&gt;Splitting produces a &lt;em&gt;primary&lt;/em&gt; and &lt;em&gt;residual&lt;/em&gt; restriction that add up to the
+original restriction being split: the current &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call keeps
+processing the primary, and the residual will be processed by another
+&lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call. For example, a runner may schedule the residual to be
+processed in parallel on another worker.&lt;/p&gt;
+
+&lt;p&gt;Splitting of a running &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call has two critically important uses:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;strong&gt;Supporting infinite work per element.&lt;/strong&gt; A restriction is, in general, not
+required to describe a finite amount of work. For example, reading from a Kafka
+topic starting from offset &lt;em&gt;100&lt;/em&gt; can be represented by the
+restriction &lt;em&gt;[100, inf)&lt;/em&gt;. A &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call processing this
+entire restriction would, of course, never complete. However, while such a call
+runs, a runner can split the restriction into a &lt;em&gt;finite&lt;/em&gt; primary &lt;em&gt;[100, 150)&lt;/em&gt;
+(letting the current call complete this part) and an &lt;em&gt;infinite&lt;/em&gt; residual &lt;em&gt;[150,
+inf)&lt;/em&gt; to be processed later, effectively checkpointing and resuming the call;
+this can be repeated forever.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img class=&quot;center-block&quot; src=&quot;/images/blog/splittable-do-fn/kafka-splitting.png&quot; alt=&quot;Splitting an infinite restriction into a finite primary and infinite residual&quot; width=&quot;400&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;strong&gt;Dynamic rebalancing.&lt;/strong&gt; When a (typically batch-focused) runner detects that
+a &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call is going to take too long and become a straggler, it
+can split the restriction in some proportion so that the primary is short enough
+to not be a straggler, and can schedule the residual in parallel on another
+worker. For details, see &lt;a href=&quot;https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow&quot;&gt;No Shard Left
+Behind&lt;/a&gt;.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Logically, the execution of an SDF on an element works according to the
+following diagram, where “magic” stands for the runner-specific ability to split
+the restrictions and schedule processing of residuals.&lt;/p&gt;
+
+&lt;p&gt;&lt;img class=&quot;center-block&quot; src=&quot;/images/blog/splittable-do-fn/transform-expansion.png&quot; alt=&quot;Execution of an SDF - pairing with a restriction, splitting     restrictions, processing element/restriction pairs&quot; width=&quot;600&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;This diagram emphasizes that splittability is an implementation detail of the
+particular &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;: a splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; still looks like a &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&amp;lt;A, B&amp;gt;&lt;/code&gt; to its
+user, and can be applied via a &lt;code class=&quot;highlighter-rouge&quot;&gt;ParDo&lt;/code&gt; to a &lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&amp;lt;A&amp;gt;&lt;/code&gt; producing a
+&lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&amp;lt;B&amp;gt;&lt;/code&gt;.&lt;/p&gt;
+
+&lt;h3 id=&quot;which-dofns-need-to-be-splittable&quot;&gt;Which DoFns need to be splittable&lt;/h3&gt;
+
+&lt;p&gt;Note that decomposition of an element into element/restriction pairs is not
+automatic or “magical”: SDF is a new API for &lt;em&gt;authoring&lt;/em&gt; a &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;, rather than a
+new way to &lt;em&gt;execute&lt;/em&gt; an existing &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;. When making a &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; splittable, the
+author needs to:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;Consider the structure of the work it does for every element.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Come up with a scheme for describing parts of this work using restrictions.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Write code for creating the initial restriction, splitting it, and executing
+an element/restriction pair.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;An overwhelming majority of &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;s found in user pipelines do not need to be
+made splittable: SDF is an advanced, powerful API, primarily targeting authors
+of new IO connectors &lt;em&gt;(though it has interesting non-IO applications as well:
+see &lt;a href=&quot;http://s.apache.org/splittable-do-fn#heading=h.5cep9s8k4fxv&quot;&gt;Non-IO examples&lt;/a&gt;)&lt;/em&gt;.&lt;/p&gt;
+
+&lt;h3 id=&quot;execution-of-a-restriction-and-data-consistency&quot;&gt;Execution of a restriction and data consistency&lt;/h3&gt;
+
+&lt;p&gt;One of the most important parts of the Splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; design is related to
+how it achieves data consistency while splitting. For example, while the runner
+is preparing to split the restriction of an active &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call, how
+can it be sure that the call has not concurrently progressed past the point of
+splitting?&lt;/p&gt;
+
+&lt;p&gt;This is achieved by requiring the processing of a restriction to follow a
+certain pattern. We think of a restriction as a sequence of &lt;em&gt;blocks&lt;/em&gt; -
+elementary indivisible units of work, identified by a &lt;em&gt;position&lt;/em&gt;. A
+&lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call processes the blocks one by one, first &lt;em&gt;claiming&lt;/em&gt; the
+block’s position to atomically check if it’s still within the range of the
+restriction, until the whole restriction is processed.&lt;/p&gt;
+
+&lt;p&gt;The diagram below illustrates this for &lt;code class=&quot;highlighter-rouge&quot;&gt;ReadFn&lt;/code&gt; (a splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; that reads
+Avro files) processing the element &lt;code class=&quot;highlighter-rouge&quot;&gt;foo.avro&lt;/code&gt; with restriction &lt;code class=&quot;highlighter-rouge&quot;&gt;[30, 70)&lt;/code&gt;. This
+&lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; call scans the Avro file for &lt;a href=&quot;https://avro.apache.org/docs/current/spec.html#Object+Container+Files&quot;&gt;data
+blocks&lt;/a&gt;
+starting from offset &lt;code class=&quot;highlighter-rouge&quot;&gt;30&lt;/code&gt; and claims the position of each block in this range.
+If a block is claimed successfully, then the call outputs all records in this
+data block, otherwise, it terminates.&lt;/p&gt;
+
+&lt;p&gt;&lt;img class=&quot;center-block&quot; src=&quot;/images/blog/splittable-do-fn/blocks.png&quot; alt=&quot;Processing a restriction by claiming blocks inside it&quot; width=&quot;400&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;For more details, see &lt;a href=&quot;http://s.apache.org/splittable-do-fn#heading=h.vjs7pzbb7kw&quot;&gt;Restrictions, blocks and
+positions&lt;/a&gt; in the
+design proposal document.&lt;/p&gt;
+
+&lt;h3 id=&quot;code-example&quot;&gt;Code example&lt;/h3&gt;
+
+&lt;p&gt;Let us look at some examples of SDF code. The examples use the Beam Java SDK,
+which &lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L527&quot;&gt;represents splittable
+&lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;s&lt;/a&gt;
+as part of the flexible &lt;a href=&quot;http://s.apache.org/a-new-dofn&quot;&gt;annotation-based
+&lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;&lt;/a&gt; machinery, and the &lt;a href=&quot;https://s.apache.org/splittable-do-fn-python&quot;&gt;proposed SDF syntax
+for Python&lt;/a&gt;.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;A splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; is a &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; - no new base class needed. Any SDF derives
+from the &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; class and has a &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; method.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;@ProcessElement&lt;/code&gt; method takes an additional
+&lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;RestrictionTracker&lt;/code&gt;&lt;/a&gt;
+parameter that gives access to the current restriction in addition to the
+current element.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;An SDF needs to define a &lt;code class=&quot;highlighter-rouge&quot;&gt;@GetInitialRestriction&lt;/code&gt; method that can create a
+restriction describing the complete work for a given element.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;There are several less important optional methods, such as
+&lt;code class=&quot;highlighter-rouge&quot;&gt;@SplitRestriction&lt;/code&gt; for pre-splitting the initial restriction into several
+smaller restrictions, and a few others.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;The “Hello World” of SDF is a counter, which takes pairs &lt;em&gt;(x, N)&lt;/em&gt; as input and
+produces pairs &lt;em&gt;(x, 0), (x, 1), …, (x, N-1)&lt;/em&gt; as output.&lt;/p&gt;
+
+&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CountFn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DoFn&lt;/span&gt;&lt;span class=&quo [...]
+  &lt;span class=&quot;nd&quot;&gt;@ProcessElement&lt;/span&gt;
+  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ProcessContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OffsetRangeTracker&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&qu [...]
+    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;currentRestriction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getFr [...]
+      &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KV&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&q [...]
+    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+
+  &lt;span class=&quot;nd&quot;&gt;@GetInitialRestriction&lt;/span&gt;
+  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OffsetRange&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getInitialRange&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KV&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt [...]
+    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;OffsetRange&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());& [...]
+  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;PCollection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KV&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o& [...]
+&lt;span class=&quot;n&quot;&gt;PCollection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KV&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt; &lt;span class=&quot;o [...]
+    &lt;span class=&quot;n&quot;&gt;ParDo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CountFn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;());&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;div class=&quot;language-py highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CountFn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DoFn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DoFn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RestrictionTracker [...]
+    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;xrange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_restriction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()):& [...]
+      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;try_claim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;
+      &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
+        
+  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_initial_restriction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;element&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;This short &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; subsumes the functionality of
+&lt;a href=&quot;https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CountingSource.java&quot;&gt;CountingSource&lt;/a&gt;,
+but is more flexible: &lt;code class=&quot;highlighter-rouge&quot;&gt;CountingSource&lt;/code&gt; generates only one sequence specified at
+pipeline construction time, while this &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; can generate a dynamic family of
+sequences, one per element in the input collection (it does not matter whether
+the input collection is bounded or unbounded).&lt;/p&gt;
+
+&lt;p&gt;However, the &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt;-specific capabilities of &lt;code class=&quot;highlighter-rouge&quot;&gt;CountingSource&lt;/code&gt; are still
+available in &lt;code class=&quot;highlighter-rouge&quot;&gt;CountFn&lt;/code&gt;. For example, if a sequence has a lot of elements, a
+batch-focused runner can still apply dynamic rebalancing to it and generate
+different subranges of the sequence in parallel by splitting the &lt;code class=&quot;highlighter-rouge&quot;&gt;OffsetRange&lt;/code&gt;.
+Likewise, a streaming-focused runner can use the same splitting logic to
+checkpoint and resume the generation of the sequence even if it is, for
+practical purposes, infinite (for example, when applied to a &lt;code class=&quot;highlighter-rouge&quot;&gt;KV(...,
+Long.MAX_VALUE)&lt;/code&gt;).&lt;/p&gt;
+
+&lt;p&gt;A slightly more complex example is the &lt;code class=&quot;highlighter-rouge&quot;&gt;ReadFn&lt;/code&gt; considered above, which reads
+data from Avro files and illustrates the idea of &lt;em&gt;blocks&lt;/em&gt;: we provide pseudocode
+to illustrate the approach.&lt;/p&gt;
+
+&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ReadFn&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DoFn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot; [...]
+  &lt;span class=&quot;nd&quot;&gt;@ProcessElement&lt;/span&gt;
+  &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ProcessContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OffsetRangeTracker&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&q [...]
+    &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AvroReader&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Avro&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/sp [...]
+      &lt;span class=&quot;c1&quot;&gt;// Seek to the first block starting at or after the start offset.&lt;/span&gt;
+      &lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seek&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;currentRestriction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getFrom&lt;/span&gt;&lt;span class=&quot;o&quot;&g [...]
+      &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readNextBlock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+        &lt;span class=&quot;c1&quot;&gt;// Claim the position of the current Avro block&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;tryClaim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;currentBlockOffset&lt;/span&gt;&lt;span class=&quot;o&quot;&g [...]
+          &lt;span class=&quot;c1&quot;&gt;// Out of range of the current restriction - we're done.&lt;/span&gt;
+          &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+        &lt;span class=&quot;c1&quot;&gt;// Emit all records in this block&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AvroRecord&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;currentBlock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&g [...]
+          &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
+        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+      &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+
+  &lt;span class=&quot;nd&quot;&gt;@GetInitialRestriction&lt;/span&gt;
+  &lt;span class=&quot;n&quot;&gt;OffsetRange&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getInitialRestriction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;OffsetRange&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;File&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/s [...]
+  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;div class=&quot;language-py highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AvroReader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DoFn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DoFn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RestrictionTracke [...]
+    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fileio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ChannelFactory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/s [...]
+      &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;current_restriction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
+      &lt;span class=&quot;c&quot;&gt;# Seek to the first block starting at or after the start offset.&lt;/span&gt;
+      &lt;span class=&quot;nb&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seek&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+      &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroUtils&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_next_block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+      &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
+        &lt;span class=&quot;c&quot;&gt;# Claim the position of the current Avro block&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracker&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;try_claim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()):&lt;/span&gt;
+          &lt;span class=&quot;c&quot;&gt;# Out of range of the current restriction - we're done.&lt;/span&gt;
+          &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;
+        &lt;span class=&quot;c&quot;&gt;# Emit all records in this block&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
+          &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;
+        &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroUtils&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_next_block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+        
+  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_initial_restriction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fileio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ChannelFactory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size_in_bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt [...]
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;This hypothetical &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; reads records from a single Avro file. Notably missing
+is the code for expanding a filepattern: it no longer needs to be part of this
+&lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;! Instead, the SDK includes a
+&lt;a href=&quot;https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Match.java&quot;&gt;Match.filepatterns()&lt;/a&gt;
+transform for expanding a filepattern into a &lt;code class=&quot;highlighter-rouge&quot;&gt;PCollection&lt;/code&gt; of filenames, and
+different file format IOs can reuse the same transform, reading the files with
+different &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt;s.&lt;/p&gt;
+
+&lt;p&gt;This example demonstrates the benefits of increased modularity allowed by SDF:
+&lt;code class=&quot;highlighter-rouge&quot;&gt;Match&lt;/code&gt; supports continuous ingestion of new files in streaming pipelines using
+&lt;code class=&quot;highlighter-rouge&quot;&gt;.continuously()&lt;/code&gt;, and this functionality becomes automatically available to
+various file format IOs. For example, &lt;code class=&quot;highlighter-rouge&quot;&gt;TextIO.read().watchForNewFiles()&lt;/code&gt; &lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L480&quot;&gt;uses
+&lt;code class=&quot;highlighter-rouge&quot;&gt;Match&lt;/code&gt; under the
+hood)&lt;/a&gt;.&lt;/p&gt;
+
+&lt;h2 id=&quot;current-status&quot;&gt;Current status&lt;/h2&gt;
+
+&lt;p&gt;Splittable &lt;code class=&quot;highlighter-rouge&quot;&gt;DoFn&lt;/code&gt; is a major new API, and its delivery and widespread adoption
+involves a lot of work in different parts of the Apache Beam ecosystem.  Some
+of that work is already complete and provides direct benefit to users via new
+IO connectors. However, a large amount of work is in progress or planned.&lt;/p&gt;
+
+&lt;p&gt;As of August 2017, SDF is available for use in the Beam Java Direct runner and
+Dataflow Streaming runner, and implementation is in progress in the Flink and
+Apex runners; see &lt;a href=&quot;/documentation/runners/capability-matrix/&quot;&gt;capability matrix&lt;/a&gt; for the current status. Support
+for SDF in the Python SDK is &lt;a href=&quot;https://s.apache.org/splittable-do-fn-python&quot;&gt;in active
+development&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;Several SDF-based transforms and IO connectors are available for Beam users at
+HEAD and will be included in Beam 2.2.0. &lt;code class=&quot;highlighter-rouge&quot;&gt;TextIO&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;AvroIO&lt;/code&gt; finally provide
+continuous ingestion of files (one of the most frequently requested features)
+via &lt;code class=&quot;highlighter-rouge&quot;&gt;.watchForNewFiles()&lt;/code&gt; which is backed by the utility transforms
+&lt;code class=&quot;highlighter-rouge&quot;&gt;Match.filepatterns().continuously()&lt;/code&gt; and the more general
+&lt;a href=&quot;https://github.com/apache/beam/blob/f7e8f886c91ea9d0b51e00331eeb4484e2f6e000/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;Watch.growthOf()&lt;/code&gt;&lt;/a&gt;.
+These utility transforms are also independently useful for “power user” use cases.&lt;/p&gt;
+
+&lt;p&gt;To enable more flexible use cases for IOs currently based on the Source API, we
+will change them to use SDF. This transition is &lt;a href=&quot;http://s.apache.org/textio-sdf&quot;&gt;pioneered by
+TextIO&lt;/a&gt; and involves temporarily &lt;a href=&quot;http://s.apache.org/sdf-via-source&quot;&gt;executing SDF
+via the Source API&lt;/a&gt; to support runners
+lacking the ability to run SDF directly.&lt;/p&gt;
+
+&lt;p&gt;In addition to enabling new IOs, work on SDF has influenced our thinking about
+other parts of the Beam programming model:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;SDF unified the final remaining part of the Beam programming model that was
+not batch/streaming agnostic (the &lt;code class=&quot;highlighter-rouge&quot;&gt;Source&lt;/code&gt; API). This led us to consider use
+cases that cannot be described as purely batch or streaming (for example,
+ingesting a large amount of historical data and carrying on with more data
+arriving in real time) and to develop a &lt;a href=&quot;http://s.apache.org/beam-fn-api-progress-reporting&quot;&gt;unified notion of “progress” and
+“backlog”&lt;/a&gt;.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;The &lt;a href=&quot;http://s.apache.org/beam-fn-api&quot;&gt;Fn API&lt;/a&gt; - the foundation of Beam’s
+future support for cross-language pipelines - uses SDF as &lt;em&gt;the only&lt;/em&gt; concept
+representing data ingestion.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Implementation of SDF has lead to &lt;a href=&quot;https://lists.apache.org/thread.html/86831496a08fe148e3b982cdb904f828f262c0b571543a9fed7b915d@%3Cdev.beam.apache.org%3E&quot;&gt;formalizing pipeline termination
+semantics&lt;/a&gt;
+and making it consistent between runners.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;SDF set a new standard for how modular IO connectors can be, inspiring
+creation of similar APIs for some non-SDF-based connectors (for example,
+&lt;code class=&quot;highlighter-rouge&quot;&gt;SpannerIO.readAll()&lt;/code&gt; and the
+&lt;a href=&quot;https://issues.apache.org/jira/browse/BEAM-2706&quot;&gt;planned&lt;/a&gt; &lt;code class=&quot;highlighter-rouge&quot;&gt;JdbcIO.readAll()&lt;/code&gt;).&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;h2 id=&quot;call-to-action&quot;&gt;Call to action&lt;/h2&gt;
+
+&lt;p&gt;Apache Beam thrives on having a large community of contributors. Here are some
+ways you can get involved in the SDF effort and help make the Beam IO connector
+ecosystem more modular:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;Use the currently available SDF-based IO connectors, provide feedback, file
+bugs, and suggest or implement improvements.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Propose or develop a new IO connector based on SDF.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Implement or improve support for SDF in your favorite runner.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Subscribe and contribute to the occasional SDF-related discussions on
+&lt;a href=&quot;mailto:user@beam.apache.org&quot;&gt;user@beam.apache.org&lt;/a&gt; (mailing list for Beam
+users) and &lt;a href=&quot;mailto:dev@beam.apache.org&quot;&gt;dev@beam.apache.org&lt;/a&gt; (mailing list for
+Beam developers)!&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+</description>
+        <pubDate>Wed, 16 Aug 2017 01:00:01 -0700</pubDate>
+        <link>https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html</link>
+        <guid isPermaLink="true">https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Apache Beam publishes the first stable release</title>
         <description>&lt;p&gt;The Apache Beam community is pleased to &lt;a href=&quot;https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces12&quot;&gt;announce the availability of version 2.0.0&lt;/a&gt;. This is the first stable release of Apache Beam, signifying a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, and making Beam suitable for enterprise deployment.&lt;/p&gt;
 
@@ -1342,48 +1898,5 @@ Java SDK. If you have questions or comments, we’d love to hear them on the
         
       </item>
     
-      <item>
-        <title>The first release of Apache Beam!</title>
-        <description>&lt;p&gt;I’m happy to announce that Apache Beam has officially released its first
-version – 0.1.0-incubating. This is an exciting milestone for the project,
-which joined the Apache Software Foundation and the Apache Incubator earlier
-this year.&lt;/p&gt;
-
-&lt;!--more--&gt;
-
-&lt;p&gt;This release publishes the first set of Apache Beam binaries and source code,
-making them readily available for our users. The initial release includes the
-SDK for Java, along with three runners: Apache Flink, Apache Spark and Google
-Cloud Dataflow, a fully-managed cloud service. The release is available both
-in the &lt;a href=&quot;http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22&quot;&gt;Maven Central Repository&lt;/a&gt;,
-as well as a download from the &lt;a href=&quot;/get-started/downloads/&quot;&gt;project’s website&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;The goal of this release was process-oriented. In particular, the Beam
-community wanted to release existing functionality to our users, build and
-validate the release processes, and obtain validation from the Apache Software
-Foundation and the Apache Incubator.&lt;/p&gt;
-
-&lt;p&gt;I’d like to encourage everyone to try out this release. Please keep in mind
-that this is the first incubating release – significant changes are to be
-expected. As we march toward stability, a rapid cadence of future releases is
-anticipated, perhaps one every 1-2 months.&lt;/p&gt;
-
-&lt;p&gt;As always, the Beam community welcomes feedback. Stabilization, usability and
-the developer experience will be our focus for the next several months. If you
-have any comments or discover any issues, I’d like to invite you to reach out
-to us via &lt;a href=&quot;/get-started/support/&quot;&gt;user’s mailing list&lt;/a&gt; or the
-&lt;a href=&quot;https://issues.apache.org/jira/browse/BEAM/&quot;&gt;Apache JIRA issue tracker&lt;/a&gt;.&lt;/p&gt;
-</description>
-        <pubDate>Wed, 15 Jun 2016 00:00:01 -0700</pubDate>
-        <link>https://beam.apache.org/beam/release/2016/06/15/first-release.html</link>
-        <guid isPermaLink="true">https://beam.apache.org/beam/release/2016/06/15/first-release.html</guid>
-        
-        
-        <category>beam</category>
-        
-        <category>release</category>
-        
-      </item>
-    
   </channel>
 </rss>
diff --git a/content/index.html b/content/index.html
index 3773e8d..3d64303 100644
--- a/content/index.html
+++ b/content/index.html
@@ -164,6 +164,11 @@
           </div>
           <div class="hero__blog__cards">
             
+            <a class="hero__blog__cards__card" href="/blog/2017/08/16/splittable-do-fn.html">
+              <div class="hero__blog__cards__card__title">Powerful and modular IO connectors with Splittable DoFn in Apache Beam</div>
+              <div class="hero__blog__cards__card__date">Aug 16, 2017</div>
+            </a>
+            
             <a class="hero__blog__cards__card" href="/blog/2017/05/17/beam-first-stable-release.html">
               <div class="hero__blog__cards__card__title">Apache Beam publishes the first stable release</div>
               <div class="hero__blog__cards__card__date">May 17, 2017</div>
@@ -174,11 +179,6 @@
               <div class="hero__blog__cards__card__date">Mar 16, 2017</div>
             </a>
             
-            <a class="hero__blog__cards__card" href="/blog/2017/02/13/stateful-processing.html">
-              <div class="hero__blog__cards__card__title">Stateful processing with Apache Beam</div>
-              <div class="hero__blog__cards__card__date">Feb 13, 2017</div>
-            </a>
-            
           </div>
         </div>
       </div>

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <commits@beam.apache.org>.

Mime
View raw message