beam-commits mailing list archives

From mergebot-r...@apache.org
Subject [beam-site] 01/01: Prepare repository for deployment.
Date Tue, 19 Dec 2017 23:48:20 GMT
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 2ca4ba3dcb1077c51a558035398b1661941176a7
Author: Mergebot <mergebot@apache.org>
AuthorDate: Tue Dec 19 23:48:17 2017 +0000

    Prepare repository for deployment.
---
 content/contribute/contribution-guide/index.html   |  2 +-
 content/contribute/maturity-model/index.html       |  2 +-
 content/documentation/execution-model/index.html   | 61 +++++++----------
 .../pipelines/design-your-pipeline/index.html      | 80 ++++++++++++++--------
 content/documentation/programming-guide/index.html | 15 ++--
 content/get-started/beam-overview/index.html       | 14 ++--
 .../get-started/mobile-gaming-example/index.html   | 55 ++++++++-------
 content/get-started/wordcount-example/index.html   | 37 +++++-----
 8 files changed, 144 insertions(+), 122 deletions(-)

diff --git a/content/contribute/contribution-guide/index.html b/content/contribute/contribution-guide/index.html
index 6c3de56..ec6addf 100644
--- a/content/contribute/contribution-guide/index.html
+++ b/content/contribute/contribution-guide/index.html
@@ -188,7 +188,7 @@ or participate on the documentation effort.</p>
 
 <p>We use a review-then-commit workflow in Beam for all contributions.</p>
 
-<p><img src="/images/contribution-guide-1.png" alt="Alt text" title="Workflow image" /></p>
+<p><img src="/images/contribution-guide-1.png" alt="The Beam contribution workflow has 5 steps: engage, design, code, review, and commit." /></p>
 
 <p><strong>For larger contributions or those that affect multiple components:</strong></p>
 
diff --git a/content/contribute/maturity-model/index.html b/content/contribute/maturity-model/index.html
index b169c36..69c3b34 100644
--- a/content/contribute/maturity-model/index.html
+++ b/content/contribute/maturity-model/index.html
@@ -422,7 +422,7 @@ graduation process and is no longer being maintained.</em></p>
 
 <p>Finally, the contributor diversity has increased significantly. Over each of the last three months, no organization has had more than ~50% of the unique contributors per month. (Assumptions: commits to master branch of the main repository, excludes merge commits, best effort to identify unique contributors).</p>
 
-<p><img src="/images/contribution-diversity.png" alt="Alt text" title="Contributor Diversity" /></p>
+<p><img src="/images/contribution-diversity.png" alt="Contributor diversity graph" /></p>
 
 <h2 id="dependency-analysis">Dependency analysis</h2>
 <p>This section analyses project’s direct and transitive dependencies to ensure compliance with Apache Software Foundation’s policies and guidelines.</p>
diff --git a/content/documentation/execution-model/index.html b/content/documentation/execution-model/index.html
index 26d7cce..03c2db3 100644
--- a/content/documentation/execution-model/index.html
+++ b/content/documentation/execution-model/index.html
@@ -215,21 +215,6 @@
 may observe various effects as a result of the runner’s choices. This page
 describes these effects so you can better understand how Beam pipelines execute.</p>
 
-<ul id="markdown-toc">
-  <li><a href="#processing-of-elements" id="markdown-toc-processing-of-elements">Processing of elements</a>    <ul>
-      <li><a href="#serialization-and-communication" id="markdown-toc-serialization-and-communication">Serialization and communication</a></li>
-      <li><a href="#bundling-and-persistence" id="markdown-toc-bundling-and-persistence">Bundling and persistence</a></li>
-    </ul>
-  </li>
-  <li><a href="#parallelism" id="markdown-toc-parallelism">Failures and parallelism within and between transforms</a>    <ul>
-      <li><a href="#data-parallelism" id="markdown-toc-data-parallelism">Data-parallelism within one transform</a></li>
-      <li><a href="#dependent-parallellism" id="markdown-toc-dependent-parallellism">Dependent-parallelism between transforms</a></li>
-      <li><a href="#failures-within-one-transform" id="markdown-toc-failures-within-one-transform">Failures within one transform</a></li>
-      <li><a href="#coupled-failure" id="markdown-toc-coupled-failure">Coupled failure: Failures between transforms</a></li>
-    </ul>
-  </li>
-</ul>
-
 <h2 id="processing-of-elements">Processing of elements</h2>
 
 <p>The serialization and communication of elements between machines is one of the
@@ -298,27 +283,25 @@ in parallel, and how transforms are retried when failures occur.</p>
 <p>When executing a single <code class="highlighter-rouge">ParDo</code>, a runner might divide an example input
 collection of nine elements into two bundles as shown in figure 1.</p>
 
-<p><img src="/images/execution_model_bundling.svg" alt="bundling" /></p>
+<p><img src="/images/execution_model_bundling.svg" alt="Bundle A contains five elements. Bundle B contains four elements." /></p>
 
-<p><strong>Figure 1:</strong> A runner divides an input collection with nine elements
-into two bundles.</p>
+<p><em>Figure 1: A runner divides an input collection into two bundles.</em></p>
 
 <p>When the <code class="highlighter-rouge">ParDo</code> executes, workers may process the two bundles in parallel as
 shown in figure 2.</p>
 
-<p><img src="/images/execution_model_bundling_gantt.svg" alt="bundling_gantt" /></p>
+<p><img src="/images/execution_model_bundling_gantt.svg" alt="Two workers process the two bundles in parallel. Worker one processes bundle
+  A. Worker two processes bundle B." /></p>
 
-<p><strong>Figure 2:</strong> Two workers process the two bundles in parallel. The elements in
-each bundle are processed in sequence.</p>
+<p><em>Figure 2: Two workers process the two bundles in parallel.</em></p>
 
 <p>Since elements cannot be split, the maximum parallelism for a transform depends
-on the number of elements in the collection. In our example, the input
-collection has nine elements, so the maximum parallelism is nine.</p>
+on the number of elements in the collection. In figure 3, the input collection
+has nine elements, so the maximum parallelism is nine.</p>
 
-<p><img src="/images/execution_model_bundling_gantt_max.svg" alt="bundling_gantt_max" /></p>
+<p><img src="/images/execution_model_bundling_gantt_max.svg" alt="Nine workers process a nine element input collection in parallel." /></p>
 
-<p><strong>Figure 3:</strong> The maximum parallelism is nine, as there are nine elements in the
-input collection.</p>
+<p><em>Figure 3: Nine workers process a nine element input collection in parallel.</em></p>
 
 <p>Note: Splittable ParDo allows splitting the processing of a single input across
 multiple bundles. This feature is a work in progress.</p>
@@ -331,9 +314,10 @@ output elements without altering the bundling. In figure 4, <code class="highlig
 <code class="highlighter-rouge">ParDo2</code> are <em>dependently parallel</em> if the output of <code class="highlighter-rouge">ParDo1</code> for a given
 element must be processed on the same worker.</p>
 
-<p><img src="/images/execution_model_bundling_multi.svg" alt="bundling_multi" /></p>
+<p><img src="/images/execution_model_bundling_multi.svg" alt="ParDo1 processes an input collection that contains bundles A and B. ParDo2 then
+  processes the output collection from ParDo1, which contains bundles C and D." /></p>
 
-<p><strong>Figure 4:</strong> Two transforms in sequence and their corresponding input collections.</p>
+<p><em>Figure 4: Two transforms in sequence and their corresponding input collections.</em></p>
 
 <p>Figure 5 shows how these dependently parallel transforms might execute. The
 first worker executes <code class="highlighter-rouge">ParDo1</code> on the elements in bundle A (which results in
@@ -341,9 +325,10 @@ bundle C), and then executes <code class="highlighter-rouge">ParDo2</code> on th
 the second worker executes <code class="highlighter-rouge">ParDo1</code> on the elements in bundle B (which results
 in bundle D), and then executes <code class="highlighter-rouge">ParDo2</code> on the elements in bundle D.</p>
 
-<p><img src="/images/execution_model_bundling_multi_gantt.svg" alt="bundling_multi_gantt.svg" /></p>
+<p><img src="/images/execution_model_bundling_multi_gantt.svg" alt="Worker one executes ParDo1 on bundle A and Pardo2 on bundle C. Worker two
+  executes ParDo1 on bundle B and ParDo2 on bundle D." /></p>
 
-<p><strong>Figure 5:</strong> Two workers execute dependently parallel ParDo transforms.</p>
+<p><em>Figure 5: Two workers execute dependently parallel ParDo transforms.</em></p>
 
 <p>Executing transforms this way allows a runner to avoid redistributing elements
 between workers, which saves on communication costs. However, the maximum parallelism
@@ -367,12 +352,13 @@ there is one element still awaiting processing.</p>
 <p>We see that the runner retries all elements in bundle B and the processing
 completes successfully the second time. Note that the retry does not necessarily
 happen on the same worker as the original processing attempt, as shown in the
-diagram.</p>
+figure.</p>
 
-<p><img src="/images/execution_model_failure_retry.svg" alt="failure_retry" /></p>
+<p><img src="/images/execution_model_failure_retry.svg" alt="Worker two fails to process an element in bundle B. Worker one finishes
+  processing bundle A and then successfully retries to execute bundle B." /></p>
 
-<p><strong>Figure 6:</strong> The processing of an element within bundle B fails, and another worker
-retries the entire bundle.</p>
+<p><em>Figure 6: The processing of an element within bundle B fails, and another worker
+retries the entire bundle.</em></p>
 
 <p>Because we encountered a failure while processing an element in the input
 bundle, we had to reprocess <em>all</em> of the elements in the input bundle. This means
@@ -396,10 +382,11 @@ the output of <code class="highlighter-rouge">ParDo2</code>. Because the runner
 together, the output bundle from <code class="highlighter-rouge">ParDo1</code> must also be thrown away, and all
 elements in the input bundle must be retried. These two <code class="highlighter-rouge">ParDo</code>s are co-failing.</p>
 
-<p><img src="/images/execution_model_bundling_coupled_failure.svg" alt="bundling_coupled failure" /></p>
+<p><img src="/images/execution_model_bundling_coupled_failure.svg" alt="Worker two fails to process an element in bundle D, so all elements in both
+  bundle B and bundle D must be retried." /></p>
 
-<p><strong>Figure 7:</strong> Processing of an element within bundle D fails, so all elements in
-the input bundle are retried.</p>
+<p><em>Figure 7: Processing of an element within bundle D fails, so all elements in
+the input bundle are retried.</em></p>
 
 <p>Note that the retry does not necessarily have the same processing time as the
 original attempt, as shown in the diagram.</p>
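[Editor's note] The bundle-failure semantics described in the execution-model changes above can be illustrated with a plain-Java sketch. This is a hypothetical illustration (class and method names are invented, and this is not Beam SDK code): when any element in a bundle fails, all of the bundle's partial output is discarded and every element in the bundle is retried, possibly on a different worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BundleRetrySketch {
    // Process a bundle atomically: if any element fails, the whole
    // bundle's partial output is discarded and all elements run again.
    static List<String> processBundle(List<String> bundle, int attempt) {
        List<String> out = new ArrayList<>();
        for (String element : bundle) {
            if (element.equals("bad") && attempt == 0) {
                // A failure anywhere in the bundle throws away the
                // output accumulated so far for this bundle.
                throw new RuntimeException("transient failure");
            }
            out.add(element.toUpperCase());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> bundle = Arrays.asList("a", "bad", "c");
        List<String> result = null;
        for (int attempt = 0; result == null; attempt++) {
            try {
                result = processBundle(bundle, attempt);
            } catch (RuntimeException e) {
                // Retry the entire bundle, possibly on another worker.
            }
        }
        System.out.println(result); // [A, BAD, C]
    }
}
```

Note that "a" is processed twice here: once in the failed attempt and once in the retry, which is exactly the reprocessing cost the page describes.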
diff --git a/content/documentation/pipelines/design-your-pipeline/index.html b/content/documentation/pipelines/design-your-pipeline/index.html
index 7b74eec..29a020b 100644
--- a/content/documentation/pipelines/design-your-pipeline/index.html
+++ b/content/documentation/pipelines/design-your-pipeline/index.html
@@ -238,12 +238,13 @@
 
 <h2 id="a-basic-pipeline">A basic pipeline</h2>
 
-<p>The simplest pipelines represent a linear flow of operations, as shown in Figure 1 below:</p>
+<p>The simplest pipelines represent a linear flow of operations, as shown in figure
+1.</p>
 
-<figure id="fig1">
-    <img src="/images/design-your-pipeline-linear.png" alt="A linear pipeline." />
-</figure>
-<p>Figure 1: A linear pipeline.</p>
+<p><img src="/images/design-your-pipeline-linear.png" alt="A linear pipeline starts with one input collection, sequentially applies
+  three transforms, and ends with one output collection." /></p>
+
+<p><em>Figure 1: A linear pipeline.</em></p>
 
 <p>However, your pipeline can be significantly more complex. A pipeline represents a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph</a> of steps. It can have multiple input sources, multiple output sinks, and its operations (<code class="highlighter-rouge">PTransform</code>s) can both read and output multiple <code class="highlighter-rouge">PCollection</code>s. The following examples show some of the different shapes your pipeline can take.</p>
 
@@ -255,12 +256,16 @@
 
 <p>You can use the same <code class="highlighter-rouge">PCollection</code> as input for multiple transforms without consuming the input or altering it.</p>
 
-<p>The pipeline illustrated in Figure 2 below reads its input, first names (Strings), from a single source, a database table, and creates a <code class="highlighter-rouge">PCollection</code> of table rows. Then, the pipeline applies multiple transforms to the <strong>same</strong> <code class="highlighter-rouge">PCollection</code>. Transform A extracts all the names in that <code class="highlighter-rouge">PCollection</code> that start with the letter ‘A’, and Transform B extracts all the [...]
+<p>The pipeline in figure 2 is a branching pipeline. The pipeline reads its input (first names represented as strings) from a database table and creates a <code class="highlighter-rouge">PCollection</code> of table rows. Then, the pipeline applies multiple transforms to the <strong>same</strong> <code class="highlighter-rouge">PCollection</code>. Transform A extracts all the names in that <code class="highlighter-rouge">PCollection</code> that start with the letter ‘A’, and Transform B e [...]
+
+<p><img src="/images/design-your-pipeline-multiple-pcollections.png" alt="The pipeline applies two transforms to a single input collection. Each
+  transform produces an output collection." /></p>
+
+<p><em>Figure 2: A branching pipeline. Two transforms are applied to a single
+PCollection of database table rows.</em></p>
+
+<p>The following example code applies two transforms to a single input collection.</p>
 
-<figure id="fig2">
-    <img src="/images/design-your-pipeline-multiple-pcollections.png" alt="A pipeline with multiple transforms. Note that the PCollection of table rows is processed by two transforms." />
-</figure>
-<p>Figure 2: A pipeline with multiple transforms. Note that the PCollection of the database table rows is processed by two transforms. See the example code below:</p>
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">dbRowCollection</span> <span class="o">=</span> <span class="o">...;</span>
 
 <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">aCollection</span> <span class="o">=</span> <span class="n">dbRowCollection</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="s">"aTrans"</span><span class="o">,</span> <span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n [...]
@@ -287,14 +292,16 @@
 
 <p>Another way to branch a pipeline is to have a <strong>single</strong> transform output to multiple <code class="highlighter-rouge">PCollection</code>s by using <a href="/documentation/programming-guide/#additional-outputs">tagged outputs</a>. Transforms that produce more than one output process each element of the input once, and output to zero or more <code class="highlighter-rouge">PCollection</code>s.</p>
 
-<p>Figure 3 below illustrates the same example described above, but with one transform that produces multiple outputs. Names that start with ‘A’ are added to the main output <code class="highlighter-rouge">PCollection</code>, and names that start with ‘B’ are added to an additional output <code class="highlighter-rouge">PCollection</code>.</p>
+<p>Figure 3 illustrates the same example described above, but with one transform that produces multiple outputs. Names that start with ‘A’ are added to the main output <code class="highlighter-rouge">PCollection</code>, and names that start with ‘B’ are added to an additional output <code class="highlighter-rouge">PCollection</code>.</p>
 
-<figure id="fig3">
-    <img src="/images/design-your-pipeline-additional-outputs.png" alt="A pipeline with a transform that outputs multiple PCollections." />
-</figure>
-<p>Figure 3: A pipeline with a transform that outputs multiple PCollections.</p>
+<p><img src="/images/design-your-pipeline-additional-outputs.png" alt="The pipeline applies one transform that produces multiple output collections." /></p>
 
-<p>The pipeline in Figure 2 contains two transforms that process the elements in the same input <code class="highlighter-rouge">PCollection</code>. One transform uses the following logic:</p>
+<p><em>Figure 3: A pipeline with a transform that outputs multiple PCollections.</em></p>
+
+<p>If we compare the pipelines in figure 2 and figure 3, you can see they perform
+the same operation in different ways. The pipeline in figure 2 contains two
+transforms that process the elements in the same input <code class="highlighter-rouge">PCollection</code>. One
+transform uses the following logic:</p>
 
 <pre>if (starts with 'A') { outputToPCollectionA }</pre>
 
@@ -304,11 +311,15 @@
 
 <p>Because each transform reads the entire input <code class="highlighter-rouge">PCollection</code>, each element in the input <code class="highlighter-rouge">PCollection</code> is processed twice.</p>
 
-<p>The pipeline in Figure 3 performs the same operation in a different way - with only one transform that uses the following logic:</p>
+<p>The pipeline in figure 3 performs the same operation in a different way - with only one transform that uses the following logic:</p>
 
 <pre>if (starts with 'A') { outputToPCollectionA } else if (starts with 'B') { outputToPCollectionB }</pre>
 
-<p>where each element in the input <code class="highlighter-rouge">PCollection</code> is processed once. See the example code below:</p>
+<p>where each element in the input <code class="highlighter-rouge">PCollection</code> is processed once.</p>
+
+<p>The following example code applies one transform that processes each element
+once and outputs two collections.</p>
+
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Define two TupleTags, one for each output.</span>
 <span class="kd">final</span> <span class="n">TupleTag</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">startsWithATag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;(){};</span>
 <span class="kd">final</span> <span class="n">TupleTag</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">startsWithBTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;(){};</span>
@@ -352,30 +363,41 @@
   <li><strong>Join</strong> - You can use the <code class="highlighter-rouge">CoGroupByKey</code> transform in the Beam SDK to perform a relational join between two <code class="highlighter-rouge">PCollection</code>s. The <code class="highlighter-rouge">PCollection</code>s must be keyed (i.e. they must be collections of key/value pairs) and they must use the same key type.</li>
 </ul>
 
-<p>The example depicted in Figure 4 below is a continuation of the example illustrated in Figure 2 in <a href="#multiple-transforms-process-the-same-pcollection">the section above</a>. After branching into two <code class="highlighter-rouge">PCollection</code>s, one with names that begin with ‘A’ and one with names that begin with ‘B’, the pipeline merges the two together into a single <code class="highlighter-rouge">PCollection</code> that now contains all names that begin with either ‘ [...]
+<p>The example in figure 4 is a continuation of the example in figure 2 in <a href="#multiple-transforms-process-the-same-pcollection">the
+section above</a>. After
+branching into two <code class="highlighter-rouge">PCollection</code>s, one with names that begin with ‘A’ and one
+with names that begin with ‘B’, the pipeline merges the two together into a
+single <code class="highlighter-rouge">PCollection</code> that now contains all names that begin with either ‘A’ or
+‘B’. Here, it makes sense to use <code class="highlighter-rouge">Flatten</code> because the <code class="highlighter-rouge">PCollection</code>s being
+merged both contain the same type.</p>
+
+<p><img src="/images/design-your-pipeline-flatten.png" alt="The pipeline merges two collections into one collection with the Flatten transform." /></p>
+
+<p><em>Figure 4: A pipeline that merges two collections into one collection with the Flatten
+transform.</em></p>
+
+<p>The following example code applies <code class="highlighter-rouge">Flatten</code> to merge two collections.</p>
 
-<figure id="fig4">
-    <img src="/images/design-your-pipeline-flatten.png" alt="Part of a pipeline that merges multiple PCollections." />
-</figure>
-<p>Figure 4: Part of a pipeline that merges multiple PCollections. See the example code below:</p>
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">//merge the two PCollections with Flatten</span>
 <span class="n">PCollectionList</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">collectionList</span> <span class="o">=</span> <span class="n">PCollectionList</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">aCollection</span><span class="o">).</span><span class="na">and</span><span class="o">(</span><span class="n">bCollection</span><span class="o">);</span>
 <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">mergedCollectionWithFlatten</span> <span class="o">=</span> <span class="n">collectionList</span>
     <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">Flatten</span><span class="o">.&lt;</span><span class="n">String</span><span class="o">&gt;</span><span class="n">pCollections</span><span class="o">());</span>
 
-<span class="c1">// continue with the new merged PCollection		</span>
+<span class="c1">// continue with the new merged PCollection</span>
 <span class="n">mergedCollectionWithFlatten</span><span class="o">.</span><span class="na">apply</span><span class="o">(...);</span>
 </code></pre>
 </div>
 
 <h2 id="multiple-sources">Multiple sources</h2>
 
-<p>Your pipeline can read its input from one or more sources. If your pipeline reads from multiple sources and the data from those sources is related, it can be useful to join the inputs together. In the example illustrated in Figure 5 below, the pipeline reads names and addresses from a database table, and names and order numbers from a Kafka topic. The pipeline then uses <code class="highlighter-rouge">CoGroupByKey</code> to join this information, where the key is the name; the resulti [...]
+<p>Your pipeline can read its input from one or more sources. If your pipeline reads from multiple sources and the data from those sources is related, it can be useful to join the inputs together. In the example illustrated in figure 5 below, the pipeline reads names and addresses from a database table, and names and order numbers from a Kafka topic. The pipeline then uses <code class="highlighter-rouge">CoGroupByKey</code> to join this information, where the key is the name; the resulti [...]
+
+<p><img src="/images/design-your-pipeline-join.png" alt="The pipeline joins two input collections into one collection with the Join transform." /></p>
+
+<p><em>Figure 5: A pipeline that does a relational join of two input collections.</em></p>
+
+<p>The following example code applies <code class="highlighter-rouge">Join</code> to join two input collections.</p>
 
-<figure id="fig5">
-    <img src="/images/design-your-pipeline-join.png" alt="A pipeline with multiple input sources." />
-</figure>
-<p>Figure 5: A pipeline with multiple input sources. See the example code below:</p>
 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">userAddress</span> <span class="o">=</span> <span class="n">pipeline</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">JdbcIO [...]
 
 <span class="n">PCollection</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">userOrder</span> <span class="o">=</span> <span class="n">pipeline</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">KafkaIO</span><span class="o">.&lt;</span><span class="n">String</span><span class [...]
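[Editor's note] The single-pass branching logic described in the design-your-pipeline changes above (`if (starts with 'A') { ... } else if (starts with 'B') { ... }`) can be sketched in plain Java. The helper below is hypothetical and is not Beam code; a real pipeline would use a `ParDo` with `TupleTag`s for the two outputs, as in the quoted example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BranchingSketch {
    // Route each element exactly once: names starting with 'A' go to
    // one output list, names starting with 'B' to another.
    static List<List<String>> partitionByInitial(List<String> names) {
        List<String> startsWithA = new ArrayList<>();
        List<String> startsWithB = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith("A")) {
                startsWithA.add(name);
            } else if (name.startsWith("B")) {
                startsWithB.add(name);
            }
        }
        return Arrays.asList(startsWithA, startsWithB);
    }

    public static void main(String[] args) {
        List<List<String>> out =
            partitionByInitial(Arrays.asList("Ada", "Ben", "Ann", "Carl"));
        System.out.println(out.get(0)); // [Ada, Ann]
        System.out.println(out.get(1)); // [Ben]
    }
}
```

Contrast this with the two-transform pipeline of figure 2, where each element is read twice; here each element is examined once, matching the figure 3 design.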
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index 1803da2..4af91c1 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -896,9 +896,12 @@ you can chain transforms to create a sequential pipeline, like this one:</p>
 </code></pre>
 </div>
 
-<p>The resulting workflow graph of the above pipeline looks like this:</p>
+<p>The resulting workflow graph of the above pipeline looks like this.</p>
 
-<p>[Sequential Graph Graphic]</p>
+<p><img src="/images/design-your-pipeline-linear.png" alt="This linear pipeline starts with one input collection, sequentially applies
+  three transforms, and ends with one output collection." /></p>
+
+<p><em>Figure: A linear pipeline with three sequential transforms.</em></p>
 
 <p>However, note that a transform <em>does not consume or otherwise alter</em> the input
 collection–remember that a <code class="highlighter-rouge">PCollection</code> is immutable by definition. This means
@@ -914,9 +917,13 @@ a branching pipeline, like so:</p>
 </code></pre>
 </div>
 
-<p>The resulting workflow graph from the branching pipeline above looks like this:</p>
+<p>The resulting workflow graph from the branching pipeline above looks like this.</p>
+
+<p><img src="/images/design-your-pipeline-multiple-pcollections.png" alt="This pipeline applies two transforms to a single input collection. Each
+  transform produces an output collection." /></p>
 
-<p>[Branching Graph Graphic]</p>
+<p><em>Figure: A branching pipeline. Two transforms are applied to a single
+PCollection of database table rows.</em></p>
 
 <p>You can also build your own <a href="#composite-transforms">composite transforms</a> that
 nest multiple sub-steps inside a single, larger transform. Composite transforms
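[Editor's note] The programming-guide changes above stress that a transform does not consume or alter its input, so the same immutable `PCollection` can feed two branches. A minimal plain-Java sketch of that property, with invented names and no Beam dependency:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ImmutableApplySketch {
    // Applying a "transform" yields a new collection and leaves the
    // input unchanged, mirroring how an immutable PCollection can be
    // used as input to multiple transforms.
    static List<String> apply(List<String> input, Function<String, String> fn) {
        return input.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b");
        List<String> upper = apply(input, String::toUpperCase);
        List<String> doubled = apply(input, s -> s + s); // second branch, same input
        System.out.println(input);   // [a, b]  (unchanged)
        System.out.println(upper);   // [A, B]
        System.out.println(doubled); // [aa, bb]
    }
}
```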
diff --git a/content/get-started/beam-overview/index.html b/content/get-started/beam-overview/index.html
index 8e89289..2a35b84 100644
--- a/content/get-started/beam-overview/index.html
+++ b/content/get-started/beam-overview/index.html
@@ -137,8 +137,8 @@
 <p>Beam currently supports the following language-specific SDKs:</p>
 
 <ul>
-  <li>Java <img src="/images/logos/sdks/java.png" alt="Java SDK" /></li>
-  <li>Python <img src="/images/logos/sdks/python.png" alt="Python SDK " /></li>
+  <li>Java <img src="/images/logos/sdks/java.png" alt="Java logo" /></li>
+  <li>Python <img src="/images/logos/sdks/python.png" alt="Python logo" /></li>
 </ul>
 
 <h2 id="apache-beam-pipeline-runners">Apache Beam Pipeline Runners</h2>
@@ -148,11 +148,11 @@
 <p>Beam currently supports Runners that work with the following distributed processing back-ends:</p>
 
 <ul>
-  <li>Apache Apex <img src="/images/logos/runners/apex.png" alt="Apache Apex" /></li>
-  <li>Apache Flink <img src="/images/logos/runners/flink.png" alt="Apache Flink" /></li>
-  <li>Apache Gearpump (incubating) <img src="/images/logos/runners/gearpump.png" alt="Apache Gearpump" /></li>
-  <li>Apache Spark <img src="/images/logos/runners/spark.png" alt="Apache Spark" /></li>
-  <li>Google Cloud Dataflow <img src="/images/logos/runners/dataflow.png" alt="Google Cloud Dataflow" /></li>
+  <li>Apache Apex  <img src="/images/logos/runners/apex.png" alt="Apache Apex logo" /></li>
+  <li>Apache Flink <img src="/images/logos/runners/flink.png" alt="Apache Flink logo" /></li>
+  <li>Apache Gearpump (incubating) <img src="/images/logos/runners/gearpump.png" alt="Apache Gearpump logo" /></li>
+  <li>Apache Spark <img src="/images/logos/runners/spark.png" alt="Apache Spark logo" /></li>
+  <li>Google Cloud Dataflow <img src="/images/logos/runners/dataflow.png" alt="Google Cloud Dataflow logo" /></li>
 </ul>
 
 <p><strong>Note:</strong> You can always execute your pipeline locally for testing and debugging purposes.</p>
diff --git a/content/get-started/mobile-gaming-example/index.html b/content/get-started/mobile-gaming-example/index.html
index f6483b9..91c260f 100644
--- a/content/get-started/mobile-gaming-example/index.html
+++ b/content/get-started/mobile-gaming-example/index.html
@@ -209,10 +209,13 @@
 
 <p>The following diagram shows the ideal situation (events are processed as they occur) vs. reality (there is often a time delay before processing).</p>
 
-<figure id="fig1">
-    <img src="/images/gaming-example-basic.png" width="264" height="260" alt="Score data for three users." />
-</figure>
-<p><strong>Figure 1:</strong> The X-axis represents event time: the actual time a game event occurred. The Y-axis represents processing time: the time at which a game event was processed. Ideally, events should be processed as they occur, depicted by the dotted line in the diagram. However, in reality that is not the case and it looks more like what is depicted by the red squiggly line.</p>
+<p><img src="/images/gaming-example-basic.png" alt="There is often a time delay before processing events." /></p>
+
+<p><em>Figure 1: The X-axis represents event time: the actual time a game event
+occurred. The Y-axis represents processing time: the time at which a game event
+was processed. Ideally, events should be processed as they occur, depicted by
+the dotted line in the diagram. However, in reality that is not the case and it
+looks more like what is depicted by the red squiggly line above the ideal line.</em></p>
 
 <p>The data events might be received by the game server significantly later than users generate them. This time difference (called <strong>skew</strong>) can have processing implications for pipelines that make calculations that consider when each score was generated. Such pipelines might track scores generated during each hour of a day, for example, or they calculate the length of time that users are continuously playing the game—both of which depend on each data record’s event time.</p>
 
@@ -254,12 +257,11 @@
   <li>Write the result data to a text file.</li>
 </ol>
 
-<p>The following diagram shows score data for several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair:</p>
+<p>The following diagram shows score data for several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair.</p>
+
+<p><img src="/images/gaming-example.gif" alt="A pipeline processes score data for three users." width="850px" /></p>
 
-<figure id="fig2">
-    <img src="/images/gaming-example.gif" width="900" height="263" alt="Score data for three users." />
-</figure>
-<p><strong>Figure 2:</strong> Score data for three users.</p>
+<p><em>Figure 2: Score data for three users.</em></p>
 
 <p>This example uses batch processing, and the diagram’s Y axis represents processing time: the pipeline processes events lower on the Y-axis first, and events higher up the axis later. The diagram’s X axis represents the event time for each game event, as denoted by that event’s timestamp. Note that the individual events in the diagram are not processed by the pipeline in the same order as they occurred (according to their timestamps).</p>
 
@@ -422,10 +424,10 @@
 
 <p>The following diagram shows how the pipeline processes a day’s worth of a single team’s scoring data after applying fixed-time windowing:</p>
 
-<figure id="fig3">
-    <img src="/images/gaming-example-team-scores-narrow.gif" width="900" height="390" alt="Score data for two teams." />
-</figure>
-<p><strong>Figure 3:</strong> Score data for two teams. Each team’s scores are divided into logical windows based on when those scores occurred in event time.</p>
+<p><img src="/images/gaming-example-team-scores-narrow.gif" alt="A pipeline processes score data for two teams." width="800px" /></p>
+
+<p><em>Figure 3: Score data for two teams. Each team’s scores are divided into
+logical windows based on when those scores occurred in event time.</em></p>
 
 <p>Notice that as processing time advances, the sums are now <em>per window</em>; each window represents an hour of <em>event time</em> during the day in which the scores occurred.</p>
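The assignment of each score to an hourly event-time window comes down to simple arithmetic on its timestamp. A minimal sketch in plain Python (the `fixed_window_start` helper is hypothetical, not a Beam API):

```python
def fixed_window_start(event_ts: int, window_size: int) -> int:
    # Each event is assigned to exactly one window [start, start + window_size),
    # based solely on its event-time timestamp.
    return event_ts - (event_ts % window_size)

# With hourly windows (3600 s), scores at 13:05 and 13:55 share one window,
# while a score at 14:10 falls into the next.
HOUR = 3600
print(fixed_window_start(13 * HOUR + 5 * 60, HOUR) // HOUR)   # 13
print(fixed_window_start(14 * HOUR + 10 * 60, HOUR) // HOUR)  # 14
```

Because the window is a pure function of event time, per-window sums stay correct no matter how late or out of order the elements arrive in processing time.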
 
@@ -693,10 +695,11 @@
 
 <p>When we specify a ten-minute processing time trigger for the single global window, the pipeline effectively takes a “snapshot” of the contents of the window every time the trigger fires. This snapshot happens after ten minutes have passed since data was received. If no data has arrived, the pipeline takes its next “snapshot” 10 minutes after an element arrives. Since we’re using a single global window, each snapshot contains all the data collected <em>to that point in time</em>. The f [...]
 
-<figure id="fig4">
-    <img src="/images/gaming-example-proc-time-narrow.gif" width="900" height="263" alt="Score data for three users." />
-</figure>
-<p><strong>Figure 4:</strong> Score data for three users. Each user’s scores are grouped together in a single global window, with a trigger that generates a snapshot for output ten minutes after data is received.</p>
+<p><img src="/images/gaming-example-proc-time-narrow.gif" alt="A pipeline processes score data for three users." width="850px" /></p>
+
+<p><em>Figure 4: Score data for three users. Each user’s scores are grouped together
+in a single global window, with a trigger that generates a snapshot for output
+ten minutes after data is received.</em></p>
 
 <p>As processing time advances and more scores are processed, the trigger outputs the updated sum for each user.</p>
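The cumulative "snapshot" behavior can be simulated in plain Python. This is a sketch of the trigger semantics, not Beam code; the event tuples and the `snapshot_sums` helper are hypothetical:

```python
def snapshot_sums(events, interval=600):
    # events: (processing_time_sec, user, score) tuples, sorted by processing
    # time. Emits a cumulative per-user sum every `interval` seconds, mimicking
    # a processing-time trigger firing on a single global window.
    totals, snapshots, next_fire = {}, [], interval
    for t, user, score in events:
        while t >= next_fire:
            snapshots.append(dict(totals))  # the trigger "fires" a snapshot
            next_fire += interval
        totals[user] = totals.get(user, 0) + score
    snapshots.append(dict(totals))  # final pane with everything seen so far
    return snapshots

events = [(100, "amy", 5), (200, "bob", 3), (700, "amy", 2)]
print(snapshot_sums(events))  # [{'amy': 5, 'bob': 3}, {'amy': 7, 'bob': 3}]
```

Each fired pane contains the sums over *all* data received so far, because the single global window never closes; later snapshots simply refine earlier ones.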
 
@@ -768,10 +771,11 @@
 
 <p>The following diagram shows the relationship between ongoing processing time and each score’s event time for two teams:</p>
 
-<figure id="fig5">
-    <img src="/images/gaming-example-event-time-narrow.gif" width="900" height="390" alt="Score data by team, windowed by event time." />
-</figure>
-<p><strong>Figure 5:</strong> Score data by team, windowed by event time. A trigger based on processing time causes the window to emit speculative early results and include late results.</p>
+<p><img src="/images/gaming-example-event-time-narrow.gif" alt="A pipeline processes score data by team, windowed by event time." width="800px" /></p>
+
+<p><em>Figure 5: Score data by team, windowed by event time. A trigger based on
+processing time causes the window to emit speculative early results and include
+late results.</em></p>
 
 <p>The dotted line in the diagram is the “ideal” <strong>watermark</strong>: Beam’s notion of when all data in a given window can reasonably be considered to have arrived. The irregular solid line represents the actual watermark, as determined by the data source.</p>
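Whether an element counts as late reduces to comparing its event-time stamp against the watermark at arrival. A minimal sketch (the `is_late` helper is hypothetical, not a Beam API):

```python
def is_late(event_time: float, watermark: float) -> bool:
    # Once the watermark has advanced past an element's event time, that
    # element is considered late when it finally arrives.
    return event_time < watermark

# Watermark has advanced to t=500: an element stamped t=450 arriving now is
# late, while one stamped t=520 is not.
print(is_late(450.0, 500.0))  # True
print(is_late(520.0, 500.0))  # False
```

A trigger with early firings emits speculative panes before the watermark passes the end of a window; firings after the watermark incorporate any such late elements.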
 
@@ -999,10 +1003,11 @@
 
 <p>The following diagram shows how data might look when grouped into session windows. Unlike fixed windows, session windows are <em>different for each user</em> and depend on each individual user’s play pattern:</p>
 
-<figure id="fig6">
-    <img src="/images/gaming-example-session-windows.png" width="662" height="521" alt="User sessions, with a minimum gap duration." />
-</figure>
-<p><strong>Figure 6:</strong> User sessions, with a minimum gap duration. Note how each user has different sessions, according to how many instances they play and how long their breaks between instances are.</p>
+<p><img src="/images/gaming-example-session-windows.png" alt="User sessions with a minimum gap duration." /></p>
+
+<p><em>Figure 6: User sessions with a minimum gap duration. Each user has different
+sessions, according to how many instances they play and how long their breaks
+between instances are.</em></p>
 
 <p>We can use the session-windowed data to determine the average length of uninterrupted play time for all of our users, as well as the total score they achieve during each session. We can do this in the code by first applying session windows, summing the score per user and session, and then using a transform to calculate the length of each individual session:</p>
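Session windows merge events that fall within the minimum gap duration of each other. A plain-Python sketch of that grouping rule (the `sessions` helper is hypothetical, not a Beam API):

```python
def sessions(timestamps, gap):
    # Group sorted event timestamps into sessions: a new session begins
    # whenever the break since the previous event exceeds the minimum gap.
    grouped = []
    for t in sorted(timestamps):
        if grouped and t - grouped[-1][-1] <= gap:
            grouped[-1].append(t)  # within the gap: extend current session
        else:
            grouped.append([t])    # break too long: start a new session
    return grouped

# Play events at minutes 0, 5, 30, and 33 with a 10-minute minimum gap
# form two sessions.
print(sessions([0, 5, 30, 33], gap=10))  # [[0, 5], [30, 33]]
```

Since the grouping is driven entirely by each user's own timestamps, two users with identical total play time can end up with very different session windows.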
 
diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html
index 469d901..bbe6fad 100644
--- a/content/get-started/wordcount-example/index.html
+++ b/content/get-started/wordcount-example/index.html
@@ -128,7 +128,7 @@
       <li><a href="#using-parameterizable-pipelineoptions">Using parameterizable PipelineOptions</a></li>
     </ul>
   </li>
-  <li><a href="#debugging-wordcount-example">Debugging WordCount example</a>
+  <li><a href="#debuggingwordcount-example">DebuggingWordCount example</a>
     <ul>
       <li><a href="#logging">Logging</a></li>
       <li><a href="#testing-your-pipeline-via-passert">Testing your pipeline via PAssert</a></li>
@@ -164,7 +164,7 @@
       <li><a href="#using-parameterizable-pipelineoptions" id="markdown-toc-using-parameterizable-pipelineoptions">Using parameterizable PipelineOptions</a></li>
     </ul>
   </li>
-  <li><a href="#debugging-wordcount-example" id="markdown-toc-debugging-wordcount-example">Debugging WordCount example</a>    <ul>
+  <li><a href="#debuggingwordcount-example" id="markdown-toc-debuggingwordcount-example">DebuggingWordCount example</a>    <ul>
       <li><a href="#logging" id="markdown-toc-logging">Logging</a>        <ul>
           <li><a href="#direct-runner" id="markdown-toc-direct-runner">Direct Runner</a></li>
           <li><a href="#cloud-dataflow-runner" id="markdown-toc-cloud-dataflow-runner">Cloud Dataflow Runner</a></li>
@@ -201,23 +201,23 @@ four successively more detailed WordCount examples that build on each other. The
 input text for all the examples is a set of Shakespeare’s texts.</p>
 
 <p>Each WordCount example introduces different concepts in the Beam programming
-model. Begin by understanding Minimal WordCount, the simplest of the examples.
+model. Begin by understanding MinimalWordCount, the simplest of the examples.
 Once you feel comfortable with the basic principles in building a pipeline,
 continue on to learn more concepts in the other examples.</p>
 
 <ul>
-  <li><strong>Minimal WordCount</strong> demonstrates the basic principles involved in building a
+  <li><strong>MinimalWordCount</strong> demonstrates the basic principles involved in building a
 pipeline.</li>
   <li><strong>WordCount</strong> introduces some of the more common best practices in creating
 re-usable and maintainable pipelines.</li>
-  <li><strong>Debugging WordCount</strong> introduces logging and debugging practices.</li>
-  <li><strong>Windowed WordCount</strong> demonstrates how you can use Beam’s programming model
+  <li><strong>DebuggingWordCount</strong> introduces logging and debugging practices.</li>
+  <li><strong>WindowedWordCount</strong> demonstrates how you can use Beam’s programming model
 to handle both bounded and unbounded datasets.</li>
 </ul>
 
 <h2 id="minimalwordcount-example">MinimalWordCount example</h2>
 
-<p>Minimal WordCount demonstrates a simple pipeline that can read from a text file,
+<p>MinimalWordCount demonstrates a simple pipeline that can read from a text file,
 apply transforms to tokenize and count the words, and write the data to an
 output text file. This example hard-codes the locations for its input and output
 files and doesn’t perform any error checking; it is intended to only show you
@@ -283,7 +283,7 @@ python -m apache_beam.examples.wordcount_minimal --input gs://dataflow-samples/s
 </ul>
 
 <p>The following sections explain these concepts in detail, using the relevant code
-excerpts from the Minimal WordCount pipeline.</p>
+excerpts from the MinimalWordCount pipeline.</p>
 
 <h3 id="creating-the-pipeline">Creating the pipeline</h3>
 
@@ -339,7 +339,7 @@ executed, associated with that particular pipeline.</p>
 
 <h3 id="applying-pipeline-transforms">Applying pipeline transforms</h3>
 
-<p>The Minimal WordCount pipeline contains several transforms to read data into the
+<p>The MinimalWordCount pipeline contains several transforms to read data into the
 pipeline, manipulate or otherwise transform the data, and write out the results.
 Transforms can consist of an individual operation, or can contain multiple
 nested transforms (which is a <a href="/documentation/programming-guide#composite-transforms">composite transform</a>).</p>
@@ -349,10 +349,11 @@ input and output data is often represented by the SDK class <code class="highlig
 <code class="highlighter-rouge">PCollection</code> is a special class, provided by the Beam SDK, that you can use to
 represent a data set of virtually any size, including unbounded data sets.</p>
 
-<p><img src="/images/wordcount-pipeline.png" alt="Word Count pipeline diagram" />
-Figure 1: The pipeline data flow.</p>
+<p><img src="/images/wordcount-pipeline.png" alt="The MinimalWordCount pipeline data flow." width="800px" /></p>
 
-<p>The Minimal WordCount pipeline contains five transforms:</p>
+<p><em>Figure 1: The MinimalWordCount pipeline data flow.</em></p>
+
+<p>The MinimalWordCount pipeline contains five transforms:</p>
 
 <ol>
   <li>
@@ -486,7 +487,7 @@ your pipeline, and help make your pipeline’s code reusable.</p>
 
 <p>This section assumes that you have a good understanding of the basic concepts in
 building a pipeline. If you feel that you aren’t at that point yet, read the
-above section, <a href="#minimalwordcount-example">Minimal WordCount</a>.</p>
+above section, <a href="#minimalwordcount-example">MinimalWordCount</a>.</p>
 
 <p><strong>To run this example in Java:</strong></p>
 
@@ -580,7 +581,7 @@ pipeline code into smaller sections.</p>
 gets applied to each element in the input <code class="highlighter-rouge">PCollection</code>. This processing
 operation is a subclass of the SDK class <code class="highlighter-rouge">DoFn</code>. You can create the <code class="highlighter-rouge">DoFn</code>
 subclasses for each <code class="highlighter-rouge">ParDo</code> inline, as an anonymous inner class instance, as is
-done in the previous example (Minimal WordCount). However, it’s often a good
+done in the previous example (MinimalWordCount). However, it’s often a good
 idea to define the <code class="highlighter-rouge">DoFn</code> at the global level, which makes it easier to unit
 test and can make the <code class="highlighter-rouge">ParDo</code> code more readable.</p>
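The testability benefit of module-level definitions can be seen even without a pipeline. A plain-Python sketch in the spirit of WordCount's <code>ExtractWordsFn</code> (the `extract_words` function and its regex are illustrative assumptions, not the example's actual code):

```python
import re

def extract_words(line: str):
    # Defined at module level (rather than inline in the pipeline), so the
    # per-element logic can be unit tested without running any pipeline.
    return re.findall(r"[A-Za-z']+", line)

print(extract_words("To be, or not to be"))  # ['To', 'be', 'or', 'not', 'to', 'be']
```

An anonymous inline class buries this logic inside pipeline construction; hoisting it to the top level lets a test call it directly with a sample line.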
 
@@ -706,9 +707,9 @@ values for them. You can then access the options values in your pipeline code.</
 </code></pre>
 </div>
 
-<h2 id="debugging-wordcount-example">Debugging WordCount example</h2>
+<h2 id="debuggingwordcount-example">DebuggingWordCount example</h2>
 
-<p>The Debugging WordCount example demonstrates some best practices for
+<p>The DebuggingWordCount example demonstrates some best practices for
 instrumenting your pipeline code.</p>
 
 <p><strong>To run this example in Java:</strong></p>
@@ -942,7 +943,7 @@ for an example unit test.</p>
 
 <h2 id="windowedwordcount-example">WindowedWordCount example</h2>
 
-<p>This example, <code class="highlighter-rouge">WindowedWordCount</code>, counts words in text just as the previous
+<p>The WindowedWordCount example counts words in text just as the previous
 examples did, but introduces several advanced concepts.</p>
 
 <p><strong>New Concepts:</strong></p>
@@ -1096,7 +1097,7 @@ bounded sets of elements. PTransforms that aggregate multiple elements process
 each <code class="highlighter-rouge">PCollection</code> as a succession of multiple, finite windows, even though the
 entire collection itself may be of infinite size (unbounded).</p>
 
-<p>The <code class="highlighter-rouge">WindowedWordCount</code> example applies fixed-time windowing, wherein each
+<p>The WindowedWordCount example applies fixed-time windowing, wherein each
 window represents a fixed time interval. The fixed window size for this example
 defaults to 1 minute (you can change this with a command-line option).</p>
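The per-window counting that results can be sketched in plain Python (the `windowed_counts` helper and the sample events are hypothetical, not the example's actual code):

```python
from collections import Counter

def windowed_counts(events, window_size=60):
    # events: (timestamp_sec, word) pairs. Counts words per fixed window,
    # the way WindowedWordCount does with its default one-minute window.
    counts = {}
    for ts, word in events:
        start = ts - ts % window_size  # window this element falls into
        counts.setdefault(start, Counter())[word] += 1
    return counts

events = [(10, "king"), (70, "king"), (75, "lear")]
print(windowed_counts(events))
```

Because each count is scoped to a finite window, the same logic works whether the input is a bounded file or an unbounded stream.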
 

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <commits@beam.apache.org>.
