beam-commits mailing list archives

From da...@apache.org
Subject [2/3] beam-site git commit: Regenerate website
Date Wed, 28 Dec 2016 01:54:33 GMT
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/00c736ce
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/00c736ce
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/00c736ce

Branch: refs/heads/asf-site
Commit: 00c736ce6dd742d2673007d68edb6920e9795f9f
Parents: 303fb26
Author: Davor Bonaci <davor@google.com>
Authored: Tue Dec 27 17:54:19 2016 -0800
Committer: Davor Bonaci <davor@google.com>
Committed: Tue Dec 27 17:54:19 2016 -0800

----------------------------------------------------------------------
 content/documentation/index.html                |  14 +++-
 content/documentation/resources/index.html      |   2 +-
 content/documentation/sdks/java/index.html      |  32 +++++++-
 content/documentation/sdks/python/index.html    |   2 -
 .../mobile-gaming-example/index.html            |  37 +++++----
 .../get-started/wordcount-example/index.html    |  77 ++++++-------------
 content/images/gaming-example-basic.png         | Bin 0 -> 63121 bytes
 7 files changed, 88 insertions(+), 76 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/index.html b/content/documentation/index.html
index 1259a66..4767f70 100644
--- a/content/documentation/index.html
+++ b/content/documentation/index.html
@@ -146,7 +146,7 @@
       <div class="row">
         <h1 id="apache-beam-documentation">Apache Beam Documentation</h1>
 
-<p>Get in-depth conceptual information and reference material for the Beam Model, SDKs and Runners:</p>
+<p>This section provides in-depth conceptual information and reference material for the Beam Model, SDKs, and Runners:</p>
 
 <h2 id="concepts">Concepts</h2>
 
@@ -157,6 +157,14 @@
   <li>Visit <a href="/documentation/resources/">Additional Resources</a> for some of our favorite articles and talks about Beam.</li>
 </ul>
 
+<h2 id="pipeline-fundamentals">Pipeline Fundamentals</h2>
+
+<ul>
+  <li><a href="/documentation/pipelines/design-your-pipeline/">Design Your Pipeline</a> by planning your pipeline’s structure, choosing transforms to apply to your data, and determining your input and output methods.</li>
+  <li><a href="/documentation/pipelines/create-your-pipeline/">Create Your Pipeline</a> using the classes in the Beam SDKs.</li>
+  <li><a href="/documentation/pipelines/test-your-pipeline/">Test Your Pipeline</a> to minimize debugging a pipeline’s remote execution.</li>
+</ul>
+
 <h2 id="sdks">SDKs</h2>
 
 <p>Find status and reference information on all of the available Beam SDKs.</p>
@@ -183,9 +191,9 @@
 
 <h3 id="choosing-a-runner">Choosing a Runner</h3>
 
-<p>Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The <a href="/documentation/runners/capability-matrix">Capability Matrix</a> provides a detailed comparison of runner functionality.</p>
+<p>Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The <a href="/documentation/runners/capability-matrix/">Capability Matrix</a> provides a detailed comparison of runner functionality.</p>
 
-<p>Once you have chosen which runner to use, see that runner’s page for more information about any initial runner-specific setup as well as any required or optional <code class="highlighter-rouge">PipelineOptions</code> for configuring it’s execution. You may also want to refer back to the <a href="/get-started/quickstart">Quickstart</a> for instructions on executing the sample WordCount pipeline.</p>
+<p>Once you have chosen which runner to use, see that runner’s page for more information about any initial runner-specific setup as well as any required or optional <code class="highlighter-rouge">PipelineOptions</code> for configuring it’s execution. You may also want to refer back to the <a href="/get-started/quickstart/">Quickstart</a> for instructions on executing the sample WordCount pipeline.</p>
 
       </div>
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/resources/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/resources/index.html b/content/documentation/resources/index.html
index 21a707f..9aceea2 100644
--- a/content/documentation/resources/index.html
+++ b/content/documentation/resources/index.html
@@ -187,7 +187,7 @@
 
 <p>Hadoop Summit, San Jose, CA, 2016</p>
 
-<p>Presented by Davor Bonacci, <em>Apache Beam PPMC member</em></p>
+<p>Presented by Davor Bonaci, <em>Apache Beam PPMC member</em></p>
 
 <iframe width="560" height="315" src="https://www.youtube.com/embed/7DZ8ONmeP5A" frameborder="0" allowfullscreen=""></iframe>
 <p><br /></p>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/sdks/java/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/sdks/java/index.html b/content/documentation/sdks/java/index.html
index 3b3ead4..bd29d00 100644
--- a/content/documentation/sdks/java/index.html
+++ b/content/documentation/sdks/java/index.html
@@ -146,11 +146,37 @@
       <div class="row">
         <h1 id="apache-beam-java-sdk">Apache Beam Java SDK</h1>
 
-<p>This page is under construction (<a href="https://issues.apache.org/jira/browse/BEAM-504">BEAM-504</a>).</p>
+<p>The Java SDK for Apache Beam provides a simple, powerful API for building both batch and streaming parallel data processing pipelines in Java.</p>
 
-<p>Get started with the <a href="/learn/programming-guide">Beam Programming Guide</a> to learn the basic concepts that hold for all SDKs in the Beam Model.</p>
+<h2 id="get-started-with-the-java-sdk">Get Started with the Java SDK</h2>
+
+<p>Get started with the <a href="/learn/programming-guide/">Beam Programming Model</a> to learn the basic concepts that apply to all SDKs in Beam.</p>
+
+<p>See the <a href="/learn/sdks/javadoc/">Java API Reference</a> for more information on individual APIs.</p>
+
+<h2 id="supported-features">Supported Features</h2>
+
+<p>The Java SDK supports all features currently supported by the Beam model.</p>
+
+<h2 id="supported-io-connectors">Supported IO Connectors</h2>
+
+<ul>
+  <li>Amazon Kinesis</li>
+  <li>Apache Hadoop’s <code class="highlighter-rouge">FileInputFormat</code> in Hadoop Distributed File System (HDFS)</li>
+  <li>Apache Kafka</li>
+  <li>Avro Files</li>
+  <li>Google BigQuery</li>
+  <li>Google Cloud Bigtable</li>
+  <li>Google Cloud Datastore</li>
+  <li>Google Cloud Pub/Sub</li>
+  <li>Google Cloud Storage</li>
+  <li>Java Database Connectivity (JDBC)</li>
+  <li>Java Message Service (JMS)</li>
+  <li>MongoDB</li>
+  <li>Text Files</li>
+  <li>XML Files</li>
+</ul>
 
-<p>See the <a href="/learn/sdks/javadoc/">Java API Reference</a>.</p>
 
       </div>
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/sdks/python/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/sdks/python/index.html b/content/documentation/sdks/python/index.html
index 5c39a0d..5816dec 100644
--- a/content/documentation/sdks/python/index.html
+++ b/content/documentation/sdks/python/index.html
@@ -146,8 +146,6 @@
       <div class="row">
         <h1 id="apache-beam-python-sdk-under-development">Apache Beam Python SDK <em>[Under Development]</em></h1>
 
-<p>This page is under construction (<a href="https://issues.apache.org/jira/browse/BEAM-977">BEAM-977</a>).</p>
-
 <p>The Beam Python SDK is currently under development on a feature branch. Would you like to contribute? See the Beam <a href="/contribute/work-in-progress/#feature-branches">Work in Progress</a> page to find out how you can help!</p>
 
 <p>Get started with the <a href="/learn/programming-guide">Beam Programming Guide</a> to learn the basic concepts that hold for all SDKs in the Beam Model.</p>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/get-started/mobile-gaming-example/index.html
----------------------------------------------------------------------
diff --git a/content/get-started/mobile-gaming-example/index.html b/content/get-started/mobile-gaming-example/index.html
index ead7405..839beb6 100644
--- a/content/get-started/mobile-gaming-example/index.html
+++ b/content/get-started/mobile-gaming-example/index.html
@@ -196,9 +196,16 @@
   <li>A timestamp that records when the particular instance of play happened–this is the event time for each game data event.</li>
 </ul>
 
-<p>When the user completes an instance of the game, their phone sends the data event to a game server, where the data is logged and stored in a file. Generally the data is sent to the game server immediately upon completion. However, users can play the game “offline”, when their phones are out of contact with the server (such as on an airplane, or outside network coverage area). When the user’s phone comes back into contact with the game server, the phone will send all accumulated game data.</p>
+<p>When the user completes an instance of the game, their phone sends the data event to a game server, where the data is logged and stored in a file. Generally the data is sent to the game server immediately upon completion. However, sometimes delays happen in the network or users play the game “offline”, when their phones are out of contact with the server (such as on an airplane, or outside network coverage area). When the user’s phone comes back into contact with the game server, the phone will send all accumulated game data. This means that some data events may arrive delayed and out of order.</p>
 
-<p>This means that some data events might be received by the game server significantly later than users generate them. This time difference can have processing implications for pipelines that make calculations that consider when each score was generated. Such pipelines might track scores generated during each hour of a day, for example, or they calculate the length of time that users are continuously playing the game—both of which depend on each data record’s event time.</p>
+<p>The following diagram shows the ideal situation vs reality. The X-axis represents event time: the actual time a game event occurred. The Y-axis represents processing time: the time at which a game event was processed. Ideally, events should be processed as they occur, depicted by the dotted line in the diagram. However, in reality that is not the case and reality looks more like what is depicted by the red squiggly line.</p>
+
+<figure id="fig1">
+    <img src="/images/gaming-example-basic.png" width="264" height="260" alt="Score data for three users." />
+</figure>
+<p>Figure 1: Ideally, events are processed when they occur, with no delays.</p>
+
+<p>The data events might be received by the game server significantly later than users generate them. This time difference (called <strong>skew</strong>) can have processing implications for pipelines that make calculations that consider when each score was generated. Such pipelines might track scores generated during each hour of a day, for example, or they calculate the length of time that users are continuously playing the game—both of which depend on each data record’s event time.</p>
 
 <p>Because some of our example pipelines use data files (like logs from the game server) as input, the event timestamp for each game might be embedded in the data–that is, it’s a field in each data record. Those pipelines need to parse the event timestamp from each data record after reading it from the input file.</p>
 
@@ -234,12 +241,12 @@
   <li>Write the result data to a <a href="https://cloud.google.com/bigquery/">Google Cloud BigQuery</a> table.</li>
 </ol>
 
-<p>The following diagram shows score data for a several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair:</p>
+<p>The following diagram shows score data for several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair:</p>
 
-<figure id="fig1">
+<figure id="fig2">
     <img src="/images/gaming-example.gif" width="900" height="263" alt="Score data for three users." />
 </figure>
-<p>Figure 1: Score data for three users.</p>
+<p>Figure 2: Score data for three users.</p>
 
 <p>This example uses batch processing, and the diagram’s Y axis represents processing time: the pipeline processes events lower on the Y-axis first, and events higher up the axis later. The diagram’s X axis represents the event time for each game event, as denoted by that event’s timestamp. Note that the individual events in the diagram are not processed by the pipeline in the same order as they occurred (according to their timestamps).</p>
 
@@ -355,10 +362,10 @@
 
 <p>The following diagram shows how the pipeline processes a day’s worth of a single team’s scoring data after applying fixed-time windowing:</p>
 
-<figure id="fig2">
+<figure id="fig3">
     <img src="/images/gaming-example-team-scores-narrow.gif" width="900" height="390" alt="Score data for two teams." />
 </figure>
-<p>Figure 2: Score data for two teams. Each team’s scores are divided into logical windows based on when those scores occurred in event time.</p>
+<p>Figure 3: Score data for two teams. Each team’s scores are divided into logical windows based on when those scores occurred in event time.</p>
 
 <p>Notice that as processing time advances, the sums are now <em>per window</em>; each window represents an hour of <em>event time</em> during the day in which the scores occurred.</p>
 
@@ -493,12 +500,12 @@
 
 <p>Because we want all the data that has arrived in the pipeline every time we update our calculation, we have the pipeline consider all of the user score data in a <strong>single global window</strong>. The single global window is unbounded, but we can specify a kind of temporary cut-off point for each ten-minute calculation by using a processing time <a href="/documentation/programming-guide/#triggers">trigger</a>.</p>
 
-<p>When we specify a ten-minute processing time trigger for the single global window, the effect is that the pipeline effectively takes a “snapshot” of the contents of the window every time the trigger fires (which it does at ten-minute intervals). Since we’re using a single global window, each snapshot contains all the data collected <em>to that point in time</em>. The following diagram shows the effects of using a processing time trigger on the single global window:</p>
+<p>When we specify a ten-minute processing time trigger for the single global window, the pipeline effectively takes a “snapshot” of the contents of the window every time the trigger fires. This snapshot happens at ten-minute intervals as long as data has arrived. If no data has arrived, the pipeline will take its next “snapshot” 10 minutes past an element arriving. Since we’re using a single global window, each snapshot contains all the data collected <em>to that point in time</em>. The following diagram shows the effects of using a processing time trigger on the single global window:</p>
 
-<figure id="fig3">
+<figure id="fig4">
     <img src="/images/gaming-example-proc-time-narrow.gif" width="900" height="263" alt="Score data for for three users." />
 </figure>
-<p>Figure 3: Score data for for three users. Each user’s scores are grouped together in a single global window, with a trigger that generates a snapshot for output every ten minutes.</p>
+<p>Figure 4: Score data for for three users. Each user’s scores are grouped together in a single global window, with a trigger that generates a snapshot for output every ten minutes.</p>
 
 <p>As processing time advances and more scores are processed, the trigger outputs the updated sum for each user.</p>
 
@@ -547,12 +554,12 @@
 
 <p>The following diagram shows the relationship between ongoing processing time and each score’s event time for two teams:</p>
 
-<figure id="fig4">
+<figure id="fig5">
     <img src="/images/gaming-example-event-time-narrow.gif" width="900" height="390" alt="Score data by team, windowed by event time." />
 </figure>
-<p>Figure 4: Score data by team, windowed by event time. A trigger based on processing time causes the window to emit speculative early results and include late results.</p>
+<p>Figure 5: Score data by team, windowed by event time. A trigger based on processing time causes the window to emit speculative early results and include late results.</p>
 
-<p>The dotted line in the diagram is the “ideal” <strong>watermark</strong>: Beam’s notion of when all data in a given window can reasonably considered to have arrived. The irregular solid line represents the actual watermark, as determined by the data source.</p>
+<p>The dotted line in the diagram is the “ideal” <strong>watermark</strong>: Beam’s notion of when all data in a given window can reasonably be considered to have arrived. The irregular solid line represents the actual watermark, as determined by the data source.</p>
 
 <p>Data arriving above the solid watermark line is <em>late data</em>—this is a score event that was delayed (perhaps generated offline) and arrived after the window to which it belongs had closed. Our pipeline’s late-firing trigger ensures that this late data is still included in the sum.</p>
 
@@ -698,10 +705,10 @@
 
 <p>The following diagram shows how data might look when grouped into session windows. Unlike fixed windows, session windows are <em>different for each user</em> and is dependent on each individual user’s play pattern:</p>
 
-<figure id="fig5">
+<figure id="fig6">
     <img src="/images/gaming-example-session-windows.png" width="662" height="521" alt="User sessions, with a minimum gap duration." />
 </figure>
-<p>Figure 5: User sessions, with a minimum gap duration. Note how each user has different sessions, according to how many instances they play and how long their breaks between instances are.</p>
+<p>Figure 6: User sessions, with a minimum gap duration. Note how each user has different sessions, according to how many instances they play and how long their breaks between instances are.</p>
 
 <p>We can use the session-windowed data to determine the average length of uninterrupted play time for all of our users, as well as the total score they achieve during each session. We can do this in the code by first applying session windows, summing the score per user and session, and then using a transform to calculate the length of each individual session:</p>
 

http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/get-started/wordcount-example/index.html
----------------------------------------------------------------------
diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html
index ab8723f..aadd03e 100644
--- a/content/get-started/wordcount-example/index.html
+++ b/content/get-started/wordcount-example/index.html
@@ -180,10 +180,6 @@
   </li>
 </ul>
 
-<blockquote>
-  <p><strong>Note:</strong> This walkthrough is still in progress. Detailed instructions for running the example pipelines across multiple runners are yet to be added. There is an open issue to finish the walkthrough (<a href="https://issues.apache.org/jira/browse/BEAM-664">BEAM-664</a>).</p>
-</blockquote>
-
 <p>The WordCount examples demonstrate how to set up a processing pipeline that can read text, tokenize the text lines into individual words, and perform a frequency count on each of those words. The Beam SDKs contain a series of these four successively more detailed WordCount examples that build on each other. The input text for all the examples is a set of Shakespeare’s texts.</p>
 
 <p>Each WordCount example introduces different concepts in the Beam programming model. Begin by understanding Minimal WordCount, the simplest of the examples. Once you feel comfortable with the basic principles in building a pipeline, continue on to learn more concepts in the other examples.</p>
@@ -250,7 +246,7 @@
 
 <p>Each transform takes some kind of input (data or otherwise), and produces some output data. The input and output data is represented by the SDK class <code class="highlighter-rouge">PCollection</code>. <code class="highlighter-rouge">PCollection</code> is a special class, provided by the Beam SDK, that you can use to represent a data set of virtually any size, including infinite data sets.</p>
 
-<p>&lt;img src=”/images/wordcount-pipeline.png alt=”Word Count pipeline diagram””&gt;
+<p><img src="/images/wordcount-pipeline.png" alt="Word Count pipeline diagram" />
 Figure 1: The pipeline data flow.</p>
 
 <p>The Minimal WordCount pipeline contains five transforms:</p>
@@ -305,7 +301,7 @@ Figure 1: The pipeline data flow.</p>
   <li>
     <p>A text file <code class="highlighter-rouge">Write</code>. This transform takes the final <code class="highlighter-rouge">PCollection</code> of formatted Strings as input and writes each element to an output text file. Each element in the input <code class="highlighter-rouge">PCollection</code> represents one line of text in the resulting output file.</p>
 
-    <div class="language-java highlighter-rouge"><pre class="highlight"><code>
       <span class="o">.</span><span class="na">apply</span><span
class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span
class="na">Write</span><span class="o">.</span><span class="na">to</span><span
class="o">(</span><span class="s">"gs://YOUR_OUTPUT_BUCKET/AND_OUTPUT_PREFIX"</span><span
class="o">));</span>
+    <div class="language-java highlighter-rouge"><pre class="highlight"><code>
       <span class="o">.</span><span class="na">apply</span><span
class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span
class="na">Write</span><span class="o">.</span><span class="na">to</span><span
class="o">(</span><span class="s">"wordcounts"</span><span class="o">));</span>
 </code></pre>
     </div>
   </li>
@@ -317,10 +313,12 @@ Figure 1: The pipeline data flow.</p>
 
 <p>Run the pipeline by calling the <code class="highlighter-rouge">run</code> method, which sends your pipeline to be executed by the pipeline runner that you specified when you created your pipeline.</p>
 
-<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="n">p</span><span class="o">.</span><span class="na">run</span><span
class="o">();</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="n">p</span><span class="o">.</span><span class="na">run</span><span
class="o">().</span><span class="na">waitUntilFinish</span><span class="o">();</span>
 </code></pre>
 </div>
 
+<p>Note that the <code class="highlighter-rouge">run</code> method is asynchronous. For a blocking execution instead, run your pipeline appending the <code class="highlighter-rouge">waitUntilFinish</code> method.</p>
+
 <h2 id="wordcount-example">WordCount Example</h2>
 
 <p>This WordCount example introduces a few recommended programming practices that can make your pipeline easier to read, write, and maintain. While not explicitly required, they can make your pipeline’s execution more flexible, aid in testing your pipeline, and help make your pipeline’s code reusable.</p>
@@ -465,7 +463,6 @@ Figure 1: The pipeline data flow.</p>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">}</span>
-
 </code></pre>
 </div>
 
@@ -527,9 +524,6 @@ Figure 1: The pipeline data flow.</p>
     <span class="n">Options</span> <span class="n">options</span>
<span class="o">=</span> <span class="o">...</span>
     <span class="n">Pipeline</span> <span class="n">pipeline</span>
<span class="o">=</span> <span class="n">Pipeline</span><span class="o">.</span><span
class="na">create</span><span class="o">(</span><span class="n">options</span><span
class="o">);</span>
 
-    <span class="cm">/**
-     * Concept #1: The Beam SDK allows running the same pipeline with a bounded or unbounded
input source.
-     */</span>
     <span class="n">PCollection</span><span class="o">&lt;</span><span
class="n">String</span><span class="o">&gt;</span> <span class="n">input</span>
<span class="o">=</span> <span class="n">pipeline</span>
       <span class="o">.</span><span class="na">apply</span><span
class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span
class="na">Read</span><span class="o">.</span><span class="na">from</span><span
class="o">(</span><span class="n">options</span><span class="o">.</span><span
class="na">getInputFile</span><span class="o">()))</span>
 
@@ -540,39 +534,30 @@ Figure 1: The pipeline data flow.</p>
 
 <p>Each element in a <code class="highlighter-rouge">PCollection</code> has an associated <strong>timestamp</strong>. The timestamp for each element is initially assigned by the source that creates the <code class="highlighter-rouge">PCollection</code> and can be adjusted by a <code class="highlighter-rouge">DoFn</code>. In this example the input is bounded. For the purpose of the example, the <code class="highlighter-rouge">DoFn</code> method named <code class="highlighter-rouge">AddTimestampsFn</code> (invoked by <code class="highlighter-rouge">ParDo</code>) will set a timestamp for each element in the <code class="highlighter-rouge">PCollection</code>.</p>
 
-<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="c1">// Concept #2: Add an element timestamp, using an artificial time just to show
windowing.</span>
-<span class="c1">// See AddTimestampFn for more details on this.</span>
-<span class="o">.</span><span class="na">apply</span><span class="o">(</span><span
class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span
class="o">(</span><span class="k">new</span> <span class="n">AddTimestampFn</span><span
class="o">()));</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="o">.</span><span class="na">apply</span><span class="o">(</span><span
class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span
class="o">(</span><span class="k">new</span> <span class="n">AddTimestampFn</span><span
class="o">()));</span>
 </code></pre>
 </div>
 
 <p>Below is the code for <code class="highlighter-rouge">AddTimestampFn</code>, a <code class="highlighter-rouge">DoFn</code> invoked by <code class="highlighter-rouge">ParDo</code>, that sets the data element of the timestamp given the element itself. For example, if the elements were log lines, this <code class="highlighter-rouge">ParDo</code> could parse the time out of the log string and set it as the element’s timestamp. There are no timestamps inherent in the works of Shakespeare, so in this case we’ve made up random timestamps just to illustrate the concept. Each line of the input text will get a random associated timestamp sometime in a 2-hour period.</p>
 
-<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="cm">/**
-   * Concept #2: A DoFn that sets the data element timestamp. This is a silly method, just
for
-   * this example, for the bounded data case. Imagine that many ghosts of Shakespeare are
all 
-   * typing madly at the same time to recreate his masterworks. Each line of the corpus will

-   * get a random associated timestamp somewhere in a 2-hour period.
-   */</span>
-  <span class="kd">static</span> <span class="kd">class</span> <span
class="nc">AddTimestampFn</span> <span class="kd">extends</span> <span
class="n">DoFn</span><span class="o">&lt;</span><span class="n">String</span><span
class="o">,</span> <span class="n">String</span><span class="o">&gt;</span>
<span class="o">{</span>
-    <span class="kd">private</span> <span class="kd">static</span>
<span class="kd">final</span> <span class="n">Duration</span> <span
class="n">RAND_RANGE</span> <span class="o">=</span> <span class="n">Duration</span><span
class="o">.</span><span class="na">standardHours</span><span class="o">(</span><span
class="mi">2</span><span class="o">);</span>
-    <span class="kd">private</span> <span class="kd">final</span>
<span class="n">Instant</span> <span class="n">minTimestamp</span><span
class="o">;</span>
-
-    <span class="n">AddTimestampFn</span><span class="o">()</span>
<span class="o">{</span>
-      <span class="k">this</span><span class="o">.</span><span
class="na">minTimestamp</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">Instant</span><span class="o">(</span><span class="n">System</span><span
class="o">.</span><span class="na">currentTimeMillis</span><span class="o">());</span>
-    <span class="o">}</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="kd">static</span> <span class="kd">class</span> <span class="nc">AddTimestampFn</span>
<span class="kd">extends</span> <span class="n">DoFn</span><span
class="o">&lt;</span><span class="n">String</span><span class="o">,</span>
<span class="n">String</span><span class="o">&gt;</span> <span
class="o">{</span>
+  <span class="kd">private</span> <span class="kd">static</span>
<span class="kd">final</span> <span class="n">Duration</span> <span
class="n">RAND_RANGE</span> <span class="o">=</span> <span class="n">Duration</span><span
class="o">.</span><span class="na">standardHours</span><span class="o">(</span><span
class="mi">2</span><span class="o">);</span>
+  <span class="kd">private</span> <span class="kd">final</span> <span
class="n">Instant</span> <span class="n">minTimestamp</span><span
class="o">;</span>
 
-    <span class="nd">@ProcessElement</span>
-    <span class="kd">public</span> <span class="kt">void</span> <span
class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span>
<span class="n">c</span><span class="o">)</span> <span class="o">{</span>
-      <span class="c1">// Generate a timestamp that falls somewhere in the past two
hours.</span>
-      <span class="kt">long</span> <span class="n">randMillis</span>
<span class="o">=</span> <span class="o">(</span><span class="kt">long</span><span
class="o">)</span> <span class="o">(</span><span class="n">Math</span><span
class="o">.</span><span class="na">random</span><span class="o">()</span>
<span class="o">*</span> <span class="n">RAND_RANGE</span><span
class="o">.</span><span class="na">getMillis</span><span class="o">());</span>
-      <span class="n">Instant</span> <span class="n">randomTimestamp</span>
<span class="o">=</span> <span class="n">minTimestamp</span><span
class="o">.</span><span class="na">plus</span><span class="o">(</span><span
class="n">randMillis</span><span class="o">);</span>
-      <span class="cm">/**
-       * Set the data element with that timestamp.
-       */</span>
-      <span class="n">c</span><span class="o">.</span><span class="na">outputWithTimestamp</span><span
class="o">(</span><span class="n">c</span><span class="o">.</span><span
class="na">element</span><span class="o">(),</span> <span class="k">new</span>
<span class="n">Instant</span><span class="o">(</span><span class="n">randomTimestamp</span><span
class="o">));</span>
-    <span class="o">}</span>
+  <span class="n">AddTimestampFn</span><span class="o">()</span>
<span class="o">{</span>
+    <span class="k">this</span><span class="o">.</span><span class="na">minTimestamp</span>
<span class="o">=</span> <span class="k">new</span> <span class="n">Instant</span><span
class="o">(</span><span class="n">System</span><span class="o">.</span><span
class="na">currentTimeMillis</span><span class="o">());</span>
+  <span class="o">}</span>
+
+  <span class="nd">@ProcessElement</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span
class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span>
<span class="n">c</span><span class="o">)</span> <span class="o">{</span>
+    <span class="c1">// Generate a timestamp that falls somewhere in the past two hours.</span>
+    <span class="kt">long</span> <span class="n">randMillis</span>
<span class="o">=</span> <span class="o">(</span><span class="kt">long</span><span
class="o">)</span> <span class="o">(</span><span class="n">Math</span><span
class="o">.</span><span class="na">random</span><span class="o">()</span>
<span class="o">*</span> <span class="n">RAND_RANGE</span><span
class="o">.</span><span class="na">getMillis</span><span class="o">());</span>
+    <span class="n">Instant</span> <span class="n">randomTimestamp</span>
<span class="o">=</span> <span class="n">minTimestamp</span><span
class="o">.</span><span class="na">plus</span><span class="o">(</span><span
class="n">randMillis</span><span class="o">);</span>
+
+    <span class="c1">// Set the data element with that timestamp.</span>
+    <span class="n">c</span><span class="o">.</span><span class="na">outputWithTimestamp</span><span
class="o">(</span><span class="n">c</span><span class="o">.</span><span
class="na">element</span><span class="o">(),</span> <span class="k">new</span>
<span class="n">Instant</span><span class="o">(</span><span class="n">randomTimestamp</span><span
class="o">));</span>
   <span class="o">}</span>
+<span class="o">}</span>
 </code></pre>
 </div>
 
@@ -582,11 +567,7 @@ Figure 1: The pipeline data flow.</p>
 
 <p>The <code class="highlighter-rouge">WindowingWordCount</code> example
applies fixed-time windowing, wherein each window represents a fixed time interval. The fixed
window size for this example defaults to 1 minute (you can change this with a command-line
option).</p>
 
-<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="cm">/**
- * Concept #3: Window into fixed windows. The fixed window size for this example defaults
to 1
- * minute (you can change this with a command-line option).
- */</span>
-<span class="n">PCollection</span><span class="o">&lt;</span><span
class="n">String</span><span class="o">&gt;</span> <span class="n">windowedWords</span>
<span class="o">=</span> <span class="n">input</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="n">PCollection</span><span class="o">&lt;</span><span class="n">String</span><span
class="o">&gt;</span> <span class="n">windowedWords</span> <span
class="o">=</span> <span class="n">input</span>
   <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span
class="n">Window</span><span class="o">.&lt;</span><span class="n">String</span><span
class="o">&gt;</span><span class="n">into</span><span class="o">(</span>
     <span class="n">FixedWindows</span><span class="o">.</span><span
class="na">of</span><span class="o">(</span><span class="n">Duration</span><span
class="o">.</span><span class="na">standardMinutes</span><span class="o">(</span><span
class="n">options</span><span class="o">.</span><span class="na">getWindowSize</span><span
class="o">()))));</span>
 </code></pre>
@@ -596,11 +577,7 @@ Figure 1: The pipeline data flow.</p>
 
 <p>You can reuse existing <code class="highlighter-rouge">PTransform</code>s,
that were created for manipulating simple <code class="highlighter-rouge">PCollection</code>s,
over windowed <code class="highlighter-rouge">PCollection</code>s as well.</p>
 
-<div class="highlighter-rouge"><pre class="highlight"><code>/**
- * Concept #4: Re-use our existing CountWords transform that does not have knowledge of
- * windows over a PCollection containing windowed values.
- */
-PCollection&lt;KV&lt;String, Long&gt;&gt; wordCounts = windowedWords.apply(new
WordCount.CountWords());
+<div class="highlighter-rouge"><pre class="highlight"><code>PCollection&lt;KV&lt;String,
Long&gt;&gt; wordCounts = windowedWords.apply(new WordCount.CountWords());
 </code></pre>
 </div>
 
@@ -610,11 +587,7 @@ PCollection&lt;KV&lt;String, Long&gt;&gt; wordCounts
= windowedWords.apply(new W
 
 <p>In this example, we stream the results to a BigQuery table. The results are then
formatted for a BigQuery table, and then written to BigQuery using BigQueryIO.Write.</p>
 
-<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="cm">/**
- * Concept #5: Format the results for a BigQuery table, then write to BigQuery.
- * The BigQuery output source supports both bounded and unbounded data.
- */</span>
-<span class="n">wordCounts</span><span class="o">.</span><span
class="na">apply</span><span class="o">(</span><span class="n">ParDo</span><span
class="o">.</span><span class="na">of</span><span class="o">(</span><span
class="k">new</span> <span class="n">FormatAsTableRowFn</span><span
class="o">()))</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span
class="n">wordCounts</span><span class="o">.</span><span class="na">apply</span><span
class="o">(</span><span class="n">ParDo</span><span class="o">.</span><span
class="na">of</span><span class="o">(</span><span class="k">new</span>
<span class="n">FormatAsTableRowFn</span><span class="o">()))</span>
     <span class="o">.</span><span class="na">apply</span><span
class="o">(</span><span class="n">BigQueryIO</span><span class="o">.</span><span
class="na">Write</span>
       <span class="o">.</span><span class="na">to</span><span
class="o">(</span><span class="n">getTableReference</span><span class="o">(</span><span
class="n">options</span><span class="o">))</span>
       <span class="o">.</span><span class="na">withSchema</span><span
class="o">(</span><span class="n">getSchema</span><span class="o">())</span>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/images/gaming-example-basic.png
----------------------------------------------------------------------
diff --git a/content/images/gaming-example-basic.png b/content/images/gaming-example-basic.png
new file mode 100644
index 0000000..6f48fe5
Binary files /dev/null and b/content/images/gaming-example-basic.png differ
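
Note on the fixed-time windowing paragraph in the wordcount-example hunk above: the sketch below illustrates how a timestamp can be mapped to the start of its fixed window. It is a minimal, standalone illustration and not the Beam implementation. The class name FixedWindowSketch and the method windowStart are hypothetical, it uses java.time rather than the Joda-Time types shown in the diff, and it assumes windows are aligned to the epoch with no offset.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the Beam SDK.
final class FixedWindowSketch {

    // Returns the start of the fixed window containing the timestamp,
    // assuming windows are aligned to the epoch (no offset).
    static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMillis = size.toMillis();
        long ts = timestamp.toEpochMilli();
        // floorMod keeps the result correct for timestamps before the epoch.
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMillis));
    }

    public static void main(String[] args) {
        Duration oneMinute = Duration.ofMinutes(1); // the example's default window size
        Instant first = Instant.parse("2016-12-28T01:54:33Z");
        Instant second = Instant.parse("2016-12-28T01:54:59Z");
        Instant third = Instant.parse("2016-12-28T01:55:00Z");

        // The first two timestamps share a window; the third starts the next one.
        System.out.println(windowStart(first, oneMinute));
        System.out.println(windowStart(second, oneMinute));
        System.out.println(windowStart(third, oneMinute));
    }
}
```

Elements whose timestamps map to the same window start are grouped together, which is why a transform written for unwindowed data can produce per-window counts without knowing about windows.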

