kafka-commits mailing list archives

From ewe...@apache.org
Subject [2/2] kafka git commit: MINOR: add architecture section and configure / execution for streams
Date Fri, 10 Feb 2017 03:13:34 GMT
MINOR: add architecture section and configure / execution for streams

1. Added an architecture section.
2. Added a configuration / execution sub-section to developer guide.

Minor tweaks and a bunch of missing fixes from `kafka-site` repo.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Matthias J. Sax <matthias@confluent.io>, Derrick Or <derrickor@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #2488 from guozhangwang/KMinor-streams-docs-second-pass


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/a15fcea7
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/a15fcea7
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/a15fcea7

Branch: refs/heads/trunk
Commit: a15fcea799a843c1b4888fffbb2d382ee2e2ee36
Parents: b5dd39d
Author: Guozhang Wang <wangguoz@gmail.com>
Authored: Thu Feb 9 19:12:50 2017 -0800
Committer: Ewen Cheslack-Postava <me@ewencp.org>
Committed: Thu Feb 9 19:13:23 2017 -0800

----------------------------------------------------------------------
 docs/documentation.html                       | 144 +-----
 docs/documentation/streams.html               |  19 +
 docs/images/streams-architecture-overview.jpg | Bin 0 -> 420929 bytes
 docs/images/streams-architecture-states.jpg   | Bin 0 -> 147338 bytes
 docs/images/streams-architecture-tasks.jpg    | Bin 0 -> 130435 bytes
 docs/images/streams-architecture-threads.jpg  | Bin 0 -> 153622 bytes
 docs/images/streams-architecture-topology.jpg | Bin 0 -> 182199 bytes
 docs/introduction.html                        |  17 +-
 docs/js/templateData.js                       |   4 +-
 docs/migration.html                           |   4 +-
 docs/protocol.html                            |   8 +-
 docs/quickstart.html                          |  12 +-
 docs/streams.html                             | 544 ++++++++++++++++-----
 docs/toc.html                                 | 154 ++++++
 docs/upgrade.html                             |   2 +-
 docs/uses.html                                |   2 +-
 16 files changed, 630 insertions(+), 280 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/documentation.html
----------------------------------------------------------------------
diff --git a/docs/documentation.html b/docs/documentation.html
index 6269984..f9ab673 100644
--- a/docs/documentation.html
+++ b/docs/documentation.html
@@ -20,6 +20,7 @@
 <!--#include virtual="../includes/_header.htm" -->
 <!--#include virtual="../includes/_top.htm" -->
 
+
 <div class="content documentation documentation--current">
 	<!--#include virtual="../includes/_nav.htm" -->
 	<div class="right">
@@ -27,134 +28,8 @@
     <h1>Documentation</h1>
     <h3>Kafka 0.10.2 Documentation</h3>
     Prior releases: <a href="/07/documentation.html">0.7.x</a>, <a href="/08/documentation.html">0.8.0</a>, <a href="/081/documentation.html">0.8.1.X</a>, <a href="/082/documentation.html">0.8.2.X</a>, <a href="/090/documentation.html">0.9.0.X</a>, <a href="/0100/documentation.html">0.10.0.X</a>, <a href="/0101/documentation.html">0.10.1.X</a>.
-    </ul>
-
-    <ul class="toc">
-        <li><a href="#gettingStarted">1. Getting Started</a>
-             <ul>
-                 <li><a href="#introduction">1.1 Introduction</a>
-                 <li><a href="#uses">1.2 Use Cases</a>
-                 <li><a href="#quickstart">1.3 Quick Start</a>
-                 <li><a href="#ecosystem">1.4 Ecosystem</a>
-                 <li><a href="#upgrade">1.5 Upgrading</a>
-             </ul>
-        </li>
-        <li><a href="#api">2. APIs</a>
-              <ul>
-                  <li><a href="#producerapi">2.1 Producer API</a>
-                  <li><a href="#consumerapi">2.2 Consumer API</a>
-                  <li><a href="#streamsapi">2.3 Streams API</a>
-    			  <li><a href="#connectapi">2.4 Connect API</a>
-    			  <li><a href="#legacyapis">2.5 Legacy APIs</a>
-              </ul>
-        </li>
-        <li><a href="#configuration">3. Configuration</a>
-            <ul>
-                <li><a href="#brokerconfigs">3.1 Broker Configs</a>
-                <li><a href="#producerconfigs">3.2 Producer Configs</a>
-                <li><a href="#consumerconfigs">3.3 Consumer Configs</a>
-                    <ul>
-                        <li><a href="#newconsumerconfigs">3.3.1 New Consumer Configs</a>
-                        <li><a href="#oldconsumerconfigs">3.3.2 Old Consumer Configs</a>
-                    </ul>
-                <li><a href="#connectconfigs">3.4 Kafka Connect Configs</a>
-                <li><a href="#streamsconfigs">3.5 Kafka Streams Configs</a>
-            </ul>
-        </li>
-        <li><a href="#design">4. Design</a>
-            <ul>
-                 <li><a href="#majordesignelements">4.1 Motivation</a>
-                 <li><a href="#persistence">4.2 Persistence</a>
-                 <li><a href="#maximizingefficiency">4.3 Efficiency</a>
-                 <li><a href="#theproducer">4.4 The Producer</a>
-                 <li><a href="#theconsumer">4.5 The Consumer</a>
-                 <li><a href="#semantics">4.6 Message Delivery Semantics</a>
-                 <li><a href="#replication">4.7 Replication</a>
-                 <li><a href="#compaction">4.8 Log Compaction</a>
-                 <li><a href="#design_quotas">4.9 Quotas</a>
-            </ul>
-        </li>
-        <li><a href="#implementation">5. Implementation</a>
-            <ul>
-                  <li><a href="#apidesign">5.1 API Design</a>
-                  <li><a href="#networklayer">5.2 Network Layer</a>
-                  <li><a href="#messages">5.3 Messages</a>
-                  <li><a href="#messageformat">5.4 Message format</a>
-                  <li><a href="#log">5.5 Log</a>
-                  <li><a href="#distributionimpl">5.6 Distribution</a>
-            </ul>
-        </li>
-        <li><a href="#operations">6. Operations</a>
-            <ul>
-                 <li><a href="#basic_ops">6.1 Basic Kafka Operations</a>
-                    <ul>
-                         <li><a href="#basic_ops_add_topic">Adding and removing topics</a>
-                         <li><a href="#basic_ops_modify_topic">Modifying topics</a>
-                         <li><a href="#basic_ops_restarting">Graceful shutdown</a>
-                         <li><a href="#basic_ops_leader_balancing">Balancing leadership</a>
-                         <li><a href="#basic_ops_consumer_lag">Checking consumer position</a>
-                         <li><a href="#basic_ops_mirror_maker">Mirroring data between clusters</a>
-                         <li><a href="#basic_ops_cluster_expansion">Expanding your cluster</a>
-                         <li><a href="#basic_ops_decommissioning_brokers">Decommissioning brokers</a>
-                         <li><a href="#basic_ops_increase_replication_factor">Increasing replication factor</a>
-                    </ul>
-                 <li><a href="#datacenters">6.2 Datacenters</a>
-                 <li><a href="#config">6.3 Important Configs</a>
-                     <ul>
-                         <li><a href="#clientconfig">Important Client Configs</a>
-                         <li><a href="#prodconfig">A Production Server Configs</a>
-                     </ul>
-                   <li><a href="#java">6.4 Java Version</a>
-                   <li><a href="#hwandos">6.5 Hardware and OS</a>
-                    <ul>
-                        <li><a href="#os">OS</a>
-                        <li><a href="#diskandfs">Disks and Filesystems</a>
-                        <li><a href="#appvsosflush">Application vs OS Flush Management</a>
-                        <li><a href="#linuxflush">Linux Flush Behavior</a>
-                        <li><a href="#ext4">Ext4 Notes</a>
-                    </ul>
-                  <li><a href="#monitoring">6.6 Monitoring</a>
-                  <li><a href="#zk">6.7 ZooKeeper</a>
-                    <ul>
-                        <li><a href="#zkversion">Stable Version</a>
-                        <li><a href="#zkops">Operationalization</a>
-                    </ul>
-            </ul>
-        </li>
-        <li><a href="#security">7. Security</a>
-            <ul>
-                <li><a href="#security_overview">7.1 Security Overview</a></li>
-                <li><a href="#security_ssl">7.2 Encryption and Authentication using SSL</a></li>
-                <li><a href="#security_sasl">7.3 Authentication using SASL</a></li>
-                <li><a href="#security_authz">7.4 Authorization and ACLs</a></li>
-                <li><a href="#security_rolling_upgrade">7.5 Incorporating Security Features in a Running Cluster</a></li>
-                <li><a href="#zk_authz">7.6 ZooKeeper Authentication</a></li>
-                <ul>
-                    <li><a href="#zk_authz_new">New Clusters</a></li>
-                    <li><a href="#zk_authz_migration">Migrating Clusters</a></li>
-                    <li><a href="#zk_authz_ensemble">Migrating the ZooKeeper Ensemble</a></li>
-                </ul>
-            </ul>
-        </li>
-        <li><a href="#connect">8. Kafka Connect</a>
-            <ul>
-                <li><a href="#connect_overview">8.1 Overview</a></li>
-                <li><a href="#connect_user">8.2 User Guide</a></li>
-                <li><a href="#connect_development">8.3 Connector Development Guide</a></li>
-            </ul>
-        </li>
-        <li><a href="#streams">9. Kafka Streams</a>
-            <ul>
-                <li><a href="#streams_overview">9.1 Overview</a></li>
-                <li><a href="#streams_developer">9.2 Developer Guide</a></li>
-                <ul>
-                    <li><a href="#streams_concepts">Core Concepts</a></li>
-                    <li><a href="#streams_processor">Low-Level Processor API</a></li>
-                    <li><a href="#streams_dsl">High-Level Streams DSL</a></li>
-                </ul>
-            </ul>
-        </li>
-    </ul>
+
+    <!--#include virtual="toc.html" -->
 
     <h2><a id="gettingStarted" href="#gettingStarted">1. Getting Started</a></h2>
       <h3><a id="introduction" href="#introduction">1.1 Introduction</a></h3>
@@ -194,8 +69,15 @@
     <h2><a id="connect" href="#connect">8. Kafka Connect</a></h2>
     <!--#include virtual="connect.html" -->
 
-    <h2><a id="streams" href="#streams">9. Kafka Streams</a></h2>
-    <!--#include virtual="streams.html" -->
+    <h2><a id="streams" href="/0102/documentation/streams">9. Kafka Streams</a></h2>
+    <p>
+        Kafka Streams is a client library for processing and analyzing data stored in Kafka, and for either writing the resulting data back to Kafka or sending the final output to an external system. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.
+    </p>
+    <p>
+        Kafka Streams has a <b>low barrier to entry</b>: You can quickly write and run a small-scale proof-of-concept on a single machine; and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads. Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka's parallelism model.
+    </p>
+
+    <p>To learn more about Kafka Streams, read <a href="/0102/documentation/streams">this</a> section.</p>
 
-<!--#include virtual="../includes/footer.html" -->
+<!--#include virtual="../includes/_footer.htm" -->
 <!--#include virtual="../includes/_docs_footer.htm" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/documentation/streams.html
----------------------------------------------------------------------
diff --git a/docs/documentation/streams.html b/docs/documentation/streams.html
new file mode 100644
index 0000000..d8d2bb2
--- /dev/null
+++ b/docs/documentation/streams.html
@@ -0,0 +1,19 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!-- should always link to the latest release's documentation -->
+<!--#include virtual="../streams.html" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/images/streams-architecture-overview.jpg
----------------------------------------------------------------------
diff --git a/docs/images/streams-architecture-overview.jpg b/docs/images/streams-architecture-overview.jpg
new file mode 100644
index 0000000..9222079
Binary files /dev/null and b/docs/images/streams-architecture-overview.jpg differ

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/images/streams-architecture-states.jpg
----------------------------------------------------------------------
diff --git a/docs/images/streams-architecture-states.jpg b/docs/images/streams-architecture-states.jpg
new file mode 100644
index 0000000..fde12db
Binary files /dev/null and b/docs/images/streams-architecture-states.jpg differ

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/images/streams-architecture-tasks.jpg
----------------------------------------------------------------------
diff --git a/docs/images/streams-architecture-tasks.jpg b/docs/images/streams-architecture-tasks.jpg
new file mode 100644
index 0000000..2e957f9
Binary files /dev/null and b/docs/images/streams-architecture-tasks.jpg differ

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/images/streams-architecture-threads.jpg
----------------------------------------------------------------------
diff --git a/docs/images/streams-architecture-threads.jpg b/docs/images/streams-architecture-threads.jpg
new file mode 100644
index 0000000..d5f10db
Binary files /dev/null and b/docs/images/streams-architecture-threads.jpg differ

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/images/streams-architecture-topology.jpg
----------------------------------------------------------------------
diff --git a/docs/images/streams-architecture-topology.jpg b/docs/images/streams-architecture-topology.jpg
new file mode 100644
index 0000000..f42e8cd
Binary files /dev/null and b/docs/images/streams-architecture-topology.jpg differ

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 7ff62f9..7672a51 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -18,7 +18,7 @@
 <script><!--#include virtual="js/templateData.js" --></script>
 
 <script id="introduction-template" type="text/x-handlebars-template">
-  <h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
+  <h3> Apache Kafka&trade; is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
   <p>We think of a streaming platform as having three key capabilities:</p>
   <ol>
     <li>It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
@@ -203,17 +203,4 @@
   </p>
 </script>
 
-<!--#include virtual="../includes/_header.htm" -->
-<!--#include virtual="../includes/_top.htm" -->
-<div class="content documentation documentation--current">
-	<!--#include virtual="../includes/_nav.htm" -->
-	<div class="right">
-    <div class="p-introduction"></div>
-  </div>
-</div>
-<!--#include virtual="../includes/_footer.htm" -->
-
-<script>
-// Show selected style on nav item
-$(function() { $('.b-nav__intro').addClass('selected'); });
-</script>
+<div class="p-introduction"></div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/js/templateData.js
----------------------------------------------------------------------
diff --git a/docs/js/templateData.js b/docs/js/templateData.js
index fbb9e4e..b4aedf5 100644
--- a/docs/js/templateData.js
+++ b/docs/js/templateData.js
@@ -17,6 +17,6 @@ limitations under the License.
 
 // Define variables for doc templates
 var context={
-    "version": "0101"
-    "dotVersion": "0.10.1"
+    "version": "0102",
+    "dotVersion": "0.10.2"
 };
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/migration.html
----------------------------------------------------------------------
diff --git a/docs/migration.html b/docs/migration.html
index db5fe60..08a6271 100644
--- a/docs/migration.html
+++ b/docs/migration.html
@@ -15,7 +15,7 @@
  limitations under the License.
 -->
 
-<!--#include virtual="../includes/_header.html" -->
+<!--#include virtual="../includes/_header.htm" -->
 <h2><a id="migration" href="#migration">Migrating from 0.7.x to 0.8</a></h2>
 
 0.8 is our first (and hopefully last) release with a non-backwards-compatible wire protocol, ZooKeeper     layout, and on-disk data format. This was a chance for us to clean up a lot of cruft and start fresh. This means performing a no-downtime upgrade is more painful than normal&mdash;you cannot just swap in the new code in-place.
@@ -31,4 +31,4 @@
     <li>Drink.
 </ol>
 
-<!--#include virtual="../includes/_footer.html" -->
+<!--#include virtual="../includes/_footer.htm" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/protocol.html
----------------------------------------------------------------------
diff --git a/docs/protocol.html b/docs/protocol.html
index 5285f2e..4042223 100644
--- a/docs/protocol.html
+++ b/docs/protocol.html
@@ -15,10 +15,10 @@
  limitations under the License.
 -->
 
-<!--#include virtual="../includes/_header.html" -->
-<!--#include virtual="../includes/_top.html" -->
+<!--#include virtual="../includes/_header.htm" -->
+<!--#include virtual="../includes/_top.htm" -->
 <div class="content">
-    <!--#include virtual="../includes/_nav.html" -->
+    <!--#include virtual="../includes/_nav.htm" -->
     <div class="right">
         <h1>Kafka protocol guide</h1>
 
@@ -227,4 +227,4 @@ Size => int32
         $(function() { $('.b-nav__project').addClass('selected'); });
     </script>
 
-<!--#include virtual="../includes/_footer.html" -->
+<!--#include virtual="../includes/_footer.htm" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
index 2080cc4..bfc9af3 100644
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -15,6 +15,9 @@
  limitations under the License.
 -->
 
+<script><!--#include virtual="js/templateData.js" --></script>
+
+<script id="quickstart-template" type="text/x-handlebars-template">
 <p>
 This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data.
 Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use <code>bin\windows\</code> instead of <code>bin/</code>, and change the script extension to <code>.bat</code>.
@@ -418,15 +421,16 @@ The first column shows the evolution of the current state of the <code>KTable&lt
 The second column shows the change records that result from state updates to the KTable and that are being sent to the output Kafka topic <b>streams-wordcount-output</b>.
 </p>
 
+<img src="/{{version}}/images/streams-table-updates-02.png" style="float: right; width: 25%;">
+<img src="/{{version}}/images/streams-table-updates-01.png" style="float: right; width: 25%;">
+
 <p>
 First the text line “all streams lead to kafka” is being processed.
 The <code>KTable</code> is being built up as each new word results in a new table entry (highlighted with a green background), and a corresponding change record is sent to the downstream <code>KStream</code>.
 </p>
-<img class="centered" src="/{{version}}/images/streams-table-updates-01.png">
 <p>
 When the second text line “hello kafka streams” is processed, we observe, for the first time, that existing entries in the <code>KTable</code> are being updated (here: for the words “kafka” and for “streams”). And again, change records are being sent to the output topic.
 </p>
-<img class="centered" src="/{{version}}/images/streams-table-updates-01.png">
 <p>
 And so on (we skip the illustration of how the third line is being processed). This explains why the output topic has the contents we showed above, because it contains the full record of changes.
 </p>
@@ -442,3 +446,7 @@ console consumer, as described above).
 </p>
 
 <p>You can stop the console consumer via <b>Ctrl-C</b>.</p>
+
+</script>
+
+<div class="p-quickstart"></div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kafka/blob/a15fcea7/docs/streams.html
----------------------------------------------------------------------
diff --git a/docs/streams.html b/docs/streams.html
index 3e9334b..19af2b3 100644
--- a/docs/streams.html
+++ b/docs/streams.html
@@ -1,23 +1,23 @@
-<!--~
-  ~ Licensed to the Apache Software Foundation (ASF) under one or more
-  ~ contributor license agreements.  See the NOTICE file distributed with
-  ~ this work for additional information regarding copyright ownership.
-  ~ The ASF licenses this file to You under the Apache License, Version 2.0
-  ~ (the "License"); you may not use this file except in compliance with
-  ~ the License.  You may obtain a copy of the License at
-  ~
-  ~    http://www.apache.org/licenses/LICENSE-2.0
-  ~
-  ~ Unless required by applicable law or agreed to in writing, software
-  ~ distributed under the License is distributed on an "AS IS" BASIS,
-  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  ~ See the License for the specific language governing permissions and
-  ~ limitations under the License.
-  ~-->
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
 
 <script><!--#include virtual="js/templateData.js" --></script>
 
-<pre id="streams-template" type="text/x-handlebars-template">
+<script id="streams-template" type="text/x-handlebars-template">
     <h1>Streams</h1>
 
         <ol class="toc">
@@ -25,18 +25,21 @@
                 <a href="#streams_overview">Overview</a>
             </li>
             <li>
-                <a href="#streams_concepts">Overview</a>
+                <a href="#streams_concepts">Core Concepts</a>
             </li>
             <li>
-                <a href="#streams_developer">Developer guide</a>
+                <a href="#streams_architecture">Architecture</a>
+            </li>
+            <li>
+                <a href="#streams_developer">Developer Guide</a>
                 <ul>
-                    <li><a href="#streams_concepts">Core concepts</a>
-                    <li><a href="#streams_processor">Low-level processor API</a>
-                    <li><a href="#streams_dsl">High-level streams DSL</a>
+                    <li><a href="#streams_processor">Low-level Processor API</a></li>
+                    <li><a href="#streams_dsl">High-level Streams DSL</a></li>
+                    <li><a href="#streams_execute">Application Configuration and Execution</a></li>
                 </ul>
             </li>
             <li>
-                <a href="#streams_upgrade">Upgrade guide and API changes</a>
+                <a href="#streams_upgrade">Upgrade Guide and API Changes</a>
             </li>
         </ol>
 
@@ -44,6 +47,8 @@
 
         <p>
         Kafka Streams is a client library for processing and analyzing data stored in Kafka, and for either writing the resulting data back to Kafka or sending the final output to an external system. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.
+        </p>
+        <p>
         Kafka Streams has a <b>low barrier to entry</b>: You can quickly write and run a small-scale proof-of-concept on a single machine; and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads. Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka's parallelism model.
         </p>
         <p>
@@ -53,28 +58,35 @@
         <ul>
             <li>Designed as a <b>simple and lightweight client library</b>, which can be easily embedded in any Java application and integrated with any existing packaging, deployment and operational tools that users have for their streaming applications.</li>
             <li>Has <b>no external dependencies on systems other than Apache Kafka itself</b> as the internal messaging layer; notably, it uses Kafka's partitioning model to horizontally scale processing while maintaining strong ordering guarantees.</li>
-            <li>Supports <b>fault-tolerant local state</b>, which enables very fast and efficient stateful operations like joins and aggregations.</li>
+            <li>Supports <b>fault-tolerant local state</b>, which enables very fast and efficient stateful operations like windowed joins and aggregations.</li>
             <li>Employs <b>one-record-at-a-time processing</b> to achieve millisecond processing latency, and supports <b>event-time based windowing operations</b> with late arrival of records.</li>
             <li>Offers necessary stream processing primitives, along with a <b>high-level Streams DSL</b> and a <b>low-level Processor API</b>.</li>
 
         </ul>
         <br>
 
-        <h2><a id="streams_concepts" href="#streams_concepts">Overview</a></h2>
+        <h2><a id="streams_concepts" href="#streams_concepts">Core Concepts</a></h2>
 
         <p>
             We first summarize the key concepts of Kafka Streams.
         </p>
 
-        <h5><a id="streams_topology" href="#streams_topology">Stream Processing Topology</a></h5>
+        <h3><a id="streams_topology" href="#streams_topology">Stream Processing Topology</a></h3>
 
         <ul>
             <li>A <b>stream</b> is the most important abstraction provided by Kafka Streams: it represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a <b>data record</b> is defined as a key-value pair.</li>
             <li>A <b>stream processing application</b> is any program that makes use of the Kafka Streams library. It defines its computational logic through one or more <b>processor topologies</b>, where a processor topology is a graph of stream processors (nodes) that are connected by streams (edges).</li>
             <li>A <b>stream processor</b> is a node in the processor topology; it represents a processing step that transforms data in streams by receiving one input record at a time from its upstream processors in the topology, applying its operation to it, and subsequently producing one or more output records for its downstream processors.</li>
         </ul>
-        <img class="centered" src="/{{version}}/images/streams-concepts-topology.jpg">
 
+        There are two special processors in the topology:
+
+        <ul>
+            <li><b>Source Processor</b>: A source processor is a special type of stream processor that does not have any upstream processors. It produces an input stream to its topology from one or multiple Kafka topics by consuming records from these topics and forwarding them to its down-stream processors.</li>
+            <li><b>Sink Processor</b>: A sink processor is a special type of stream processor that does not have down-stream processors. It sends any received records from its up-stream processors to a specified Kafka topic.</li>
+        </ul>
+
+        <img class="centered" src="/{{version}}/images/streams-architecture-topology.jpg" style="width:400px">
 
         <p>
             Kafka Streams offers two ways to define the stream processing topology: the <a href="#streams_dsl"><b>Kafka Streams DSL</b></a> provides
@@ -82,7 +94,12 @@
             developers define and connect custom processors as well as to interact with <a href="#streams_state">state stores</a>.
         </p>
 
-        <h5><a id="streams_time" href="#streams_time">Time</a></h5>
+        <p>
+            A processor topology is merely a logical abstraction for your stream processing code.
+            At runtime, the logical topology is instantiated and replicated inside the application for parallel processing (see <a href="#streams_architecture_tasks">Stream Partitions and Tasks</a> for details).
+        </p>
+
+        <h3><a id="streams_time" href="#streams_time">Time</a></h3>
 
         <p>
             A critical aspect in stream processing is the notion of <b>time</b>, and how it is modeled and integrated.
@@ -95,7 +112,7 @@
         <ul>
             <li><b>Event time</b> - The point in time when an event or data record occurred, i.e. was originally created "at the source". <b>Example:</b> If the event is a geo-location change reported by a GPS sensor in a car, then the associated event-time would be the time when the GPS sensor captured the location change.</li>
             <li><b>Processing time</b> - The point in time when the event or data record happens to be processed by the stream processing application, i.e. when the record is being consumed. The processing time may be milliseconds, hours, or days etc. later than the original event time. <b>Example:</b> Imagine an analytics application that reads and processes the geo-location data reported from car sensors to present it to a fleet management dashboard. Here, processing-time in the analytics application might be milliseconds or seconds (e.g. for real-time pipelines based on Apache Kafka and Kafka Streams) or hours (e.g. for batch pipelines based on Apache Hadoop or Apache Spark) after event-time.</li>
-            <li><b>Ingestion time</b> - The point in time when an event or data record is stored in a topic partition by a Kafka broker. The difference to event time is that this ingestion timestamp is generated when the record is appended to the target topic by the Kafka broker, not when the record is created "at the source". The difference to processing time is that processing time is when the stream processing application processes the record. <b>For example,</b> if a record is never processed, there is no notion of processing time for it, but it still has an ingestion time.
+            <li><b>Ingestion time</b> - The point in time when an event or data record is stored in a topic partition by a Kafka broker. The difference to event time is that this ingestion timestamp is generated when the record is appended to the target topic by the Kafka broker, not when the record is created "at the source". The difference to processing time is that processing time is when the stream processing application processes the record. <b>For example,</b> if a record is never processed, there is no notion of processing time for it, but it still has an ingestion time.</li>
         </ul>
         <p>
             The choice between event-time and ingestion-time is actually done through the configuration of Kafka (not Kafka Streams): From Kafka 0.10.x onwards, timestamps are automatically embedded into Kafka messages. Depending on Kafka's configuration these timestamps represent event-time or ingestion-time. The respective Kafka configuration setting can be specified on the broker level or per topic. The default timestamp extractor in Kafka Streams will retrieve these embedded timestamps as-is. Hence, the effective time semantics of your application depend on the effective Kafka configuration for these embedded timestamps.
@@ -113,14 +130,15 @@
 
         <p>
             Finally, whenever a Kafka Streams application writes records to Kafka, then it will also assign timestamps to these new records. The way the timestamps are assigned depends on the context:
+        </p>
+
         <ul>
             <li> When new output records are generated via processing some input record, for example, <code>context.forward()</code> triggered in the <code>process()</code> function call, output record timestamps are inherited from input record timestamps directly.</li>
             <li> When new output records are generated via periodic functions such as <code>punctuate()</code>, the output record timestamp is defined as the current internal time (obtained through <code>context.timestamp()</code>) of the stream task.</li>
             <li> For aggregations, the timestamp of a resulting aggregate update record will be that of the latest arrived input record that triggered the update.</li>
         </ul>
-        </p>
 
-        <h5><a id="streams_state" href="#streams_state">States</a></h5>
+        <h3><a id="streams_state" href="#streams_state">States</a></h3>
 
         <p>
             Some stream processing applications don't require state, which means the processing of a message is independent from
@@ -140,26 +158,143 @@
         </p>
         <br>
 
+        <h2><a id="streams_architecture" href="#streams_architecture">Architecture</a></h2>
+
+        Kafka Streams simplifies application development by building on the Kafka producer and consumer libraries and leveraging the native capabilities of
+        Kafka to offer data parallelism, distributed coordination, fault tolerance, and operational simplicity. In this section, we describe how Kafka Streams works underneath the covers.
+
+        <p>
+        The picture below shows the anatomy of an application that uses the Kafka Streams library. Let's walk through some details.
+        </p>
+        <img class="centered" src="/{{version}}/images/streams-architecture-overview.jpg" style="width:750px">
+
+        <h3><a id="streams_architecture_tasks" href="#streams_architecture_tasks">Stream Partitions and Tasks</a></h3>
+
+        <p>
+        The messaging layer of Kafka partitions data for storing and transporting it. Kafka Streams partitions data for processing it.
+        In both cases, this partitioning is what enables data locality, elasticity, scalability, high performance, and fault tolerance.
+        Kafka Streams uses the concepts of <b>partitions</b> and <b>tasks</b> as logical units of its parallelism model based on Kafka topic partitions.
+        There are close links between Kafka Streams and Kafka in the context of parallelism:
+        </p>
+
+        <ul>
+            <li>Each <b>stream partition</b> is a totally ordered sequence of data records and maps to a Kafka <b>topic partition</b>.</li>
+            <li>A <b>data record</b> in the stream maps to a Kafka <b>message</b> from that topic.</li>
+            <li>The <b>keys</b> of data records determine the partitioning of data in both Kafka and Kafka Streams, i.e., how data is routed to specific partitions within topics.</li>
+        </ul>
+
+        <p>
+        An application's processor topology is scaled by breaking it into multiple tasks.
+        More specifically, Kafka Streams creates a fixed number of tasks based on the input stream partitions for the application,
+        with each task assigned a list of partitions from the input streams (i.e., Kafka topics). The assignment of partitions to tasks
+        never changes so that each task is a fixed unit of parallelism of the application. Tasks can then instantiate their own processor topology
+        based on the assigned partitions; they also maintain a buffer for each of their assigned partitions and process messages one-at-a-time from
+        these record buffers. As a result, stream tasks can be processed independently and in parallel without manual intervention.
+        </p>
+
+        <p>
+        It is important to understand that Kafka Streams is not a resource manager, but a library that “runs” anywhere its stream processing application runs.
+        Multiple instances of the application are executed either on the same machine, or spread across multiple machines, and tasks can be distributed automatically
+        by the library to those running application instances. The assignment of partitions to tasks never changes; if an application instance fails, all its assigned
+        tasks will be automatically restarted on other instances and continue to consume from the same stream partitions.
+        </p>
+
+        <p>
+        The following diagram shows two tasks, each assigned one partition of the input streams.
+        </p>
+        <img class="centered" src="/{{version}}/images/streams-architecture-tasks.jpg" style="width:400px">
+        <br>
+
+        <h3><a id="streams_architecture_threads" href="#streams_architecture_threads">Threading Model</a></h3>
+
+        <p>
+        Kafka Streams allows the user to configure the number of <b>threads</b> that the library can use to parallelize processing within an application instance.
+        Each thread can execute one or more tasks with their processor topologies independently. For example, the following diagram shows one stream thread running two stream tasks.
+        </p>
+        <img class="centered" src="/{{version}}/images/streams-architecture-threads.jpg" style="width:400px">
+
+        <p>
+        Starting more stream threads or more instances of the application merely amounts to replicating the topology and having it process a different subset of Kafka partitions, effectively parallelizing processing.
+        It is worth noting that there is no shared state amongst the threads, so no inter-thread coordination is necessary. This makes it very simple to run topologies in parallel across the application instances and threads.
+        The assignment of Kafka topic partitions amongst the various stream threads is transparently handled by Kafka Streams leveraging <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal">Kafka's coordination</a> functionality.
+        </p>
+
+        <p>
+        As we described above, scaling your stream processing application with Kafka Streams is easy: you merely need to start additional instances of your application,
+        and Kafka Streams takes care of distributing partitions amongst tasks that run in the application instances. You can start as many threads of the application
+        as there are input Kafka topic partitions so that, across all running instances of an application, every thread (or rather, the tasks it runs) has at least one input partition to process.
+        </p>
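+
+        <p>
+        For illustration only (a minimal sketch, using the <code>num.stream.threads</code> setting exposed as <code>StreamsConfig.NUM_STREAM_THREADS_CONFIG</code>; application id and bootstrap servers are placeholders), the number of stream threads per application instance could be configured as follows:
+        </p>
+
+        <pre>
+            Properties props = new Properties();
+            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
+            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
+
+            // run two stream threads inside this application instance;
+            // each thread executes one or more of the application's tasks
+            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
+        </pre>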
+        <br>
+
+        <h3><a id="streams_architecture_state" href="#streams_architecture_state">Local State Stores</a></h3>
+
+        <p>
+        Kafka Streams provides so-called <b>state stores</b>, which can be used by stream processing applications to store and query data,
+        an important capability when implementing stateful operations. The <a href="#streams_dsl">Kafka Streams DSL</a>, for example, automatically creates
+        and manages such state stores when you are calling stateful operators such as <code>join()</code> or <code>aggregate()</code>, or when you are windowing a stream.
+        </p>
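+
+        <p>
+        As a rough sketch (assuming an input topic named "pageviews" with <code>String</code> keys and values), counting records per key in the DSL implicitly creates and manages such a store:
+        </p>
+
+        <pre>
+            KStreamBuilder builder = new KStreamBuilder();
+
+            KStream&lt;String, String&gt; views = builder.stream(Serdes.String(), Serdes.String(), "pageviews");
+
+            // count() is a stateful operation: Kafka Streams creates and manages
+            // a fault-tolerant state store named "CountsByKey" behind the scenes
+            KTable&lt;String, Long&gt; counts = views.groupByKey().count("CountsByKey");
+        </pre>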
+
+        <p>
+        Every stream task in a Kafka Streams application may embed one or more local state stores that can be accessed via APIs to store and query data required for processing.
+        Kafka Streams offers fault-tolerance and automatic recovery for such local state stores.
+        </p>
+
+        <p>
+        The following diagram shows two stream tasks with their dedicated local state stores.
+        </p>
+        <img class="centered" src="/{{version}}/images/streams-architecture-states.jpg" style="width:400px">
+        <br>
+
+        <h3><a id="streams_architecture_recovery" href="#streams_architecture_recovery">Fault Tolerance</a></h3>
+
+        <p>
+        Kafka Streams builds on fault-tolerance capabilities integrated natively within Kafka. Kafka partitions are highly available and replicated, so when stream data is persisted to Kafka it is available
+        even if the application fails and needs to re-process it. Tasks in Kafka Streams leverage the fault-tolerance capability
+        offered by the <a href="https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client/">Kafka consumer client</a> to handle failures.
+        If a task runs on a machine that fails, Kafka Streams automatically restarts the task in one of the remaining running instances of the application.
+        </p>
+
+        <p>
+        In addition, Kafka Streams makes sure that the local state stores are robust to failures, too. For each state store, it maintains a replicated changelog Kafka topic in which it tracks any state updates.
+        These changelog topics are partitioned as well so that each local state store instance, and hence the task accessing the store, has its own dedicated changelog topic partition.
+        <a href="/documentation/#compaction">Log compaction</a> is enabled on the changelog topics so that old data can be purged safely to prevent the topics from growing indefinitely.
+        If tasks run on a machine that fails and are restarted on another machine, Kafka Streams guarantees to restore their associated state stores to the content before the failure by
+        replaying the corresponding changelog topics prior to resuming the processing on the newly started tasks. As a result, failure handling is completely transparent to the end user.
+        </p>
+
+        <p>
+        Note that the cost of task (re)initialization typically depends primarily on the time for restoring the state by replaying the state stores' associated changelog topics.
+        To minimize this restoration time, users can configure their applications to have <b>standby replicas</b> of local states (i.e. fully replicated copies of the state).
+        When a task migration happens, Kafka Streams then attempts to assign a task to an application instance where such a standby replica already exists in order to minimize
+        the task (re)initialization cost. See <code>num.standby.replicas</code> in the <a href="/documentation/#streamsconfigs">Kafka Streams Configs</a> section.
+        </p>
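+
+        <p>
+        As an illustrative sketch (using the <code>num.standby.replicas</code> setting mentioned above, here via the <code>StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG</code> constant; other values are placeholders), an application could keep one standby replica of its local state:
+        </p>
+
+        <pre>
+            Properties props = new Properties();
+            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
+            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
+
+            // keep one fully-replicated standby copy of each local state store
+            // so that task migration after a failure can resume quickly
+            props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
+        </pre>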
+        <br>
+
         <h2><a id="streams_developer" href="#streams_developer">Developer Guide</a></h2>
 
         <p>
-        There is a <a href="#quickstart_kafkastreams">quickstart</a> example that provides how to run a stream processing program coded in the Kafka Streams library.
+        There is a <a href="/documentation/#quickstart_kafkastreams">quickstart</a> example that shows how to run a stream processing program coded in the Kafka Streams library.
         This section focuses on how to write, configure, and execute a Kafka Streams application.
         </p>
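 
         <p>
         As a brief preview (a minimal sketch, assuming an input topic "words-in" and an output topic "words-out"; configuration and execution are covered in more detail later in this guide), a complete application builds a topology, configures it via <code>StreamsConfig</code> properties, and starts a <code>KafkaStreams</code> instance:
         </p>
 
         <pre>
             Properties props = new Properties();
             props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
             props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
             props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
             props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
 
             // define a trivial topology that pipes records from "words-in" to "words-out"
             KStreamBuilder builder = new KStreamBuilder();
             builder.stream("words-in").to("words-out");
 
             // create and start the Kafka Streams client; call close() to stop it
             KafkaStreams streams = new KafkaStreams(builder, props);
             streams.start();
         </pre>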
 
-
         <p>
         As we have mentioned above, the computational logic of a Kafka Streams application is defined as a <a href="#streams_topology">processor topology</a>.
         Currently Kafka Streams provides two sets of APIs to define the processor topology, which will be described in the subsequent sections.
         </p>
 
-        <h4><a id="streams_processor" href="#streams_processor">Low-Level Processor API</a></h4>
+        <h3><a id="streams_processor" href="#streams_processor">Low-Level Processor API</a></h3>
 
-        <h5><a id="streams_processor_process" href="#streams_processor_process">Processor</a></h5>
+        <h4><a id="streams_processor_process" href="#streams_processor_process">Processor</a></h4>
+
+        <p>
+        As mentioned in the <a href="#streams_concepts">Core Concepts</a> section, a stream processor is a node in the processor topology that represents a single processing step.
+        With the <code>Processor</code> API developers can define arbitrary stream processors that process one received record at a time, and connect these processors with
+        their associated state stores to compose the processor topology that represents their customized processing logic.
+        </p>
 
         <p>
-        Developers can define their customized processing logic by implementing the <code>Processor</code> interface, which
-        provides <code>process</code> and <code>punctuate</code> methods. The <code>process</code> method is performed on each
+        The <code>Processor</code> interface provides two main API methods:
+        <code>process</code> and <code>punctuate</code>. The <code>process</code> method is performed on each
         of the received records, and the <code>punctuate</code> method is performed periodically based on elapsed time.
         In addition, the processor can maintain the current <code>ProcessorContext</code> instance variable initialized in the
         <code>init</code> method, and use the context to schedule the punctuation period (<code>context().schedule</code>), to
@@ -167,17 +302,26 @@
         processing progress (<code>context().commit</code>), etc.
         </p>
 
+        <p>
+        The following example <code>Processor</code> implementation defines a simple word-count algorithm:
+        </p>
+
         <pre>
             public class MyProcessor implements Processor&lt;String, String&gt; {
                 private ProcessorContext context;
-                private KeyValueStore&lt;String, Integer&gt; kvStore;
+                private KeyValueStore&lt;String, Long&gt; kvStore;
 
                 @Override
                 @SuppressWarnings("unchecked")
                 public void init(ProcessorContext context) {
+                    // keep the processor context locally because we need it in punctuate() and commit()
                     this.context = context;
+
+                    // call this processor's punctuate() method every 1000 milliseconds.
                     this.context.schedule(1000);
-                    this.kvStore = (KeyValueStore&lt;String, Integer&gt;) context.getStateStore("Counts");
+
+                    // retrieve the key-value store named "Counts"
+                    this.kvStore = (KeyValueStore&lt;String, Long&gt;) context.getStateStore("Counts");
                 }
 
                 @Override
@@ -185,31 +329,33 @@
                     String[] words = line.toLowerCase().split(" ");
 
                     for (String word : words) {
-                        Integer oldValue = this.kvStore.get(word);
+                        Long oldValue = this.kvStore.get(word);
 
                         if (oldValue == null) {
-                            this.kvStore.put(word, 1);
+                            this.kvStore.put(word, 1L);
                         } else {
-                            this.kvStore.put(word, oldValue + 1);
+                            this.kvStore.put(word, oldValue + 1L);
                         }
                     }
                 }
 
                 @Override
                 public void punctuate(long timestamp) {
-                    KeyValueIterator&lt;String, Integer&gt; iter = this.kvStore.all();
+                    KeyValueIterator&lt;String, Long&gt; iter = this.kvStore.all();
 
                     while (iter.hasNext()) {
-                        KeyValue&lt;String, Integer&gt; entry = iter.next();
+                        KeyValue&lt;String, Long&gt; entry = iter.next();
                         context.forward(entry.key, entry.value.toString());
                     }
 
                     iter.close();
+                    // commit the current processing progress
                     context.commit();
                 }
 
                 @Override
                 public void close() {
+                    // close the key-value store
                     this.kvStore.close();
                 }
             };
@@ -217,31 +363,45 @@
 
         <p>
         In the above implementation, the following actions are performed:
+        </p>
 
         <ul>
             <li>In the <code>init</code> method, schedule the punctuation every 1 second and retrieve the local state store by its name "Counts".</li>
             <li>In the <code>process</code> method, upon each received record, split the value string into words, and update their counts into the state store (we will talk about this feature later in the section).</li>
             <li>In the <code>punctuate</code> method, iterate the local state store and send the aggregated counts to the downstream processor, and commit the current stream state.</li>
         </ul>
-        </p>
 
-        <h5><a id="streams_processor_topology" href="#streams_processor_topology">Processor Topology</a></h5>
+
+        <h4><a id="streams_processor_topology" href="#streams_processor_topology">Processor Topology</a></h4>
 
         <p>
         With the customized processors defined in the Processor API, developers can use the <code>TopologyBuilder</code> to build a processor topology
         by connecting these processors together:
+        </p>
 
         <pre>
             TopologyBuilder builder = new TopologyBuilder();
 
             builder.addSource("SOURCE", "src-topic")
+                // add "PROCESS1" node which takes the source processor "SOURCE" as its upstream processor
+                .addProcessor("PROCESS1", () -> new MyProcessor1(), "SOURCE")
 
-                .addProcessor("PROCESS1", MyProcessor1::new /* the ProcessorSupplier that can generate MyProcessor1 */, "SOURCE")
-                .addProcessor("PROCESS2", MyProcessor2::new /* the ProcessorSupplier that can generate MyProcessor2 */, "PROCESS1")
-                .addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
+                // add "PROCESS2" node which takes "PROCESS1" as its upstream processor
+                .addProcessor("PROCESS2", () -> new MyProcessor2(), "PROCESS1")
+
+                // add "PROCESS3" node which takes "PROCESS1" as its upstream processor
+                .addProcessor("PROCESS3", () -> new MyProcessor3(), "PROCESS1")
 
+                // add the sink processor node "SINK1" that takes Kafka topic "sink-topic1"
+                // as output and the "PROCESS1" node as its upstream processor
                 .addSink("SINK1", "sink-topic1", "PROCESS1")
+
+                // add the sink processor node "SINK2" that takes Kafka topic "sink-topic2"
+                // as output and the "PROCESS2" node as its upstream processor
                 .addSink("SINK2", "sink-topic2", "PROCESS2")
+
+                // add the sink processor node "SINK3" that takes Kafka topic "sink-topic3"
+                // as output and the "PROCESS3" node as its upstream processor
                 .addSink("SINK3", "sink-topic3", "PROCESS3");
         </pre>
 
@@ -252,16 +412,30 @@
             <li>Three processor nodes are then added using the <code>addProcessor</code> method; here the first processor is a child of the "SOURCE" node, but is the parent of the other two processors.</li>
             <li>Finally three sink nodes are added to complete the topology using the <code>addSink</code> method, each piping from a different parent processor node and writing to a separate topic.</li>
         </ul>
+
+        <h4><a id="streams_processor_statestore" href="#streams_processor_statestore">State Stores</a></h4>
+
+        <p>
+        Note that the <code>Processor</code> API is not limited to only accessing the current records as they arrive in the <code>process()</code> method, but can also maintain processing states
+        that keep recently arrived records to use in stateful processing operations such as windowed joins or aggregation.
+        To take advantage of these states, users can define a state store by implementing the <code>StateStore</code> interface (the Kafka Streams library also has a few extended interfaces such as <code>KeyValueStore</code>);
+        in practice, though, users usually do not need to customize such a state store from scratch but can simply use the <code>Stores</code> factory to define a state store by specifying whether it should be persistent, log-backed, etc.
+        In the following example, a persistent key-value store named “Counts” with key type <code>String</code> and value type <code>Long</code> is created.
         </p>
 
-        <h5><a id="streams_processor_statestore" href="#streams_processor_statestore">Local State Store</a></h5>
+        <pre>
+            StateStoreSupplier countStore = Stores.create("Counts")
+              .withKeys(Serdes.String())
+              .withValues(Serdes.Long())
+              .persistent()
+              .build();
+        </pre>
 
         <p>
-        Note that the Processor API is not limited to only accessing the current records as they arrive, but can also maintain local state stores
-        that keep recently arrived records to use in stateful processing operations such as aggregation or windowed joins.
-        To take advantage of this local states, developers can use the <code>TopologyBuilder.addStateStore</code> method when building the
+        To take advantage of these state stores, developers can use the <code>TopologyBuilder.addStateStore</code> method when building the
         processor topology to create the local state and associate it with the processor nodes that need to access it; or they can connect a created
-        local state store with the existing processor nodes through <code>TopologyBuilder.connectProcessorAndStateStores</code>.
+        state store with the existing processor nodes through <code>TopologyBuilder.connectProcessorAndStateStores</code>.
+        </p>
 
         <pre>
             TopologyBuilder builder = new TopologyBuilder();
@@ -269,8 +443,8 @@
             builder.addSource("SOURCE", "src-topic")
 
                 .addProcessor("PROCESS1", MyProcessor1::new, "SOURCE")
-                // create the in-memory state store "COUNTS" associated with processor "PROCESS1"
-                .addStateStore(Stores.create("COUNTS").withStringKeys().withStringValues().inMemory().build(), "PROCESS1")
+                // add the created state store "COUNTS" associated with processor "PROCESS1"
+                .addStateStore(countStore, "PROCESS1")
                 .addProcessor("PROCESS2", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
                 .addProcessor("PROCESS3", MyProcessor3::new /* the ProcessorSupplier that can generate MyProcessor3 */, "PROCESS1")
 
@@ -282,23 +456,22 @@
                 .addSink("SINK3", "sink-topic3", "PROCESS3");
         </pre>
 
-        </p>
-
         In the next section we present another way to build the processor topology: the Kafka Streams DSL.
+        <br>
 
-        <h4><a id="streams_dsl" href="#streams_dsl">High-Level Streams DSL</a></h4>
+        <h3><a id="streams_dsl" href="#streams_dsl">High-Level Streams DSL</a></h3>
 
         To build a processor topology using the Streams DSL, developers can use the <code>KStreamBuilder</code> class, which extends the <code>TopologyBuilder</code>.
         A simple example is included with the source code for Kafka in the <code>streams/examples</code> package. The rest of this section will walk
         through some code to demonstrate the key steps in creating a topology using the Streams DSL, but we recommend developers read the full example source
         code for details.
 
-        <h5><a id="streams_duality" href="#streams_duality">Duality of Streams and Tables</a></h5>
+        <h4><a id="streams_duality" href="#streams_duality">Duality of Streams and Tables</a></h4>
 
         <p>
         Before we discuss concepts such as aggregations in Kafka Streams we must first introduce tables, and most importantly the relationship between tables and streams:
         the so-called <a href="https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">stream-table duality</a>.
-        Essentially, this duality means that a stream can be viewed as a table, and vice versa. Kafka’s log compaction feature, for example, exploits this duality.
+        Essentially, this duality means that a stream can be viewed as a table, and vice versa. Kafka's log compaction feature, for example, exploits this duality.
         </p>
 
         <p>
@@ -309,18 +482,18 @@
         The <b>stream-table duality</b> describes the close relationship between streams and tables.
         <ul>
         <li><b>Stream as Table</b>: A stream can be considered a changelog of a table, where each data record in the stream captures a state change of the table. A stream is thus a table in disguise, and it can be easily turned into a “real” table by replaying the changelog from beginning to end to reconstruct the table. Similarly, in a more general analogy, aggregating data records in a stream – such as computing the total number of pageviews by user from a stream of pageview events – will return a table (here with the key and the value being the user and its corresponding pageview count, respectively).</li>
-        <li><b>Table as Stream</b>: A table can be considered a snapshot, at a point in time, of the latest value for each key in a stream (a stream’s data records are key-value pairs). A table is thus a stream in disguise, and it can be easily turned into a “real” stream by iterating over each key-value entry in the table.</li>
+        <li><b>Table as Stream</b>: A table can be considered a snapshot, at a point in time, of the latest value for each key in a stream (a stream's data records are key-value pairs). A table is thus a stream in disguise, and it can be easily turned into a “real” stream by iterating over each key-value entry in the table.</li>
         </ul>
 
         <p>
-        Let’s illustrate this with an example. Imagine a table that tracks the total number of pageviews by user (first column of diagram below). Over time, whenever a new pageview event is processed, the state of the table is updated accordingly. Here, the state changes between different points in time – and different revisions of the table – can be represented as a changelog stream (second column).
+        Let's illustrate this with an example. Imagine a table that tracks the total number of pageviews by user (first column of diagram below). Over time, whenever a new pageview event is processed, the state of the table is updated accordingly. Here, the state changes between different points in time – and different revisions of the table – can be represented as a changelog stream (second column).
         </p>
-        <img class="centered" src="/{{version}}/images/streams-table-duality-02.png">
+        <img class="centered" src="/{{version}}/images/streams-table-duality-02.png" style="width:300px">
 
         <p>
         Interestingly, because of the stream-table duality, the same stream can be used to reconstruct the original table (third column):
         </p>
-        <img class="centered" src="/{{version}}/images/streams-table-duality-03.png">
+        <img class="centered" src="/{{version}}/images/streams-table-duality-03.png" style="width:600px">
 
         <p>
         The same mechanism is used, for example, to replicate databases via change data capture (CDC) and, within Kafka Streams, to replicate its so-called state stores across machines for fault-tolerance.
@@ -330,7 +503,7 @@
         <h5><a id="streams_kstream_ktable" href="#streams_kstream_ktable">KStream and KTable</a></h5>
         The DSL uses two main abstractions. A <b>KStream</b> is an abstraction of a record stream, where each data record represents a self-contained datum in the unbounded data set.
         A <b>KTable</b> is an abstraction of a changelog stream, where each data record represents an update. More precisely, the value in a data record is considered to be an update of the last value for the same record key,
-        if any (if a corresponding key doesn't exist yet, the update will be considered a create). To illustrate the difference between KStreams and KTables, let’s imagine the following two data records are being sent to the stream:
+        if any (if a corresponding key doesn't exist yet, the update will be considered a create). To illustrate the difference between KStreams and KTables, let's imagine the following two data records are being sent to the stream:
 
         <pre>
             ("alice", 1) --> ("alice", 3)
@@ -338,7 +511,7 @@
 
         If these records were a KStream and the stream processing application were to sum the values, it would return <code>4</code>. If these records were a KTable, the result would be <code>3</code>, since the last record would be considered an update.
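+
+        <p>
+        As a rough sketch of this difference (assuming a <code>KStreamBuilder</code> named <code>builder</code>, which is introduced below; the topic name, store names, and serdes here are purely illustrative):
+        </p>
+
+        <pre>
+            // Read the topic as a record stream: summing the values per key yields 4
+            KStream&lt;String, Long&gt; recordStream = builder.stream(Serdes.String(), Serdes.Long(), "user-values");
+            KTable&lt;String, Long&gt; sums = recordStream
+                .groupByKey(Serdes.String(), Serdes.Long())
+                .reduce((value1, value2) -> value1 + value2, "sums-store");
+
+            // Read the same topic as a changelog stream: the latest value per key is 3
+            KTable&lt;String, Long&gt; latest = builder.table(Serdes.String(), Serdes.Long(), "user-values", "latest-store");
+        </pre>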
 
-        <h5><a id="streams_dsl_source" href="#streams_dsl_source">Create Source Streams from Kafka</a></h5>
+        <h4><a id="streams_dsl_source" href="#streams_dsl_source">Create Source Streams from Kafka</a></h4>
 
         <p>
         Either a <b>record stream</b> (defined as <code>KStream</code>) or a <b>changelog stream</b> (defined as <code>KTable</code>)
@@ -353,7 +526,7 @@
             KTable&lt;String, GenericRecord&gt; source2 = builder.table("topic3", "stateStoreName");
         </pre>
 
-        <h5><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h5>
+        <h4><a id="streams_dsl_windowing" href="#streams_dsl_windowing">Windowing a stream</a></h4>
         A stream processor may need to divide data records into time buckets, i.e. to <b>window</b> the stream by time. This is usually needed for join and aggregation operations, etc. Kafka Streams currently defines the following types of windows:
         <ul>
         <li><b>Hopping time windows</b> are windows based on time intervals. They model fixed-sized, (possibly) overlapping windows. A hopping window is defined by two properties: the window's size and its advance interval (aka "hop"). The advance interval specifies by how much a window moves forward relative to the previous one. For example, you can configure a hopping window with a size of 5 minutes and an advance interval of 1 minute. Since hopping windows can overlap, a data record may belong to more than one such window.</li>
@@ -367,12 +540,12 @@
         </p>
 
         <p>
-        Late-arriving records are always possible in real-time data streams. However, it depends on the effective <a href="#streams_team">time semantics</a> how late records are handled. Using processing-time, the semantics are “when the data is being processed”,
+        Late-arriving records are always possible in real-time data streams. However, how late records are handled depends on the effective <a href="#streams_time">time semantics</a>. With processing-time, the semantics are “when the data is being processed”,
         which means that the notion of late records is not applicable as, by definition, no record can be late. Hence, late-arriving records can only really be considered as such (i.e. as arriving “late”) for event-time or ingestion-time semantics. In both cases,
         Kafka Streams is able to properly handle late-arriving records.
         </p>
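+
+        <p>
+        As a minimal sketch of windowing (the topic and store names are illustrative and the default serdes are assumed), counting records per key in hopping windows of 5 minutes that advance by 1 minute could look as follows:
+        </p>
+
+        <pre>
+            KStream&lt;String, GenericRecord&gt; pageViews = builder.stream("pageview-topic");
+
+            // Each record is counted in every hopping window that covers its timestamp
+            KTable&lt;Windowed&lt;String&gt;, Long&gt; windowedCounts = pageViews
+                .groupByKey()
+                .count(TimeWindows.of(5 * 60 * 1000L).advanceBy(60 * 1000L), "PageViewCounts");
+        </pre>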
 
-        <h5><a id="streams_dsl_joins" href="#streams_dsl_joins">Join multiple streams</a></h5>
+        <h4><a id="streams_dsl_joins" href="#streams_dsl_joins">Join multiple streams</a></h4>
         A <b>join</b> operation merges two streams based on the keys of their data records, and yields a new stream. A join over record streams usually needs to be performed on a windowing basis because otherwise the number of records that must be maintained for performing the join may grow indefinitely. In Kafka Streams, you may perform the following join operations:
         <ul>
         <li><b>KStream-to-KStream Joins</b> are always windowed joins, since otherwise the memory and state required to compute the join would grow infinitely in size. Here, a newly received record from one of the streams is joined with the other stream's records within the specified window interval to produce one result for each matching pair based on a user-provided <code>ValueJoiner</code>. A new <code>KStream</code> instance representing the result stream of the join is returned from this operator.</li>
@@ -392,7 +565,7 @@
         When such late arrival happens, the aggregating <code>KStream</code> or <code>KTable</code> simply emits a new aggregate value. Because the output is a <code>KTable</code>, the new value is considered to overwrite the old value with the same key in subsequent processing steps.
         </p>
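+
+        <p>
+        To illustrate the join operations described above, here is a minimal sketch of a windowed <code>KStream</code>-to-<code>KStream</code> join (the topic names are illustrative and the default serdes are assumed):
+        </p>
+
+        <pre>
+            KStream&lt;String, String&gt; impressions = builder.stream("ad-impressions");
+            KStream&lt;String, String&gt; clicks = builder.stream("ad-clicks");
+
+            // Join impression and click records with the same key that occur within 5 minutes of each other
+            KStream&lt;String, String&gt; impressionsAndClicks = impressions.join(
+                clicks,
+                (impressionValue, clickValue) -> impressionValue + "/" + clickValue,  // ValueJoiner
+                JoinWindows.of(5 * 60 * 1000L)
+            );
+        </pre>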
 
-        <h5><a id="streams_dsl_transform" href="#streams_dsl_transform">Transform a stream</a></h5>
+        <h4><a id="streams_dsl_transform" href="#streams_dsl_transform">Transform a stream</a></h4>
 
         <p>
         Besides join and aggregation operations, Kafka Streams provides a number of other transformation operations for <code>KStream</code> and <code>KTable</code>, respectively.
@@ -439,7 +612,7 @@
             );
         </pre>
 
-        <h5><a id="streams_dsl_sink" href="#streams_dsl_sink">Write streams back to Kafka</a></h5>
+        <h4><a id="streams_dsl_sink" href="#streams_dsl_sink">Write streams back to Kafka</a></h4>
 
         <p>
         At the end of the processing, users can choose to (continuously) write the resulting streams back to a Kafka topic through
@@ -461,96 +634,223 @@
             // materialized = builder.stream("topic4");
             KStream&lt;String, String&gt; materialized = joined.through("topic4");
         </pre>
+        <br>
 
+        <h3><a id="streams_execute" href="#streams_execute">Application Configuration and Execution</a></h3>
 
-        <br>
         <p>
         Besides defining the topology, developers will also need to configure their application
         in <code>StreamsConfig</code> before running it. A complete list of
-        Kafka Streams configs can be found <a href="#streamsconfigs"><b>here</b></a>.
+        Kafka Streams configs can be found <a href="/documentation/#streamsconfigs"><b>here</b></a>.
         </p>
 
-        <h2><a id="streams_upgrade" href="#upgrade">Upgrade guide and API changes</a></h2>
+        <p>
+        Specifying the configuration in Kafka Streams is similar to the Kafka Producer and Consumer clients. Typically, you create a <code>java.util.Properties</code> instance,
+        set the necessary parameters, and construct a <code>StreamsConfig</code> instance from the <code>Properties</code> instance.
+        </p>
 
-        <h3><a id="streams_upgrade_1020" href="#upgrade_1020">Upgrading a Kafka Streams Application</a></h3>
+        <pre>
+            import java.util.Properties;
+            import org.apache.kafka.streams.StreamsConfig;
+
+            Properties settings = new Properties();
+            // Set a few key parameters
+            settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application");
+            settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
+            settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "zookeeper1:2181");
+            // Any further settings
+            settings.put(... , ...);
+
+            // Create an instance of StreamsConfig from the Properties instance
+            StreamsConfig config = new StreamsConfig(settings);
+        </pre>
 
-        <h4>Upgrading from 0.10.1.x to 0.10.2.0</h4>
         <p>
-        See <a href="../#upgrade_1020_streams">Upgrade Section</a> for details.
+        Apart from Kafka Streams' own configuration parameters, you can also specify parameters for the Kafka consumers and producers that are used internally,
+        depending on the needs of your application. As with the Streams settings, you define any such consumer and/or producer settings via <code>StreamsConfig</code>.
+        Note that some consumer and producer configuration parameters share the same name. For example, <code>send.buffer.bytes</code> and <code>receive.buffer.bytes</code>,
+        which are used to configure TCP buffers, and <code>request.timeout.ms</code> and <code>retry.backoff.ms</code>, which control retries for client requests, exist for both clients.
+        If you want to set different values for the consumer and the producer for such a parameter, you can prefix the parameter name with <code>consumer.</code> or <code>producer.</code>:
         </p>
 
-        <h3><a id="streams_api_changes" href="#api_changes">Streams API changes in 0.10.2.0</a></h3>
-        <li> New methods in <code>KafkaStreams</code>:
-            <ul>
-                <li> set a listener to react on application state change via <code>#setStateListener(StateListener listener)</code> </li>
-                <li> retrieve the current application state via <code>#state()</code> </li>
-                <li> retrieve the global metrics registry via <code>#metrics()</code> </li>
-                <li> apply a timeout when closing an application via <code>#close(long timeout, TimeUnit timeUnit)</code> </li>
-                <li> specify a custom indent when retrieving Kafka Streams information via <code>#toString(String indent)</code> </li>
-            </ul>
-        </li>
-        <li> Parameter updates in <code>StreamsConfig</code>:
-            <ul>
-                <li> parameter <code>zookeeper.connect</code> was deprecated </li>
-                <ul>
-                    <li> a Kafka Streams application does no longer interact with Zookeeper for topic management but uses the new broker admin protocol
-                         (cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-TopicAdminSchema.1">KIP-4, Section "Topic Admin Schema"</a>) </li>
-                    <li> thus, parameter "zookeeper.connect" is ignored in 0.10.2 and should be removed from <code>StreamsConfig</code> </li>
-                </ul>
-                <li> added many new parameters for metrics, security, and client configurations </li>
-            </ul>
-        </li>
-        <li> Changes in <code>StreamsMetrics</code> interface:
-            <ul>
-                <li> removed methods: <code>#addLatencySensor()</code> </li>
-                <li> added methods: <code>#addLatencyAndThroughputSensor()</code>, <code>#addThroughputSensor()</code>, <code>#recordThroughput()</code>,
-                     <code>#addSensor()</code>, <code>#removeSensor()</code> </li>
-            </ul>
-        </li>
-        <li> New methods in <code>TopologyBuilder</code>:
+        <pre>
+            Properties settings = new Properties();
+            // Example of a "normal" setting for Kafka Streams
+            settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker-01:9092");
+
+            // Customize the Kafka consumer settings
+            settings.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60000);
+
+            // Customize a common client setting for both consumer and producer
+            settings.put(CommonClientConfigs.RETRY_BACKOFF_MS_CONFIG, 100L);
+
+            // Customize different values for consumer and producer
+            settings.put("consumer." + ConsumerConfig.RECEIVE_BUFFER_CONFIG, 1024 * 1024);
+            settings.put("producer." + ProducerConfig.RECEIVE_BUFFER_CONFIG, 64 * 1024);
+            // Alternatively, you can use
+            settings.put(StreamsConfig.consumerPrefix(ConsumerConfig.RECEIVE_BUFFER_CONFIG), 1024 * 1024);
+            settings.put(StreamsConfig.producerPrefix(ProducerConfig.RECEIVE_BUFFER_CONFIG), 64 * 1024);
+        </pre>
+
+        <p>
+        You can call Kafka Streams from anywhere in your application code.
+        Most commonly, though, you would do so within the <code>main()</code> method of your application, or some variant thereof.
+        </p>
+
+        <p>
+        First, you must create an instance of <code>KafkaStreams</code>. The first argument of the <code>KafkaStreams</code> constructor is a topology
+        builder (either <code>KStreamBuilder</code> for the Kafka Streams DSL, or <code>TopologyBuilder</code> for the Processor API)
+        that is used to define the topology; the second argument is the instance of <code>StreamsConfig</code> mentioned above.
+        </p>
+
+        <pre>
+            import org.apache.kafka.streams.KafkaStreams;
+            import org.apache.kafka.streams.StreamsConfig;
+            import org.apache.kafka.streams.kstream.KStreamBuilder;
+            import org.apache.kafka.streams.processor.TopologyBuilder;
+
+            // Use the builders to define the actual processing topology, e.g. to specify
+            // from which input topics to read, which stream operations (filter, map, etc.)
+            // should be called, and so on.
+
+            KStreamBuilder builder = ...;  // when using the Kafka Streams DSL
+            //
+            // OR
+            //
+            TopologyBuilder builder = ...; // when using the Processor API
+
+            // Use the configuration to tell your application where the Kafka cluster is,
+            // which serializers/deserializers to use by default, to specify security settings,
+            // and so on.
+            StreamsConfig config = ...;
+
+            KafkaStreams streams = new KafkaStreams(builder, config);
+        </pre>
+
+        <p>
+        At this point, internal structures have been initialized, but the processing has not started yet. You have to explicitly start the Kafka Streams instance by calling the <code>start()</code> method:
+        </p>
+
+        <pre>
+            // Start the Kafka Streams instance
+            streams.start();
+        </pre>
+
+        <p>
+        To catch any unexpected exceptions, you may set a <code>java.lang.Thread.UncaughtExceptionHandler</code> before you start the application. This handler is called whenever a stream thread is terminated by an unexpected exception:
+        </p>
+
+        <pre>
+            streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
+                public void uncaughtException(Thread t, Throwable e) {
+                    // here you should examine the exception and perform an appropriate action!
+                }
+            });
+        </pre>
+
+        <p>
+        To stop the application instance, call the <code>close()</code> method:
+        </p>
+
+        <pre>
+            // Stop the Kafka Streams instance
+            streams.close();
+        </pre>
+
+        Now it's time to execute your application that uses the Kafka Streams library, which can be run just like any other Java application – there is no special magic or requirement on the side of Kafka Streams.
+        For example, you can package your Java application as a fat jar file and then start the application via:
+
+        <pre>
+            # Start the application in class `com.example.MyStreamsApp`
+            # from the fat jar named `path-to-app-fatjar.jar`.
+            $ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp
+        </pre>
+
+        <p>
+        When the application instance starts running, the defined processor topology will be initialized as one or more stream tasks that can be executed in parallel by the stream threads within the instance.
+        If the processor topology defines any state stores, these state stores will also be (re-)constructed, if possible, during the initialization
+        period of their associated stream tasks.
+        It is important to understand that, when starting your application as described above, you are actually launching what Kafka Streams considers to be one instance of your application.
+        More than one instance of your application may be running at a time, and in fact the common scenario is that there are indeed multiple instances of your application running in parallel (e.g., on another JVM or another machine).
+        In such cases, Kafka Streams transparently re-assigns tasks from the existing instances to the new instance that you just started.
+        See <a href="#streams_architecture_tasks">Stream Partitions and Tasks</a> and <a href="#streams_architecture_threads">Threading Model</a> for details.
+        </p>
+        <br>
+
+        <h2><a id="streams_upgrade" href="#streams_upgrade">Upgrade Guide and API Changes</a></h2>
+
+        <p>
+        See the <a href="/documentation/#upgrade_1020_streams">Upgrade Section</a> for upgrading a Kafka Streams Application from 0.10.1.x to 0.10.2.0.
+        </p>
+
+        <h3><a id="streams_api_changes" href="#streams_api_changes">Streams API changes in 0.10.2.0</a></h3>
+
+        <p>
+            New methods in <code>KafkaStreams</code> (a usage sketch follows the list):
+        </p>
+        <ul>
+            <li> set a listener to react on application state change via <code>#setStateListener(StateListener listener)</code> </li>
+            <li> retrieve the current application state via <code>#state()</code> </li>
+            <li> retrieve the global metrics registry via <code>#metrics()</code> </li>
+            <li> apply a timeout when closing an application via <code>#close(long timeout, TimeUnit timeUnit)</code> </li>
+            <li> specify a custom indent when retrieving Kafka Streams information via <code>#toString(String indent)</code> </li>
+        </ul>
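+
+        <p>
+            As a rough usage sketch of these new methods (assuming an already constructed topology builder and <code>StreamsConfig</code>, as shown earlier):
+        </p>
+
+        <pre>
+            KafkaStreams streams = new KafkaStreams(builder, config);
+
+            // React to application state changes, e.g. from REBALANCING to RUNNING
+            streams.setStateListener(new KafkaStreams.StateListener() {
+                @Override
+                public void onChange(KafkaStreams.State newState, KafkaStreams.State oldState) {
+                    System.out.println("State changed from " + oldState + " to " + newState);
+                }
+            });
+
+            streams.start();
+            System.out.println("Current state: " + streams.state());
+
+            // On shutdown, wait at most 30 seconds for all threads to stop (java.util.concurrent.TimeUnit)
+            streams.close(30, TimeUnit.SECONDS);
+        </pre>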
+
+        <p>
+            Parameter updates in <code>StreamsConfig</code>:
+        </p>
+        <ul>
+            <li> parameter <code>zookeeper.connect</code> was deprecated; a Kafka Streams application no longer interacts with ZooKeeper for topic management but uses the new broker admin protocol
+                (cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-TopicAdminSchema.1">KIP-4, Section "Topic Admin Schema"</a>); thus, <code>zookeeper.connect</code> is ignored in 0.10.2 and should be removed from <code>StreamsConfig</code> </li>
+            <li> added many new parameters for metrics, security, and client configurations </li>
+        </ul>
+
+        <p> Changes in <code>StreamsMetrics</code> interface: </p>
+        <ul>
+            <li> removed methods: <code>#addLatencySensor()</code> </li>
+            <li> added methods: <code>#addLatencyAndThroughputSensor()</code>, <code>#addThroughputSensor()</code>, <code>#recordThroughput()</code>,
+            <code>#addSensor()</code>, <code>#removeSensor()</code> </li>
+        </ul>
+        <p> New methods in <code>TopologyBuilder</code>: </p>
             <ul>
                 <li> added overloads for <code>#addSource()</code> that allow defining an <code>auto.offset.reset</code> policy per source node </li>
                 <li> added methods <code>#addGlobalStore()</code> to add global <code>StateStore</code>s </li>
             </ul>
-        </li>
-        <li> New methods in <code>KStreamBuilder</code>:
+
+        <p> New methods in <code>KStreamBuilder</code>: </p>
             <ul>
                 <li> added overloads for <code>#stream()</code> and <code>#table()</code> that allow defining an <code>auto.offset.reset</code> policy per input stream/table </li>
                 <li> <code>#table()</code> always requires a store name </li>
                 <li> added method <code>#globalKTable()</code> to create a <code>GlobalKTable</code> </li>
             </ul>
-        </li>
-        <li> New joins for <code>KStream</code>:
+
+        <p> New joins for <code>KStream</code> (a usage sketch follows the list): </p>
             <ul>
                 <li> added overloads for <code>#join()</code> to join with <code>KTable</code> </li>
                 <li> added overloads for <code>#join()</code> and <code>leftJoin()</code> to join with <code>GlobalKTable</code> </li>
             </ul>
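+
+        <p> For example, a record stream can now be enriched against a <code>GlobalKTable</code> roughly as follows (the <code>orders</code> stream, <code>customers</code> global table, and the <code>Order</code>/<code>Customer</code> types with <code>getCustomerId()</code> are purely illustrative): </p>
+
+        <pre>
+            // Assuming: KStream&lt;String, Order&gt; orders and GlobalKTable&lt;String, Customer&gt; customers
+            KStream&lt;String, String&gt; enrichedOrders = orders.join(
+                customers,
+                (orderId, order) -> order.getCustomerId(),             // map each stream record to the global table's key
+                (order, customer) -> order + " placed by " + customer  // ValueJoiner combining both values
+            );
+        </pre>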
-        </li>
-        <li> Aligned <code>null</code>-key handling for <code>KTable</code> joins:
+
+        <p> Aligned <code>null</code>-key handling for <code>KTable</code> joins: </p>
             <ul>
                 <li> like all other KTable operations, <code>KTable-KTable</code> joins do not throw an exception on <code>null</code> key records anymore, but drop those records silently </li>
             </ul>
-        </li>
-        <li> New window type <em>Session Windows</em>:
+
+        <p> New window type <em>Session Windows</em>: </p>
             <ul>
                 <li> added class <code>SessionWindows</code> to specify session windows </li>
                 <li> added overloads for <code>KGroupedStream</code> methods <code>#count()</code>, <code>#reduce()</code>, and <code>#aggregate()</code>
                      to allow session window aggregations </li>
             </ul>
-        </li>
-        <li> Changes to <code>TimestampExtractor</code>:
+
+        <p> Changes to <code>TimestampExtractor</code>: </p>
             <ul>
                 <li> method <code>#extract()</code> has a second parameter now </li>
                 <li> new default timestamp extractor class <code>FailOnInvalidTimestamp</code>
                      (it gives the same behavior as old (and removed) default extractor <code>ConsumerRecordTimestampExtractor</code>) </li>
                 <li> new alternative timestamp extractor classes <code>LogAndSkipOnInvalidTimestamp</code> and <code>UsePreviousTimeOnInvalidTimestamps</code> </li>
             </ul>
-        </li>
-        <li> Relaxed type constraints of many DSL interfaces, classes, and methods (cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-100+-+Relax+Type+constraints+in+Kafka+Streams+API">KIP-100</a>). </li>
-    </ul>
-</pre>
-
 
+        <p> Relaxed type constraints of many DSL interfaces, classes, and methods (cf. <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-100+-+Relax+Type+constraints+in+Kafka+Streams+API">KIP-100</a>). </p>
 </script>
 
 <!--#include virtual="../includes/_header.htm" -->

