flink-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From fhue...@apache.org
Subject [2/4] flink-web git commit: rebuild website
Date Fri, 04 Dec 2015 17:43:10 GMT
http://git-wip-us.apache.org/repos/asf/flink-web/blob/0507169a/content/news/2015/08/24/introducing-flink-gelly.html
----------------------------------------------------------------------
diff --git a/content/news/2015/08/24/introducing-flink-gelly.html b/content/news/2015/08/24/introducing-flink-gelly.html
index 2e05a24..2bc2f8a 100644
--- a/content/news/2015/08/24/introducing-flink-gelly.html
+++ b/content/news/2015/08/24/introducing-flink-gelly.html
@@ -225,21 +225,21 @@ and mutations as well as neighborhood aggregations.</p>
 
 <h4 id="common-graph-metrics">Common Graph Metrics</h4>
 <p>These methods can be used to retrieve several graph metrics and properties, such as the number
-of vertices, edges and the node degrees.</p>
+of vertices, edges and the node degrees. </p>
 
 <h4 id="transformations">Transformations</h4>
 <p>The transformation methods enable several Graph operations, using high-level functions similar to
 the ones provided by the batch processing API. These transformations can be applied one after the
-other, yielding a new Graph after each step, in a fashion similar to operators on DataSets:</p>
+other, yielding a new Graph after each step, in a fashion similar to operators on DataSets: </p>
 
 <div class="highlight"><pre><code class="language-java"><span class="n">inputGraph</span><span class="o">.</span><span class="na">getUndirected</span><span class="o">().</span><span class="na">mapEdges</span><span class="o">(</span><span class="k">new</span> <span class="nf">CustomEdgeMapper</span><span class="o">());</span></code></pre></div>
 
 <p>Transformations can be applied on:</p>
 
 <ol>
-  <li><strong>Vertices</strong>: <code>mapVertices</code>, <code>joinWithVertices</code>, <code>filterOnVertices</code>, <code>addVertex</code>, …</li>
-  <li><strong>Edges</strong>: <code>mapEdges</code>, <code>filterOnEdges</code>, <code>removeEdge</code>, …</li>
-  <li><strong>Triplets</strong> (source vertex, target vertex, edge): <code>getTriplets</code></li>
+  <li><strong>Vertices</strong>: <code>mapVertices</code>, <code>joinWithVertices</code>, <code>filterOnVertices</code>, <code>addVertex</code>, …  </li>
+  <li><strong>Edges</strong>: <code>mapEdges</code>, <code>filterOnEdges</code>, <code>removeEdge</code>, …   </li>
+  <li><strong>Triplets</strong> (source vertex, target vertex, edge): <code>getTriplets</code>  </li>
 </ol>
 
 <h4 id="neighborhood-aggregations">Neighborhood Aggregations</h4>
@@ -373,7 +373,7 @@ vertex values do not need to be recomputed during an iteration.</p>
 <p>Let us reconsider the Single Source Shortest Paths algorithm. In each iteration, a vertex:</p>
 
 <ol>
-  <li><strong>Gather</strong> retrieves distances from its neighbors summed up with the corresponding edge values;</li>
+  <li><strong>Gather</strong> retrieves distances from its neighbors summed up with the corresponding edge values; </li>
   <li><strong>Sum</strong> compares the newly obtained distances in order to extract the minimum;</li>
   <li><strong>Apply</strong> and finally adopts the minimum distance computed in the sum step,
 provided that it is lower than its current value. If a vertex’s value does not change during
@@ -432,7 +432,7 @@ plays that each song has. We then filter out the list of songs the users do not
 playlist. Then we compute the top songs per user (i.e. the songs a user listened to the most).
 Finally, as a separate use-case on the same data set, we create a user-user similarity graph based
 on the common songs and use this resulting graph to detect communities by calling Gelly’s Label Propagation
-library method.</p>
+library method. </p>
 
 <p>For running the example implementation, please use the 0.10-SNAPSHOT version of Flink as a
 dependency. The full example code base can be found <a href="https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/MusicProfiles.java">here</a>. The public data set used for testing
@@ -522,10 +522,10 @@ in the figure below.</p>
 
 <p>To form the user-user graph in Flink, we will simply take the edges from the user-song graph
 (left-hand side of the image), group them by song-id, and then add all the users (source vertex ids)
-to an ArrayList.</p>
+to an ArrayList. </p>
 
 <p>We then match users who listened to the same song two by two, creating a new edge to mark their
-common interest (right-hand side of the image).</p>
+common interest (right-hand side of the image). </p>
 
 <p>Afterwards, we perform a <code>distinct()</code> operation to avoid creation of duplicate data.
 Considering that we now have the DataSet of edges which present interest, creating a graph is as
@@ -564,7 +564,7 @@ formed. To do so, we first initialize each vertex with a numeric label using the
 the id of a vertex with the first element of the tuple, afterwards applying a map function.
 Finally, we call the <code>run()</code> method with the LabelPropagation library method passed
 as a parameter. In the end, the vertices will be updated to contain the most frequent label
-among their neighbors.</p>
+among their neighbors. </p>
 
 <div class="highlight"><pre><code class="language-java"><span class="c1">// detect user communities using label propagation</span>
 <span class="c1">// initialize each vertex with a unique numeric label</span>
@@ -594,10 +594,10 @@ among their neighbors.</p>
 <p>Currently, Gelly matches the basic functionalities provided by most state-of-the-art graph
 processing systems. Our vision is to turn Gelly into more than “yet another library for running
 PageRank-like algorithms” by supporting generic iterations, implementing graph partitioning,
-providing bipartite graph support and by offering numerous other features.</p>
+providing bipartite graph support and by offering numerous other features. </p>
 
 <p>We are also enriching Flink Gelly with a set of operators suitable for highly skewed graphs
-as well as a Graph API built on Flink Streaming.</p>
+as well as a Graph API built on Flink Streaming. </p>
 
 <p>In the near future, we would like to see how Gelly can be integrated with graph visualization
 tools, graph database systems and sampling techniques.</p>

http://git-wip-us.apache.org/repos/asf/flink-web/blob/0507169a/content/news/2015/09/16/off-heap-memory.html
----------------------------------------------------------------------
diff --git a/content/news/2015/09/16/off-heap-memory.html b/content/news/2015/09/16/off-heap-memory.html
index c1b1efc..50f4016 100644
--- a/content/news/2015/09/16/off-heap-memory.html
+++ b/content/news/2015/09/16/off-heap-memory.html
@@ -205,7 +205,7 @@
 
 <h2 id="the-off-heap-memory-implementation">The off-heap Memory Implementation</h2>
 
-<p>Given that all memory intensive internal algorithms are already implemented against the <code>MemorySegment</code>, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all <code>ByteBuffer.allocate(numBytes)</code> calls with <code>ByteBuffer.allocateDirect(numBytes)</code>. In Flink’s case it meant that we made the <code>MemorySegment</code> abstract and added the <code>HeapMemorySegment</code> and <code>OffHeapMemorySegment</code> subclasses. The <code>OffHeapMemorySegment</code> takes the off-heap memory pointer from a <code>java.nio.DirectByteBuffer</code> and implements its specialized access methods using <code>sun.misc.Unsafe</code>. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, <em>-XX:MaxDirectMemorySize</em>).</p>
+<p>Given that all memory intensive internal algorithms are already implemented against the <code>MemorySegment</code>, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all <code>ByteBuffer.allocate(numBytes)</code> calls with <code>ByteBuffer.allocateDirect(numBytes)</code>. In Flink’s case it meant that we made the <code>MemorySegment</code> abstract and added the <code>HeapMemorySegment</code> and <code>OffHeapMemorySegment</code> subclasses. The <code>OffHeapMemorySegment</code> takes the off-heap memory pointer from a <code>java.nio.DirectByteBuffer</code> and implements its specialized access methods using <code>sun.misc.Unsafe</code>. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, <em>-XX:MaxDirectMemorySize</em>). </p>
 
 <p>In practice we had to go one step further, to make the implementation perform well. While the <code>ByteBuffer</code> is used in I/O code paths to compose headers and move bulk memory into place, the MemorySegment is part of the innermost loops of many algorithms (sorting, hash tables, …). That means that the access methods have to be as fast as possible.</p>
 

http://git-wip-us.apache.org/repos/asf/flink-web/blob/0507169a/content/news/2015/11/16/release-0.10.0.html
----------------------------------------------------------------------
diff --git a/content/news/2015/11/16/release-0.10.0.html b/content/news/2015/11/16/release-0.10.0.html
index 1bf2ff7..7afd6e6 100644
--- a/content/news/2015/11/16/release-0.10.0.html
+++ b/content/news/2015/11/16/release-0.10.0.html
@@ -161,7 +161,7 @@
 
 <p>The Apache Flink community is pleased to announce the availability of the 0.10.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on data stream processing and operational features. About 80 contributors provided bug fixes, improvements, and new features such that in total more than 400 JIRA issues could be resolved.</p>
 
-<p>For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say that Flink 0.10.0 includes many more features, improvements, and bug fixes.</p>
+<p>For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say that Flink 0.10.0 includes many more features, improvements, and bug fixes. </p>
 
 <p>We encourage everyone to <a href="/downloads.html">download the release</a> and <a href="https://ci.apache.org/projects/flink/flink-docs-release-0.10/">check out the documentation</a>. Feedback through the Flink <a href="/community.html#mailing-lists">mailing lists</a> is, as always, very welcome!</p>
 

http://git-wip-us.apache.org/repos/asf/flink-web/blob/0507169a/content/news/2015/12/04/Introducing-windows.html
----------------------------------------------------------------------
diff --git a/content/news/2015/12/04/Introducing-windows.html b/content/news/2015/12/04/Introducing-windows.html
new file mode 100644
index 0000000..23536b4
--- /dev/null
+++ b/content/news/2015/12/04/Introducing-windows.html
@@ -0,0 +1,356 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
+    <title>Apache Flink: Introducing Stream Windows in Apache Flink</title>
+    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+    <link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+    <!-- Bootstrap -->
+    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">
+    <link rel="stylesheet" href="/css/flink.css">
+    <link rel="stylesheet" href="/css/syntax.css">
+
+    <!-- Blog RSS feed -->
+    <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" />
+
+    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
+    <!-- We need to load Jquery in the header for custom google analytics event tracking-->
+    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+  <body>  
+    
+
+  <!-- Top navbar. -->
+    <nav class="navbar navbar-default navbar-fixed-top">
+      <div class="container">
+        <!-- The logo. -->
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <div class="navbar-logo">
+            <a href="/">
+              <img alt="Apache Flink" src="/img/navbar-brand-logo.jpg" width="78px" height="40px">
+            </a>
+          </div>
+        </div><!-- /.navbar-header -->
+
+        <!-- The navigation links. -->
+        <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
+          <ul class="nav navbar-nav">
+
+            <!-- Overview -->
+            <li><a href="/index.html">Overview</a></li>
+
+            <!-- Features -->
+            <li><a href="/features.html">Features</a></li>
+
+            <!-- Downloads -->
+            <li><a href="/downloads.html">Downloads</a></li>
+
+            <!-- FAQ -->
+            <li><a href="/faq.html">FAQ</a></li>
+
+
+            <!-- Quickstart -->
+            <li class="dropdown">
+              <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false"><small><span class="glyphicon glyphicon-new-window"></span></small> Quickstart <span class="caret"></span></a>
+              <ul class="dropdown-menu" role="menu">
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10/quickstart/setup_quickstart.html">Setup</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10/quickstart/java_api_quickstart.html">Java API</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10/quickstart/scala_api_quickstart.html">Scala API</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10/quickstart/run_example_quickstart.html">Run Step-by-Step Example</a></li>
+              </ul>
+            </li>
+
+
+            <!-- Documentation -->
+            <li class="dropdown">
+              <a href="" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false"><small><span class="glyphicon glyphicon-new-window"></span></small> Documentation <span class="caret"></span></a>
+              <ul class="dropdown-menu" role="menu">
+                <!-- Latest stable release -->
+                <li role="presentation" class="dropdown-header"><strong>Latest Release</strong> (Stable)</li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10">0.10.1 Documentation</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java" class="active">0.10.1 Javadocs</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.10/api/scala/index.html" class="active">0.10.1 ScalaDocs</a></li>
+
+                <!-- Snapshot docs -->
+                <li class="divider"></li>
+                <li role="presentation" class="dropdown-header"><strong>Snapshot</strong> (Development)</li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-master">1.0 Documentation</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-master/api/java" class="active">1.0 Javadocs</a></li>
+                <li><a href="http://ci.apache.org/projects/flink/flink-docs-master/api/scala/index.html" class="active">1.0 ScalaDocs</a></li>
+
+                <!-- Wiki -->
+                <li class="divider"></li>
+                <li><a href="https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home">Wiki</a></li>
+              </ul>
+            </li>
+
+          </ul>
+
+          <ul class="nav navbar-nav navbar-right">
+            <!-- Blog -->
+            <li class=" active hidden-md hidden-sm"><a href="/blog/">Blog</a></li>
+
+            <li class="dropdown hidden-md hidden-sm">
+              <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Community <span class="caret"></span></a>
+              <ul class="dropdown-menu" role="menu">
+                <!-- Community -->
+                <li role="presentation" class="dropdown-header"><strong>Community</strong></li>
+                <li><a href="/community.html#mailing-lists">Mailing Lists</a></li>
+                <li><a href="/community.html#irc">IRC</a></li>
+                <li><a href="/community.html#stack-overflow">Stack Overflow</a></li>
+                <li><a href="/community.html#issue-tracker">Issue Tracker</a></li>
+                <li><a href="/community.html#third-party-packages">Third Party Packages</a></li>
+                <li><a href="/community.html#source-code">Source Code</a></li>
+                <li><a href="/community.html#people">People</a></li>
+                <li><a href="https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink"><small><span class="glyphicon glyphicon-new-window"></span></small> Powered by Flink</a></li>
+
+                <!-- Contribute -->
+                <li class="divider"></li>
+                <li role="presentation" class="dropdown-header"><strong>Contribute</strong></li>
+                <li><a href="/how-to-contribute.html">How to Contribute</a></li>
+                <li><a href="/contribute-code.html">Contribute Code</a></li>
+                <li><a href="/contribute-documentation.html">Contribute Documentation</a></li>
+                <li><a href="/improve-website.html">Improve the Website</a></li>
+              </ul>
+            </li>
+
+            <li class="dropdown hidden-md hidden-sm">
+              <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Project <span class="caret"></span></a>
+              <ul class="dropdown-menu" role="menu">
+                <!-- Project -->
+                <li role="presentation" class="dropdown-header"><strong>Project</strong></li>
+                <li><a href="/material.html">Material</a></li>
+                <li><a href="https://twitter.com/apacheflink"><small><span class="glyphicon glyphicon-new-window"></span></small> Twitter</a></li>
+                <li><a href="https://github.com/apache/flink"><small><span class="glyphicon glyphicon-new-window"></span></small> GitHub</a></li>
+                <li><a href="https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home"><small><span class="glyphicon glyphicon-new-window"></span></small> Wiki</a></li>
+              </ul>
+            </li>
+          </ul>
+        </div><!-- /.navbar-collapse -->
+      </div><!-- /.container -->
+    </nav>
+
+
+    <!-- Main content. -->
+    <div class="container">
+      
+
+<div class="row">
+  <div class="col-sm-8 col-sm-offset-2">
+    <div class="row">
+      <h1>Introducing Stream Windows in Apache Flink</h1>
+
+      <article>
+        <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p>
+
+<p>The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams which let it stand out among other open source stream processors. </p>
+
+<p>In this blog post, we discuss the concept of windows for stream processing, present Flink’s built-in windows, and explain its support for custom windowing semantics.</p>
+
+<h2 id="what-are-windows-and-what-are-they-good-for">What are windows and what are they good for?</h2>
+
+<p>Consider the example of a traffic sensor that counts every 15 seconds the number of vehicles passing a certain location. The resulting stream could look like:</p>
+
+<center>
+<img src="/img/blog/window-intro/window-stream.png" style="width:75%;margin:15px" />
+</center>
+
+<p>If you would like to know, how many vehicles passed that location, you would simply sum the individual counts. However, the nature of a sensor stream is that it continuously produces data. Such a stream never ends and it is not possible to compute a final sum that can be returned. Instead, it is possible to compute rolling sums, i.e., return for each input event an updated sum record. This would yield a new stream of partial sums.</p>
+
+<center>
+<img src="/img/blog/window-intro/window-rolling-sum.png" style="width:75%;margin:15px" />
+</center>
+
+<p>However, a stream of partial sums might not be what we are looking for, because it constantly updates the count and even more important, some information such as variation over time is lost. Hence, we might want to rephrase our question and ask for the number of cars that pass the location every minute. This requires us to group the elements of the stream into finite sets, each set corresponding to sixty seconds. This operation is called a <em>tumbling windows</em> operation.</p>
+
+<center>
+<img src="/img/blog/window-intro/window-tumbling-window.png" style="width:75%;margin:15px" />
+</center>
+
+<p>Tumbling windows discretize a stream into non-overlapping windows. For certain applications it is important that windows are not disjunct because an application might require smoothed aggregates. For example, we can compute every thirty seconds the number of cars passed in the last minute. Such windows are called <em>sliding windows</em>.</p>
+
+<center>
+<img src="/img/blog/window-intro/window-sliding-window.png" style="width:75%;margin:15px" />
+</center>
+
+<p>Defining windows on a data stream as discussed before is a non-parallel operation. This is because each element of a stream must be processed by the same window operator that decides which windows the element should be added to. Windows on a full stream are called <em>AllWindows</em> in Flink. For many applications, a data stream needs to be grouped into multiple logical streams on each of which a window operator can be applied. Think for example about a stream of vehicle counts from multiple traffic sensors (instead of only one sensor as in our previous example), where each sensor monitors a different location. By grouping the stream by sensor id, we can compute windowed traffic statistics for each location in parallel. In Flink, we call such partitioned windows simply <em>Windows</em>, as they are the common case for distributed streams. The following figure shows tumbling windows that collect two elements over a stream of <code>(sensorId, count)</code> pair elements.</p>
+
+<center>
+<img src="/img/blog/window-intro/windows-keyed.png" style="width:75%;margin:15px" />
+</center>
+
+<p>Generally speaking, a window defines a finite set of elements on an unbounded stream. This set can be based on time (as in our previous examples), element counts, a combination of counts and time, or some custom logic to assign elements to windows. Flink’s DataStream API provides concise operators for the most common window operations as well as a generic windowing mechanism that allows users to define very custom windowing logic. In the following we present Flink’s time and count windows before discussing its windowing mechanism in detail.</p>
+
+<h2 id="time-windows">Time Windows</h2>
+
+<p>As their name suggests, time windows group stream elements by time. For example, a tumbling time window of one minute collects elements for one minute and applies a function on all elements in the window after one minute passed.</p>
+
+<p>Defining tumbling and sliding time windows in Apache Flink is very easy:</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="c1">// Stream of (sensorId, carCnt)</span>
+<span class="k">val</span> <span class="n">vehicleCnts</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="o">...</span>
+
+<span class="k">val</span> <span class="n">tumblingCnts</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="n">vehicleCnts</span>
+  <span class="c1">// key stream by sensorId</span>
+  <span class="o">.</span><span class="n">keyBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span> 
+  <span class="c1">// tumbling time window of 1 minute length</span>
+  <span class="o">.</span><span class="n">timeWindow</span><span class="o">(</span><span class="nc">Time</span><span class="o">.</span><span class="n">minutes</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span>
+  <span class="c1">// compute sum over carCnt</span>
+  <span class="o">.</span><span class="n">sum</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span> 
+
+<span class="k">val</span> <span class="n">slidingCnts</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="n">vehicleCnts</span>
+  <span class="o">.</span><span class="n">keyBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span> 
+  <span class="c1">// sliding time window of 1 minute length and 30 secs trigger interval</span>
+  <span class="o">.</span><span class="n">timeWindow</span><span class="o">(</span><span class="nc">Time</span><span class="o">.</span><span class="n">minutes</span><span class="o">(</span><span class="mi">1</span><span class="o">),</span> <span class="nc">Time</span><span class="o">.</span><span class="n">seconds</span><span class="o">(</span><span class="mi">30</span><span class="o">))</span>
+  <span class="o">.</span><span class="n">sum</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span></code></pre></div>
+
+<p>There is one aspect that we haven’t discussed yet, namely the exact meaning of “<em>collects elements for one minute</em>” which boils down to the question, “<em>How does the stream processor interpret time?</em>”.</p>
+
+<p>Apache Flink features three different notions of time, namely <em>processing time</em>, <em>event time</em>, and <em>ingestion time</em>. </p>
+
+<ol>
+  <li>In <strong>processing time</strong>, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one minute processing time window collects elements for exactly one minute. </li>
+  <li>In <strong>event time</strong>, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc, where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First of all, it decouples the program semantics from the actual serving speed of the source and the processing performance of system. Hence you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results, even if events arrive out-of-order of their timestamp which is common if a data stream gathers events from distributed sources. </li>
+  <li><strong>Ingestion time</strong> is a hybrid of processing and event time. It assigns wall clock timestamps to records as soon as they arrive in the system (at the source) and continues processing with event time semantics based on the attached timestamps.</li>
+</ol>
+
+<h2 id="count-windows">Count Windows</h2>
+
+<p>Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added. </p>
+
+<p>In Flink’s DataStream API, tumbling and sliding count windows are defined as follows:</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="c1">// Stream of (sensorId, carCnt)</span>
+<span class="k">val</span> <span class="n">vehicleCnts</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="o">...</span>
+
+<span class="k">val</span> <span class="n">tumblingCnts</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="n">vehicleCnts</span>
+  <span class="c1">// key stream by sensorId</span>
+  <span class="o">.</span><span class="n">keyBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
+  <span class="c1">// tumbling count window of 100 elements size</span>
+  <span class="o">.</span><span class="n">countWindow</span><span class="o">(</span><span class="mi">100</span><span class="o">)</span>
+  <span class="c1">// compute the carCnt sum </span>
+  <span class="o">.</span><span class="n">sum</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span>
+
+<span class="k">val</span> <span class="n">slidingCnts</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="n">vehicleCnts</span>
+  <span class="o">.</span><span class="n">keyBy</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
+  <span class="c1">// sliding count window of 100 elements size and 10 elements trigger interval</span>
+  <span class="o">.</span><span class="n">countWindow</span><span class="o">(</span><span class="mi">100</span><span class="o">,</span> <span class="mi">10</span><span class="o">)</span>
+  <span class="o">.</span><span class="n">sum</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span></code></pre></div>
+
+<h2 id="dissecting-flinks-windowing-mechanics">Dissecting Flink’s windowing mechanics</h2>
+
+<p>Flink’s built-in time and count windows cover a wide range of common window use cases. However, there are of course applications that require custom windowing logic that cannot be addressed by Flink’s built-in windows. In order to support also applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control about the way that windows are built and evaluated. </p>
+
+<p>The following figure depicts Flink’s windowing mechanism and introduces the components being involved.</p>
+
+<center>
+<img src="/img/blog/window-intro/window-mechanics.png" style="width:90%;margin:15px" />
+</center>
+
+<p>Elements that arrive at a window operator are handed to a <code>WindowAssigner</code>. The WindowAssigner assigns elements to one or more windows, possibly creating new windows. A <code>Window</code> itself is just an identifier for a list of elements and may provide some optional meta information, such as begin and end time in case of a <code>TimeWindow</code>. Note that an element can be added to multiple windows, which also means that multiple windows can exist at the same time.</p>
+
+<p>Each window owns a <code>Trigger</code> that decides when the window is evaluated or purged. The trigger is called for each element that is inserted into the window and when a previously registered timer times out. On each event, a trigger can decide to fire (i.e., evaluate), purge (remove the window and discard its content), or fire and then purge the window. A trigger that just fires evaluates the window and keeps it as it is, i.e., all elements remain in the window and are evaluated again when the triggers fires the next time. A window can be evaluated several times and exists until it is purged. Note that a window consumes memory until it is purged.</p>
+
+<p>When a Trigger fires, the list of window elements can be given to an optional <code>Evictor</code>. The evictor can iterate through the list and decide to cut off some elements from the start of the list, i.e., remove some of the elements that entered the window first. The remaining elements are given to an evaluation function. If no Evictor was defined, the Trigger hands all the window elements directly to the evaluation function.</p>
+
+<p>The evaluation function receives the elements of a window (possibly filtered by an Evictor) and computes one or more result elements for the window. The DataStream API accepts different types of evaluation functions, including predefined aggregation functions such as <code>sum()</code>, <code>min()</code>, <code>max()</code>, as well as a <code>ReduceFunction</code>, <code>FoldFunction</code>, or <code>WindowFunction</code>. A WindowFunction is the most generic evaluation function and receives the window object (i.e, the meta data of the window), the list of window elements, and the window key (in case of a keyed window) as parameters.</p>
+
+<p>These are the components that constitute Flink’s windowing mechanics. We now show step-by-step how to implement custom windowing logic with the DataStream API. We start with a stream of type <code>DataStream[IN]</code> and key it using a key selector function that extracts a key of type <code>KEY</code> to obtain a <code>KeyedStream[IN, KEY]</code>.</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="k">val</span> <span class="n">input</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[</span><span class="kt">IN</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+
+<span class="c1">// created a keyed stream using a key selector function</span>
+<span class="k">val</span> <span class="n">keyed</span><span class="k">:</span> <span class="kt">KeyedStream</span><span class="o">[</span><span class="kt">IN</span>, <span class="kt">KEY</span><span class="o">]</span> <span class="k">=</span> <span class="n">input</span>
+  <span class="o">.</span><span class="n">keyBy</span><span class="o">(</span><span class="n">myKeySel</span><span class="k">:</span> <span class="o">(</span><span class="kt">IN</span><span class="o">)</span> <span class="o">=&gt;</span> <span class="nc">KEY</span><span class="o">)</span></code></pre></div>
+
+<p>We apply a <code>WindowAssigner[IN, WINDOW]</code> that creates windows of type <code>WINDOW</code> resulting in a <code>WindowedStream[IN, KEY, WINDOW]</code>. In addition, a <code>WindowAssigner</code> also provides a default <code>Trigger</code> implementation.</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="c1">// create windowed stream using a WindowAssigner</span>
+<span class="k">var</span> <span class="n">windowed</span><span class="k">:</span> <span class="kt">WindowedStream</span><span class="o">[</span><span class="kt">IN</span>, <span class="kt">KEY</span>, <span class="kt">WINDOW</span><span class="o">]</span> <span class="k">=</span> <span class="n">keyed</span>
+  <span class="o">.</span><span class="n">window</span><span class="o">(</span><span class="n">myAssigner</span><span class="k">:</span> <span class="kt">WindowAssigner</span><span class="o">[</span><span class="kt">IN</span>, <span class="kt">WINDOW</span><span class="o">])</span></code></pre></div>
+
+<p>We can explicitly specify a <code>Trigger</code> to overwrite the default <code>Trigger</code> provided by the <code>WindowAssigner</code>. Note that specifying a triggers does not add an additional trigger condition but replaces the current trigger.</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="c1">// override the default trigger of the WindowAssigner</span>
+<span class="n">windowed</span> <span class="k">=</span> <span class="n">windowed</span>
+  <span class="o">.</span><span class="n">trigger</span><span class="o">(</span><span class="n">myTrigger</span><span class="k">:</span> <span class="kt">Trigger</span><span class="o">[</span><span class="kt">IN</span>, <span class="kt">WINDOW</span><span class="o">])</span></code></pre></div>
+
+<p>We may want to specify an optional <code>Evictor</code> as follows.</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="c1">// specify an optional evictor</span>
+<span class="n">windowed</span> <span class="k">=</span> <span class="n">windowed</span>
+  <span class="o">.</span><span class="n">evictor</span><span class="o">(</span><span class="n">myEvictor</span><span class="k">:</span> <span class="kt">Evictor</span><span class="o">[</span><span class="kt">IN</span>, <span class="kt">WINDOW</span><span class="o">])</span></code></pre></div>
+
+<p>Finally, we apply a <code>WindowFunction</code> that returns elements of type <code>OUT</code> to obtain a <code>DataStream[OUT]</code>.</p>
+
+<div class="highlight"><pre><code class="language-scala"><span class="c1">// apply window function to windowed stream</span>
+<span class="k">val</span> <span class="n">output</span><span class="k">:</span> <span class="kt">DataStream</span><span class="o">[</span><span class="kt">OUT</span><span class="o">]</span> <span class="k">=</span> <span class="n">windowed</span>
+  <span class="o">.</span><span class="n">apply</span><span class="o">(</span><span class="n">myWinFunc</span><span class="k">:</span> <span class="kt">WindowFunction</span><span class="o">[</span><span class="kt">IN</span>, <span class="kt">OUT</span>, <span class="kt">KEY</span>, <span class="kt">WINDOW</span><span class="o">])</span></code></pre></div>
+
+<p>With Flink’s internal windowing mechanics and its exposure through the DataStream API it is possible to implement very custom windowing logic such as session windows or windows that emit early results if the values exceed a certain threshold.</p>
+
+<h2 id="conclusion">Conclusion</h2>
+
+<p>Support for various types of windows over continuous data streams is a must-have for modern stream processors. Apache Flink is a stream processor with a very strong feature set, including a very flexible mechanism to build and evaluate windows over continuous data streams. Flink provides pre-defined window operators for common uses cases as well as a toolbox that allows to define very custom windowing logic. The Flink community will add more pre-defined window operators as we learn the requirements from our users.</p>
+
+      </article>
+    </div>
+
+    <div class="row">
+      <div id="disqus_thread"></div>
+      <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+             (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+      </script>
+    </div>
+  </div>
+</div>
+
+      <hr />
+      <div class="footer text-center">
+        <p>Copyright © 2014-2015 <a href="http://apache.org">The Apache Software Foundation</a>. All Rights Reserved.</p>
+        <p>Apache Flink, Apache, and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
+        <p><a href="/privacy-policy.html">Privacy Policy</a> &middot; <a href="/blog/feed.xml">RSS feed</a></p>
+      </div>
+
+    </div><!-- /.container -->
+
+    <!-- Include all compiled plugins (below), or include individual files as needed -->
+    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
+    <script src="/js/codetabs.js"></script>
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-52545728-1', 'auto');
+      ga('send', 'pageview');
+    </script>
+  </body>
+</html>


Mime
View raw message