flink-commits mailing list archives

From trohrm...@apache.org
Subject [6/7] flink-web git commit: Add CEP blog post
Date Wed, 06 Apr 2016 12:38:45 GMT
http://git-wip-us.apache.org/repos/asf/flink-web/blob/ebaf2975/content/blog/feed.xml
----------------------------------------------------------------------
diff --git a/content/blog/feed.xml b/content/blog/feed.xml
index 4b95729..f61648e 100644
--- a/content/blog/feed.xml
+++ b/content/blog/feed.xml
@@ -7,6 +7,195 @@
 <atom:link href="http://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" />
 
 <item>
+<title>Introducing Complex Event Processing (CEP) with Apache Flink</title>
+<description>&lt;p&gt;With the ubiquity of sensor networks and smart devices continuously collecting more and more data, we face the challenge of analyzing an ever-growing stream of data in near real-time. 
+Being able to react quickly to changing trends or to deliver up-to-date business intelligence can be a decisive factor for a company’s success or failure. 
+A key problem of real-time processing is the detection of event patterns in data streams.&lt;/p&gt;
+
+&lt;p&gt;Complex event processing (CEP) addresses exactly this problem of matching continuously incoming events against a pattern. 
+The result of a match is usually one or more complex events which are derived from the input events. 
+In contrast to traditional DBMSs, where a query is executed on stored data, CEP executes the data on a stored query. 
+All data which is not relevant for the query can be discarded immediately. 
+The advantages of this approach are obvious, given that CEP queries are applied to a potentially infinite stream of data. 
+Furthermore, inputs are processed immediately. 
+Once the system has seen all events for a matching sequence, results are emitted straight away. 
+This aspect is what gives CEP its real-time analytics capability.&lt;/p&gt;
+
+&lt;p&gt;Consequently, CEP’s processing paradigm has drawn significant interest and found application in a wide variety of use cases. 
+Most notably, CEP is nowadays used for financial applications such as stock market trend detection and credit card fraud detection. 
+Moreover, it is used in RFID-based tracking and monitoring, for example, to detect thefts in a warehouse where items are not properly checked out. 
+CEP can also be used to detect network intrusions by specifying patterns of suspicious user behaviour.&lt;/p&gt;
+
+&lt;p&gt;Apache Flink, with its true streaming nature and its capabilities for low-latency as well as high-throughput stream processing, is a natural fit for CEP workloads. 
+Consequently, the Flink community has introduced the first version of a new &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html&quot;&gt;CEP library&lt;/a&gt; with &lt;a href=&quot;http://flink.apache.org/news/2016/03/08/release-1.0.0.html&quot;&gt;Flink 1.0&lt;/a&gt;. 
+In the remainder of this blog post, we introduce Flink’s CEP library and illustrate its ease of use through the example of monitoring a data center.&lt;/p&gt;
+
+&lt;h2 id=&quot;monitoring-and-alert-generation-for-data-centers&quot;&gt;Monitoring and alert generation for data centers&lt;/h2&gt;
+
+&lt;center&gt;
+&lt;img src=&quot;/img/blog/cep-monitoring.svg&quot; style=&quot;width:600px;margin:15px&quot; /&gt;
+&lt;/center&gt;
+
+&lt;p&gt;Assume we have a data center with a number of racks. 
+For each rack the power consumption and the temperature are monitored. 
+Whenever such a measurement takes place, a new power or temperature event is generated, respectively. 
+Based on this monitoring event stream, we want to detect racks that are about to overheat, and dynamically adapt their workload and cooling.&lt;/p&gt;
+
+&lt;p&gt;For this scenario we use a two-stage approach. 
+First, we monitor the temperature events. 
+Whenever we see two consecutive events whose temperature exceeds a threshold value, we generate a temperature warning with the current average temperature. 
+A temperature warning does not necessarily indicate that a rack is about to overheat. 
+But whenever we see two consecutive warnings with increasing temperatures, we want to issue an alert for this rack. 
+This alert can then lead to countermeasures to cool the rack.&lt;/p&gt;
+
+&lt;h3 id=&quot;implementation-with-apache-flink&quot;&gt;Implementation with Apache Flink&lt;/h3&gt;
+
+&lt;p&gt;First, we define the messages of the incoming monitoring event stream. 
+Every monitoring message contains its originating rack ID. 
+The temperature event additionally contains the current temperature and the power consumption event contains the current voltage. 
+We model the events as POJOs:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+
+&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+
+&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PowerEvent&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;voltage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;Now we can ingest the monitoring event stream using one of Flink’s connectors (e.g., Kafka or RabbitMQ). 
+This will give us a &lt;code&gt;DataStream&amp;lt;MonitoringEvent&amp;gt; inputEventStream&lt;/code&gt; which we will use as the input for Flink’s CEP operator. 
+But first, we have to define the event pattern to detect temperature warnings. 
+The CEP library offers an intuitive &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html#the-pattern-api&quot;&gt;Pattern API&lt;/a&gt; to easily define these complex patterns.&lt;/p&gt;
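+
+&lt;p&gt;As a minimal sketch of this ingestion step (the &lt;code&gt;MonitoringEventSource&lt;/code&gt; below is a hypothetical source function, not part of the CEP library), the input stream could be obtained as follows:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: obtain the streaming environment and ingest monitoring events.
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// MonitoringEventSource is an assumed custom SourceFunction&amp;lt;MonitoringEvent&amp;gt;;
+// in practice a Kafka or RabbitMQ connector could be used instead.
+DataStream&amp;lt;MonitoringEvent&amp;gt; inputEventStream = env.addSource(new MonitoringEventSource());&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;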
+
+&lt;p&gt;Every pattern consists of a sequence of events which can have optional filter conditions assigned. 
+A pattern always starts with a first event to which we will assign the name &lt;code&gt;“First Event”&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;This pattern will match every monitoring event. 
+Since we are only interested in &lt;code&gt;TemperatureEvents&lt;/code&gt; whose temperature is above a threshold value, we add a subtype constraint and a where clause:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;As stated before, we want to generate a &lt;code&gt;TemperatureWarning&lt;/code&gt; if and only if we see two consecutive &lt;code&gt;TemperatureEvents&lt;/code&gt; for the same rack whose temperatures are too high. 
+The Pattern API offers the &lt;code&gt;next&lt;/code&gt; call which allows us to add a new event to our pattern. 
+This event has to directly follow the first matching event in order for the whole pattern to match.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningPattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;within&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;The final pattern definition also contains the &lt;code&gt;within&lt;/code&gt; API call which defines that two consecutive &lt;code&gt;TemperatureEvents&lt;/code&gt; have to occur within a time interval of 10 seconds for the pattern to match. 
+Depending on the time characteristic setting, this can be processing, ingestion, or event time.&lt;/p&gt;
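+
+&lt;p&gt;By default, Flink jobs run on processing time. As a sketch, assuming a &lt;code&gt;StreamExecutionEnvironment&lt;/code&gt; named &lt;code&gt;env&lt;/code&gt;, the job could be switched to event time like this:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: make all windows and patterns of this job use event time.
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;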
+
+&lt;p&gt;Having defined the event pattern, we can now apply it on the &lt;code&gt;inputEventStream&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;PatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempPatternStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CEP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;inputEventStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rackID&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;warningPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;Since we want to generate our warnings for each rack individually, we &lt;code&gt;keyBy&lt;/code&gt; the input event stream by the &lt;code&gt;“rackID”&lt;/code&gt; POJO field. 
+This ensures that all matching events of our pattern have the same rack ID.&lt;/p&gt;
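+
+&lt;p&gt;Note that keying by the field name &lt;code&gt;&quot;rackID&quot;&lt;/code&gt; relies on Flink treating the events as POJOs, i.e., the private field must be exposed via getters and setters (and the class needs a public no-argument constructor). A sketch of the assumed accessors on &lt;code&gt;MonitoringEvent&lt;/code&gt;:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: accessors that allow keyBy(&quot;rackID&quot;) to resolve the private field.
+public int getRackID() { return rackID; }
+
+public void setRackID(int rackID) { this.rackID = rackID; }&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;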
+
+&lt;p&gt;The &lt;code&gt;PatternStream&amp;lt;MonitoringEvent&amp;gt;&lt;/code&gt; gives us access to successfully matched event sequences. 
+They can be accessed using the &lt;code&gt;select&lt;/code&gt; API call, which takes a &lt;code&gt;PatternSelectFunction&lt;/code&gt; that is called for every matching event sequence. 
+The event sequence is provided as a &lt;code&gt;Map&amp;lt;String, MonitoringEvent&amp;gt;&lt;/code&gt; where each &lt;code&gt;MonitoringEvent&lt;/code&gt; is identified by its assigned event name. 
+Our pattern select function generates a &lt;code&gt;TemperatureWarning&lt;/code&gt; event for each matching pattern.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;averageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warnings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempPatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+        &lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
+        &lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
+
+        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
+            &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; 
+            &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;Now we have generated a new complex event stream &lt;code&gt;DataStream&amp;lt;TemperatureWarning&amp;gt; warnings&lt;/code&gt; from the initial monitoring event stream. 
+This complex event stream can again be used as the input for another round of complex event processing. 
+We use the &lt;code&gt;TemperatureWarnings&lt;/code&gt; to generate &lt;code&gt;TemperatureAlerts&lt;/code&gt; whenever we see two consecutive &lt;code&gt;TemperatureWarnings&lt;/code&gt; for the same rack with increasing temperatures. 
+The &lt;code&gt;TemperatureAlerts&lt;/code&gt; have the following definition:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureAlert&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
+&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;First, we have to define our alert event pattern:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;within&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;This definition says that we want to see two &lt;code&gt;TemperatureWarnings&lt;/code&gt; within 20 seconds. 
+The first event has the name &lt;code&gt;“First Event”&lt;/code&gt; and the second consecutive event has the name &lt;code&gt;“Second Event”&lt;/code&gt;. 
+The individual events don’t have a where clause assigned, because we need access to both events in order to decide whether the temperature is increasing. 
+Therefore, we apply the filter condition in the select clause. 
+But first, we again obtain a &lt;code&gt;PatternStream&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;PatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPatternStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CEP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;warnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rackID&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;alertPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;Again, we &lt;code&gt;keyBy&lt;/code&gt; the warnings input stream by the &lt;code&gt;&quot;rackID&quot;&lt;/code&gt; so that we generate our alerts for each rack individually. 
+Next, we apply the &lt;code&gt;flatSelect&lt;/code&gt; method, which gives us access to matching event sequences and allows us to output an arbitrary number of complex events. 
+This way, we generate a &lt;code&gt;TemperatureAlert&lt;/code&gt; if and only if the temperature is increasing.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatSelect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+        &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
+        &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
+
+        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAverageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAverageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
+            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
+        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
+    &lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;The &lt;code&gt;DataStream&amp;lt;TemperatureAlert&amp;gt; alerts&lt;/code&gt; is the data stream of temperature alerts for each rack. 
+Based on these alerts we can now adapt the workload or cooling for overheating racks.&lt;/p&gt;
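+
+&lt;p&gt;To round off the example, the alerts can be emitted to any sink and the job started. The print sink and the job name below are illustrative choices, not prescribed by the library:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: print the alerts and start the streaming job.
+alerts.print();
+
+env.execute(&quot;CEP monitoring job&quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;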
+
+&lt;p&gt;The full source code for the presented example, as well as an example data source which randomly generates monitoring events, can be found in &lt;a href=&quot;https://github.com/tillrohrmann/cep-monitoring&quot;&gt;this repository&lt;/a&gt;.&lt;/p&gt;
+
+&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
+
+&lt;p&gt;In this blog post we have seen how easy it is to reason about event streams using Flink’s CEP library. 
+Using the example of monitoring and alert generation for a data center, we have implemented a short program which notifies us when a rack is about to overheat and potentially fail.&lt;/p&gt;
+
+&lt;p&gt;In the future, the Flink community will further extend the CEP library’s functionality and expressiveness. 
+Next on the road map is support for a regular expression-like pattern specification, including Kleene star, lower and upper bounds, and negation. 
+Furthermore, it is planned to allow the where clause to access fields of previously matched events. 
+This feature will make it possible to prune unpromising event sequences early.&lt;/p&gt;
+
+</description>
+<pubDate>Wed, 06 Apr 2016 12:00:00 +0200</pubDate>
+<link>http://flink.apache.org/news/2016/04/06/cep-monitoring.html</link>
+<guid isPermaLink="true">/news/2016/04/06/cep-monitoring.html</guid>
+</item>
+
+<item>
 <title>Flink 1.0.1 Released</title>
 <description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.1&lt;/strong&gt;, the first bugfix release of the 1.0 series.&lt;/p&gt;
 
@@ -81,7 +270,7 @@
 
 <item>
 <title>Announcing Apache Flink 1.0.0</title>
-<description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of the 1.0.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production.&lt;/p&gt;
+<description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of the 1.0.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production. &lt;/p&gt;
 
 &lt;center&gt;
 &lt;img src=&quot;/img/blog/flink-1.0.png&quot; style=&quot;height:200px;margin:15px&quot; /&gt;
@@ -122,7 +311,7 @@ When using this backend, active state in streaming programs can grow well beyond
 
 &lt;p&gt;The checkpointing has been extended by a more fine-grained control mechanism: In previous versions, new checkpoints were triggered independent of the speed at which old checkpoints completed. This can lead to situations where new checkpoints are piling up, because they are triggered too frequently.&lt;/p&gt;
 
-&lt;p&gt;The checkpoint coordinator now exposes statistics through our REST monitoring API and the web interface. Users can review the checkpoint size and duration on a per-operator basis and see the last completed checkpoints. This is helpful for identifying performance issues, such as processing slowdown by the checkpoints.&lt;/p&gt;
+&lt;p&gt;The checkpoint coordinator now exposes statistics through our REST monitoring API and the web interface. Users can review the checkpoint size and duration on a per-operator basis and see the last completed checkpoints. This is helpful for identifying performance issues, such as processing slowdown by the checkpoints. &lt;/p&gt;
 
 &lt;h2 id=&quot;improved-kafka-connector-and-support-for-kafka-09&quot;&gt;Improved Kafka connector and support for Kafka 0.9&lt;/h2&gt;
 
@@ -616,7 +805,7 @@ While you can embed Spouts/Bolts in a Flink program and mix-and-match them with
 
 <item>
 <title>Introducing Stream Windows in Apache Flink</title>
-<description>&lt;p&gt;The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams which let it stand out among other open source stream processors.&lt;/p&gt;
+<description>&lt;p&gt;The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/”exactly-once” processing). This shift and the new terminology can be quite confusing for people being new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams which let it stand out among other open source stream processors. &lt;/p&gt;
 
 &lt;p&gt;In this blog post, we discuss the concept of windows for stream processing, present Flink’s built-in windows, and explain its support for custom windowing semantics.&lt;/p&gt;
 
@@ -679,17 +868,17 @@ While you can embed Spouts/Bolts in a Flink program and mix-and-match them with
 
 &lt;p&gt;There is one aspect that we haven’t discussed yet, namely the exact meaning of “&lt;em&gt;collects elements for one minute&lt;/em&gt;” which boils down to the question, “&lt;em&gt;How does the stream processor interpret time?&lt;/em&gt;”.&lt;/p&gt;
 
-&lt;p&gt;Apache Flink features three different notions of time, namely &lt;em&gt;processing time&lt;/em&gt;, &lt;em&gt;event time&lt;/em&gt;, and &lt;em&gt;ingestion time&lt;/em&gt;.&lt;/p&gt;
+&lt;p&gt;Apache Flink features three different notions of time, namely &lt;em&gt;processing time&lt;/em&gt;, &lt;em&gt;event time&lt;/em&gt;, and &lt;em&gt;ingestion time&lt;/em&gt;. &lt;/p&gt;
 
 &lt;ol&gt;
-  &lt;li&gt;In &lt;strong&gt;processing time&lt;/strong&gt;, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one minute processing time window collects elements for exactly one minute.&lt;/li&gt;
-  &lt;li&gt;In &lt;strong&gt;event time&lt;/strong&gt;, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc, where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First of all, it decouples the program semantics from the actual serving speed of the source and the processing performance of system. Hence you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results, even if events arrive out-of-order of their timestamp which is common if a data stream gathers events from distributed sources.&lt;/li&gt;
+  &lt;li&gt;In &lt;strong&gt;processing time&lt;/strong&gt;, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one minute processing time window collects elements for exactly one minute. &lt;/li&gt;
+  &lt;li&gt;In &lt;strong&gt;event time&lt;/strong&gt;, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc, where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First of all, it decouples the program semantics from the actual serving speed of the source and the processing performance of system. Hence you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results, even if events arrive out-of-order of their timestamp which is common if a data stream gathers events from distributed sources. &lt;/li&gt;
   &lt;li&gt;&lt;strong&gt;Ingestion time&lt;/strong&gt; is a hybrid of processing and event time. It assigns wall clock timestamps to records as soon as they arrive in the system (at the source) and continues processing with event time semantics based on the attached timestamps.&lt;/li&gt;
 &lt;/ol&gt;
 
 &lt;h2 id=&quot;count-windows&quot;&gt;Count Windows&lt;/h2&gt;
 
-&lt;p&gt;Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added.&lt;/p&gt;
+&lt;p&gt;Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added. &lt;/p&gt;
 
 &lt;p&gt;In Flink’s DataStream API, tumbling and sliding count windows are defined as follows:&lt;/p&gt;
 
@@ -712,7 +901,7 @@ While you can embed Spouts/Bolts in a Flink program and mix-and-match them with
 
 &lt;h2 id=&quot;dissecting-flinks-windowing-mechanics&quot;&gt;Dissecting Flink’s windowing mechanics&lt;/h2&gt;
 
-&lt;p&gt;Flink’s built-in time and count windows cover a wide range of common window use cases. However, there are of course applications that require custom windowing logic that cannot be addressed by Flink’s built-in windows. In order to support also applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control about the way that windows are built and evaluated.&lt;/p&gt;
+&lt;p&gt;Flink’s built-in time and count windows cover a wide range of common window use cases. However, there are of course applications that require custom windowing logic that cannot be addressed by Flink’s built-in windows. In order to support also applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control about the way that windows are built and evaluated. &lt;/p&gt;
 
 &lt;p&gt;The following figure depicts Flink’s windowing mechanism and introduces the components being involved.&lt;/p&gt;
 
@@ -834,7 +1023,7 @@ While you can embed Spouts/Bolts in a Flink program and mix-and-match them with
 <title>Announcing Apache Flink 0.10.0</title>
 <description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of the 0.10.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on data stream processing and operational features. About 80 contributors provided bug fixes, improvements, and new features such that in total more than 400 JIRA issues could be resolved.&lt;/p&gt;
 
-&lt;p&gt;For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say that Flink 0.10.0 includes many more features, improvements, and bug fixes.&lt;/p&gt;
+&lt;p&gt;For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say that Flink 0.10.0 includes many more features, improvements, and bug fixes. &lt;/p&gt;
 
 &lt;p&gt;We encourage everyone to &lt;a href=&quot;/downloads.html&quot;&gt;download the release&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/&quot;&gt;check out the documentation&lt;/a&gt;. Feedback through the Flink &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; is, as always, very welcome!&lt;/p&gt;
 
@@ -1053,7 +1242,7 @@ Also note that some methods in the DataStream API had to be renamed as part of t
 
 &lt;h2 id=&quot;the-off-heap-memory-implementation&quot;&gt;The off-heap Memory Implementation&lt;/h2&gt;
 
-&lt;p&gt;Given that all memory intensive internal algorithms are already implemented against the &lt;code&gt;MemorySegment&lt;/code&gt;, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all &lt;code&gt;ByteBuffer.allocate(numBytes)&lt;/code&gt; calls with &lt;code&gt;ByteBuffer.allocateDirect(numBytes)&lt;/code&gt;. In Flink’s case it meant that we made the &lt;code&gt;MemorySegment&lt;/code&gt; abstract and added the &lt;code&gt;HeapMemorySegment&lt;/code&gt; and &lt;code&gt;OffHeapMemorySegment&lt;/code&gt; subclasses. The &lt;code&gt;OffHeapMemorySegment&lt;/code&gt; takes the off-heap memory pointer from a &lt;code&gt;java.nio.DirectByteBuffer&lt;/code&gt; and implements its specialized access methods using &lt;code&gt;sun.misc.Unsafe&lt;/code&gt;. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, &lt;em&gt;-XX:MaxDirectMemorySize&lt;/em&gt;).&lt;/p&gt;
+&lt;p&gt;Given that all memory intensive internal algorithms are already implemented against the &lt;code&gt;MemorySegment&lt;/code&gt;, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all &lt;code&gt;ByteBuffer.allocate(numBytes)&lt;/code&gt; calls with &lt;code&gt;ByteBuffer.allocateDirect(numBytes)&lt;/code&gt;. In Flink’s case it meant that we made the &lt;code&gt;MemorySegment&lt;/code&gt; abstract and added the &lt;code&gt;HeapMemorySegment&lt;/code&gt; and &lt;code&gt;OffHeapMemorySegment&lt;/code&gt; subclasses. The &lt;code&gt;OffHeapMemorySegment&lt;/code&gt; takes the off-heap memory pointer from a &lt;code&gt;java.nio.DirectByteBuffer&lt;/code&gt; and implements its specialized access methods using &lt;code&gt;sun.misc.Unsafe&lt;/code&gt;. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, &lt;em&gt;-XX:MaxDirectMemorySize&lt;/em&gt;). &lt;/p&gt;
 
 &lt;p&gt;In practice we had to go one step further, to make the implementation perform well. While the &lt;code&gt;ByteBuffer&lt;/code&gt; is used in I/O code paths to compose headers and move bulk memory into place, the MemorySegment is part of the innermost loops of many algorithms (sorting, hash tables, …). That means that the access methods have to be as fast as possible.&lt;/p&gt;
 
@@ -2078,21 +2267,21 @@ and mutations as well as neighborhood aggregations.&lt;/p&gt;
 
 &lt;h4 id=&quot;common-graph-metrics&quot;&gt;Common Graph Metrics&lt;/h4&gt;
 &lt;p&gt;These methods can be used to retrieve several graph metrics and properties, such as the number
-of vertices, edges and the node degrees.&lt;/p&gt;
+of vertices, edges and the node degrees. &lt;/p&gt;
 
 &lt;h4 id=&quot;transformations&quot;&gt;Transformations&lt;/h4&gt;
 &lt;p&gt;The transformation methods enable several Graph operations, using high-level functions similar to
 the ones provided by the batch processing API. These transformations can be applied one after the
-other, yielding a new Graph after each step, in a fashion similar to operators on DataSets:&lt;/p&gt;
+other, yielding a new Graph after each step, in a fashion similar to operators on DataSets: &lt;/p&gt;
 
 &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;inputGraph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUndirected&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mapEdges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;CustomEdgeMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
 
 &lt;p&gt;Transformations can be applied on:&lt;/p&gt;
 
 &lt;ol&gt;
-  &lt;li&gt;&lt;strong&gt;Vertices&lt;/strong&gt;: &lt;code&gt;mapVertices&lt;/code&gt;, &lt;code&gt;joinWithVertices&lt;/code&gt;, &lt;code&gt;filterOnVertices&lt;/code&gt;, &lt;code&gt;addVertex&lt;/code&gt;, …&lt;/li&gt;
-  &lt;li&gt;&lt;strong&gt;Edges&lt;/strong&gt;: &lt;code&gt;mapEdges&lt;/code&gt;, &lt;code&gt;filterOnEdges&lt;/code&gt;, &lt;code&gt;removeEdge&lt;/code&gt;, …&lt;/li&gt;
-  &lt;li&gt;&lt;strong&gt;Triplets&lt;/strong&gt; (source vertex, target vertex, edge): &lt;code&gt;getTriplets&lt;/code&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Vertices&lt;/strong&gt;: &lt;code&gt;mapVertices&lt;/code&gt;, &lt;code&gt;joinWithVertices&lt;/code&gt;, &lt;code&gt;filterOnVertices&lt;/code&gt;, &lt;code&gt;addVertex&lt;/code&gt;, …  &lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Edges&lt;/strong&gt;: &lt;code&gt;mapEdges&lt;/code&gt;, &lt;code&gt;filterOnEdges&lt;/code&gt;, &lt;code&gt;removeEdge&lt;/code&gt;, …   &lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Triplets&lt;/strong&gt; (source vertex, target vertex, edge): &lt;code&gt;getTriplets&lt;/code&gt;  &lt;/li&gt;
 &lt;/ol&gt;
 
 &lt;h4 id=&quot;neighborhood-aggregations&quot;&gt;Neighborhood Aggregations&lt;/h4&gt;
@@ -2226,7 +2415,7 @@ vertex values do not need to be recomputed during an iteration.&lt;/p&gt;
 &lt;p&gt;Let us reconsider the Single Source Shortest Paths algorithm. In each iteration, a vertex:&lt;/p&gt;
 
 &lt;ol&gt;
-  &lt;li&gt;&lt;strong&gt;Gather&lt;/strong&gt; retrieves distances from its neighbors summed up with the corresponding edge values;&lt;/li&gt;
+  &lt;li&gt;&lt;strong&gt;Gather&lt;/strong&gt; retrieves distances from its neighbors summed up with the corresponding edge values; &lt;/li&gt;
   &lt;li&gt;&lt;strong&gt;Sum&lt;/strong&gt; compares the newly obtained distances in order to extract the minimum;&lt;/li&gt;
   &lt;li&gt;&lt;strong&gt;Apply&lt;/strong&gt; and finally adopts the minimum distance computed in the sum step,
 provided that it is lower than its current value. If a vertex’s value does not change during
@@ -2285,7 +2474,7 @@ plays that each song has. We then filter out the list of songs the users do not
 playlist. Then we compute the top songs per user (i.e. the songs a user listened to the most).
 Finally, as a separate use-case on the same data set, we create a user-user similarity graph based
 on the common songs and use this resulting graph to detect communities by calling Gelly’s Label Propagation
-library method.&lt;/p&gt;
+library method. &lt;/p&gt;
 
 &lt;p&gt;For running the example implementation, please use the 0.10-SNAPSHOT version of Flink as a
 dependency. The full example code base can be found &lt;a href=&quot;https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/MusicProfiles.java&quot;&gt;here&lt;/a&gt;. The public data set used for testing
@@ -2375,10 +2564,10 @@ in the figure below.&lt;/p&gt;
 
 &lt;p&gt;To form the user-user graph in Flink, we will simply take the edges from the user-song graph
 (left-hand side of the image), group them by song-id, and then add all the users (source vertex ids)
-to an ArrayList.&lt;/p&gt;
+to an ArrayList. &lt;/p&gt;
 
 &lt;p&gt;We then match users who listened to the same song two by two, creating a new edge to mark their
-common interest (right-hand side of the image).&lt;/p&gt;
+common interest (right-hand side of the image). &lt;/p&gt;
 
 &lt;p&gt;Afterwards, we perform a &lt;code&gt;distinct()&lt;/code&gt; operation to avoid creation of duplicate data.
 Considering that we now have the DataSet of edges which present interest, creating a graph is as
@@ -2417,7 +2606,7 @@ formed. To do so, we first initialize each vertex with a numeric label using the
 the id of a vertex with the first element of the tuple, afterwards applying a map function.
 Finally, we call the &lt;code&gt;run()&lt;/code&gt; method with the LabelPropagation library method passed
 as a parameter. In the end, the vertices will be updated to contain the most frequent label
-among their neighbors.&lt;/p&gt;
+among their neighbors. &lt;/p&gt;
 
 &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// detect user communities using label propagation&lt;/span&gt;
 &lt;span class=&quot;c1&quot;&gt;// initialize each vertex with a unique numeric label&lt;/span&gt;
@@ -2447,10 +2636,10 @@ among their neighbors.&lt;/p&gt;
 &lt;p&gt;Currently, Gelly matches the basic functionalities provided by most state-of-the-art graph
 processing systems. Our vision is to turn Gelly into more than “yet another library for running
 PageRank-like algorithms” by supporting generic iterations, implementing graph partitioning,
-providing bipartite graph support and by offering numerous other features.&lt;/p&gt;
+providing bipartite graph support and by offering numerous other features. &lt;/p&gt;
 
 &lt;p&gt;We are also enriching Flink Gelly with a set of operators suitable for highly skewed graphs
-as well as a Graph API built on Flink Streaming.&lt;/p&gt;
+as well as a Graph API built on Flink Streaming. &lt;/p&gt;
 
 &lt;p&gt;In the near future, we would like to see how Gelly can be integrated with graph visualization
 tools, graph database systems and sampling techniques.&lt;/p&gt;
@@ -2708,7 +2897,7 @@ tools, graph database systems and sampling techniques.&lt;/p&gt;
 
 <item>
 <title>April 2015 in the Flink community</title>
-<description>&lt;p&gt;April was an packed month for Apache Flink.&lt;/p&gt;
+<description>&lt;p&gt;April was an packed month for Apache Flink. &lt;/p&gt;
 
 &lt;h2 id=&quot;flink-090-milestone1-release&quot;&gt;Flink 0.9.0-milestone1 release&lt;/h2&gt;
 
@@ -2724,7 +2913,7 @@ tools, graph database systems and sampling techniques.&lt;/p&gt;
 
 &lt;h2 id=&quot;flink-on-the-web&quot;&gt;Flink on the web&lt;/h2&gt;
 
-&lt;p&gt;Fabian Hueske gave an &lt;a href=&quot;http://www.infoq.com/news/2015/04/hueske-apache-flink?utm_campaign=infoq_content&amp;amp;utm_source=infoq&amp;amp;utm_medium=feed&amp;amp;utm_term=global&quot;&gt;interview at InfoQ&lt;/a&gt; on Apache Flink.&lt;/p&gt;
+&lt;p&gt;Fabian Hueske gave an &lt;a href=&quot;http://www.infoq.com/news/2015/04/hueske-apache-flink?utm_campaign=infoq_content&amp;amp;utm_source=infoq&amp;amp;utm_medium=feed&amp;amp;utm_term=global&quot;&gt;interview at InfoQ&lt;/a&gt; on Apache Flink. &lt;/p&gt;
 
 &lt;h2 id=&quot;upcoming-events&quot;&gt;Upcoming events&lt;/h2&gt;
 
@@ -2756,7 +2945,7 @@ However, this approach has a few notable drawbacks. First of all it is not trivi
 &lt;img src=&quot;/img/blog/memory-mgmt.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
 &lt;/center&gt;
 
-&lt;p&gt;Flink’s style of active memory management and operating on binary data has several benefits:&lt;/p&gt;
+&lt;p&gt;Flink’s style of active memory management and operating on binary data has several benefits: &lt;/p&gt;
 
 &lt;ol&gt;
   &lt;li&gt;&lt;strong&gt;Memory-safe execution &amp;amp; efficient out-of-core algorithms.&lt;/strong&gt; Due to the fixed amount of allocated memory segments, it is trivial to monitor remaining memory resources. In case of memory shortage, processing operators can efficiently write larger batches of memory segments to disk and later them read back. Consequently, &lt;code&gt;OutOfMemoryErrors&lt;/code&gt; are effectively prevented.&lt;/li&gt;
@@ -2765,13 +2954,13 @@ However, this approach has a few notable drawbacks. First of all it is not trivi
   &lt;li&gt;&lt;strong&gt;Efficient binary operations &amp;amp; cache sensitivity.&lt;/strong&gt; Binary data can be efficiently compared and operated on given a suitable binary representation. Furthermore, the binary representations can put related values, as well as hash codes, keys, and pointers, adjacently into memory. This gives data structures with usually more cache efficient access patterns.&lt;/li&gt;
 &lt;/ol&gt;
 
-&lt;p&gt;These properties of active memory management are very desirable in data processing systems for large-scale data analytics, but they have a significant price tag attached. Active memory management and operating on binary data is not trivial to implement; for example, using &lt;code&gt;java.util.HashMap&lt;/code&gt; is much easier than implementing a spillable hash-table backed by byte arrays and a custom serialization stack. Of course, Apache Flink is not the only JVM-based data processing system that operates on serialized binary data. Projects such as &lt;a href=&quot;http://drill.apache.org/&quot;&gt;Apache Drill&lt;/a&gt;, &lt;a href=&quot;http://ignite.incubator.apache.org/&quot;&gt;Apache Ignite (incubating)&lt;/a&gt; or &lt;a href=&quot;http://projectgeode.org/&quot;&gt;Apache Geode (incubating)&lt;/a&gt; apply similar techniques, and it was recently announced that &lt;a href=&quot;http://spark.apache.org/&quot;&gt;Apache Spark&lt;/a&gt; will also evolve in this direction with &lt;a href=&quot;https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html&quot;&gt;Project Tungsten&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;These properties of active memory management are very desirable in data processing systems for large-scale data analytics, but they have a significant price tag attached. Active memory management and operating on binary data is not trivial to implement; for example, using &lt;code&gt;java.util.HashMap&lt;/code&gt; is much easier than implementing a spillable hash-table backed by byte arrays and a custom serialization stack. Of course, Apache Flink is not the only JVM-based data processing system that operates on serialized binary data. Projects such as &lt;a href=&quot;http://drill.apache.org/&quot;&gt;Apache Drill&lt;/a&gt;, &lt;a href=&quot;http://ignite.incubator.apache.org/&quot;&gt;Apache Ignite (incubating)&lt;/a&gt; or &lt;a href=&quot;http://projectgeode.org/&quot;&gt;Apache Geode (incubating)&lt;/a&gt; apply similar techniques, and it was recently announced that &lt;a href=&quot;http://spark.apache.org/&quot;&gt;Apache Spark&lt;/a&gt; will also evolve in this direction with &lt;a href=&quot;https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html&quot;&gt;Project Tungsten&lt;/a&gt;. &lt;/p&gt;
 
 &lt;p&gt;In the following we discuss in detail how Flink allocates memory, de/serializes objects, and operates on binary data. We will also show some performance numbers comparing processing objects on the heap and operating on binary data.&lt;/p&gt;
 
 &lt;h2 id=&quot;how-does-flink-allocate-memory&quot;&gt;How does Flink allocate memory?&lt;/h2&gt;
 
-&lt;p&gt;A Flink worker, called TaskManager, is composed of several internal components such as an actor system for coordination with the Flink master, an IOManager that takes care of spilling data to disk and reading it back, and a MemoryManager that coordinates memory usage. In the context of this blog post, the MemoryManager is of most interest.&lt;/p&gt;
+&lt;p&gt;A Flink worker, called TaskManager, is composed of several internal components such as an actor system for coordination with the Flink master, an IOManager that takes care of spilling data to disk and reading it back, and a MemoryManager that coordinates memory usage. In the context of this blog post, the MemoryManager is of most interest. &lt;/p&gt;
 
 &lt;p&gt;The MemoryManager takes care of allocating, accounting, and distributing MemorySegments to data processing operators such as sort and join operators. A &lt;a href=&quot;https://github.com/apache/flink/blob/release-0.9.0-milestone-1/flink-core/src/main/java/org/apache/flink/core/memory/MemorySegment.java&quot;&gt;MemorySegment&lt;/a&gt; is Flink’s distribution unit of memory and is backed by a regular Java byte array (size is 32 KB by default). A MemorySegment provides very efficient write and read access to its backing byte array using Java’s unsafe methods. You can think of a MemorySegment as a custom-tailored version of Java’s NIO ByteBuffer. In order to operate on multiple MemorySegments as on a larger chunk of consecutive memory, Flink uses logical views that implement Java’s &lt;code&gt;java.io.DataOutput&lt;/code&gt; and &lt;code&gt;java.io.DataInput&lt;/code&gt; interfaces.&lt;/p&gt;
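 
 &lt;p&gt;As a small illustration, the following sketch writes to and reads from a MemorySegment directly. It assumes the constructor and put/get methods of the 0.9 sources linked above; the exact API may differ in other versions.&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;import org.apache.flink.core.memory.MemorySegment
 
 // Allocate a 32 KB segment backed by a plain Java byte array
 // (assumed constructor, matching the 0.9 sources linked above).
 val segment = new MemorySegment(new Array[Byte](32 * 1024))
 
 // Write primitive values at fixed byte offsets and read them back.
 segment.putInt(0, 42)
 segment.putLong(4, 1234567890L)
 val i = segment.getInt(0)   // 42
 val l = segment.getLong(4)  // 1234567890L
 &lt;/code&gt;&lt;/pre&gt;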
 
@@ -2783,7 +2972,7 @@ However, this approach has a few notable drawbacks. First of all it is not trivi
 
 &lt;h2 id=&quot;how-does-flink-serialize-objects&quot;&gt;How does Flink serialize objects?&lt;/h2&gt;
 
-&lt;p&gt;The Java ecosystem offers several libraries to convert objects into a binary representation and back. Common alternatives are standard Java serialization, &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt;, &lt;a href=&quot;http://avro.apache.org/&quot;&gt;Apache Avro&lt;/a&gt;, &lt;a href=&quot;http://thrift.apache.org/&quot;&gt;Apache Thrift&lt;/a&gt;, or Google’s &lt;a href=&quot;https://github.com/google/protobuf&quot;&gt;Protobuf&lt;/a&gt;. Flink includes its own custom serialization framework in order to control the binary representation of data. This is important because operating on binary data, such as comparing or even manipulating it, requires exact knowledge of the serialization layout. Further, configuring the serialization layout with respect to operations that are performed on binary data can yield a significant performance boost. Flink’s serialization stack also leverages the fact that the types of the objects that go through de/serialization are exactly known before a program is executed.&lt;/p&gt;
+&lt;p&gt;The Java ecosystem offers several libraries to convert objects into a binary representation and back. Common alternatives are standard Java serialization, &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt;, &lt;a href=&quot;http://avro.apache.org/&quot;&gt;Apache Avro&lt;/a&gt;, &lt;a href=&quot;http://thrift.apache.org/&quot;&gt;Apache Thrift&lt;/a&gt;, or Google’s &lt;a href=&quot;https://github.com/google/protobuf&quot;&gt;Protobuf&lt;/a&gt;. Flink includes its own custom serialization framework in order to control the binary representation of data. This is important because operating on binary data, such as comparing or even manipulating it, requires exact knowledge of the serialization layout. Further, configuring the serialization layout with respect to operations that are performed on binary data can yield a significant performance boost. Flink’s serialization stack also leverages the fact that the types of the objects that go through de/serialization are exactly known before a program is executed. &lt;/p&gt;
 
 &lt;p&gt;Flink programs can process data represented as arbitrary Java or Scala objects. Before a program is optimized, the data types at each processing step of the program’s data flow need to be identified. For Java programs, Flink features a reflection-based type extraction component to analyze the return types of user-defined functions. Scala programs are analyzed with help of the Scala compiler. Flink represents each data type with a &lt;a href=&quot;https://github.com/apache/flink/blob/release-0.9.0-milestone-1/flink-core/src/main/java/org/apache/flink/api/common/typeinfo/TypeInformation.java&quot;&gt;TypeInformation&lt;/a&gt;. Flink has TypeInformations for several kinds of data types, including:&lt;/p&gt;
 
@@ -2793,11 +2982,11 @@ However, this approach has a few notable drawbacks. First of all it is not trivi
   &lt;li&gt;WritableTypeInfo: Any implementation of Hadoop’s Writable interface.&lt;/li&gt;
   &lt;li&gt;TupleTypeInfo: Any Flink tuple (Tuple1 to Tuple25). Flink tuples are Java representations for fixed-length tuples with typed fields.&lt;/li&gt;
   &lt;li&gt;CaseClassTypeInfo: Any Scala CaseClass (including Scala tuples).&lt;/li&gt;
-  &lt;li&gt;PojoTypeInfo: Any POJO (Java or Scala), i.e., an object with all fields either being public or accessible through getters and setters that follow the common naming conventions.&lt;/li&gt;
+  &lt;li&gt;PojoTypeInfo: Any POJO (Java or Scala), i.e., an object with all fields either being public or accessible through getters and setters that follow the common naming conventions. &lt;/li&gt;
   &lt;li&gt;GenericTypeInfo: Any data type that cannot be identified as another type.&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;Each TypeInformation provides a serializer for the data type it represents. For example, a BasicTypeInfo returns a serializer that writes the respective primitive type, the serializer of a WritableTypeInfo delegates de/serialization to the write() and readFields() methods of the object implementing Hadoop’s Writable interface, and a GenericTypeInfo returns a serializer that delegates serialization to Kryo. Object serialization to a DataOutput that is backed by Flink MemorySegments automatically goes through Java’s efficient unsafe operations. For data types that can be used as keys, i.e., compared and hashed, the TypeInformation provides TypeComparators. TypeComparators compare and hash objects and can, depending on the concrete data type, also efficiently compare binary representations and extract fixed-length binary key prefixes.&lt;/p&gt;
+&lt;p&gt;Each TypeInformation provides a serializer for the data type it represents. For example, a BasicTypeInfo returns a serializer that writes the respective primitive type, the serializer of a WritableTypeInfo delegates de/serialization to the write() and readFields() methods of the object implementing Hadoop’s Writable interface, and a GenericTypeInfo returns a serializer that delegates serialization to Kryo. Object serialization to a DataOutput that is backed by Flink MemorySegments automatically goes through Java’s efficient unsafe operations. For data types that can be used as keys, i.e., compared and hashed, the TypeInformation provides TypeComparators. TypeComparators compare and hash objects and can, depending on the concrete data type, also efficiently compare binary representations and extract fixed-length binary key prefixes. &lt;/p&gt;
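 
 &lt;p&gt;As a brief illustration, the Scala API can derive a TypeInformation at compile time and hand out a serializer for it. This is a sketch: &lt;code&gt;createTypeInformation&lt;/code&gt; is the Scala API entry point, but the exact &lt;code&gt;createSerializer&lt;/code&gt; signature has varied across Flink versions.&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.ExecutionConfig
 import org.apache.flink.api.scala._
 
 // Derive the TypeInformation for a tuple type at compile time.
 val info = createTypeInformation[(Int, Double, String)]
 
 // Obtain a serializer from it (the ExecutionConfig parameter is the
 // signature used around the 0.9/1.0 releases).
 val serializer = info.createSerializer(new ExecutionConfig)
 &lt;/code&gt;&lt;/pre&gt;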
 
 &lt;p&gt;Tuple, Pojo, and CaseClass types are composite types, i.e., containers for one or more possibly nested data types. As such, their serializers and comparators are also composite and delegate the serialization and comparison of their member data types to the respective serializers and comparators. The following figure illustrates the serialization of a (nested) &lt;code&gt;Tuple3&amp;lt;Integer, Double, Person&amp;gt;&lt;/code&gt; object where &lt;code&gt;Person&lt;/code&gt; is a POJO and defined as follows:&lt;/p&gt;
 
@@ -2810,13 +2999,13 @@ However, this approach has a few notable drawbacks. First of all it is not trivi
 &lt;img src=&quot;/img/blog/data-serialization.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
 &lt;/center&gt;
 
-&lt;p&gt;Flink’s type system can be easily extended by providing custom TypeInformations, Serializers, and Comparators to improve the performance of serializing and comparing custom data types.&lt;/p&gt;
+&lt;p&gt;Flink’s type system can be easily extended by providing custom TypeInformations, Serializers, and Comparators to improve the performance of serializing and comparing custom data types. &lt;/p&gt;
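 
 &lt;p&gt;The following sketch outlines what such an extension can look like for a simple fixed-length type. The method set follows the &lt;code&gt;TypeSerializer&lt;/code&gt; base class around the 0.9/1.0 releases and omits the equality and configuration-snapshot methods that later versions require, so treat it as an outline rather than a drop-in implementation.&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.typeutils.TypeSerializer
 import org.apache.flink.core.memory.{DataInputView, DataOutputView}
 
 case class Point(x: Double, y: Double)
 
 class PointSerializer extends TypeSerializer[Point] {
   override def isImmutableType: Boolean = true
   override def duplicate(): TypeSerializer[Point] = this
   override def createInstance(): Point = Point(0.0, 0.0)
   override def copy(from: Point): Point = from
   override def copy(from: Point, reuse: Point): Point = from
   override def getLength: Int = 16  // two doubles, hence fixed length
   override def serialize(p: Point, target: DataOutputView): Unit = {
     target.writeDouble(p.x)
     target.writeDouble(p.y)
   }
   override def deserialize(source: DataInputView): Point =
     Point(source.readDouble(), source.readDouble())
   override def deserialize(reuse: Point, source: DataInputView): Point =
     deserialize(source)
   override def copy(source: DataInputView, target: DataOutputView): Unit = {
     target.writeDouble(source.readDouble())
     target.writeDouble(source.readDouble())
   }
 }
 &lt;/code&gt;&lt;/pre&gt;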
 
 &lt;h2 id=&quot;how-does-flink-operate-on-binary-data&quot;&gt;How does Flink operate on binary data?&lt;/h2&gt;
 
 &lt;p&gt;Similar to many other data processing APIs (including SQL), Flink’s APIs provide transformations to group, sort, and join data sets. These transformations operate on potentially very large data sets. Relational database systems have featured very efficient algorithms for these purposes for several decades, including external merge-sort, merge-join, and hybrid hash-join. Flink builds on this technology, but generalizes it to handle arbitrary objects using its custom serialization and comparison stack. In the following, we show how Flink operates with binary data by the example of Flink’s in-memory sort algorithm.&lt;/p&gt;
 
-&lt;p&gt;Flink assigns a memory budget to its data processing operators. Upon initialization, a sort algorithm requests its memory budget from the MemoryManager and receives a corresponding set of MemorySegments. The set of MemorySegments becomes the memory pool of a so-called sort buffer, which collects the data that is to be sorted. The following figure illustrates how data objects are serialized into the sort buffer.&lt;/p&gt;
+&lt;p&gt;Flink assigns a memory budget to its data processing operators. Upon initialization, a sort algorithm requests its memory budget from the MemoryManager and receives a corresponding set of MemorySegments. The set of MemorySegments becomes the memory pool of a so-called sort buffer, which collects the data that is to be sorted. The following figure illustrates how data objects are serialized into the sort buffer. &lt;/p&gt;
 
 &lt;center&gt;
 &lt;img src=&quot;/img/blog/sorting-binary-data-1.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
@@ -2829,7 +3018,7 @@ The following figure shows how two objects are compared.&lt;/p&gt;
 &lt;img src=&quot;/img/blog/sorting-binary-data-2.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
 &lt;/center&gt;
 
-&lt;p&gt;The sort buffer compares two elements by comparing their binary fixed-length sort keys. The comparison is conclusive if it is either done on a full key (not a prefix key) or if the binary prefix keys are not equal. If the prefix keys are equal (or the sort key data type does not provide a binary prefix key), the sort buffer follows the pointers to the actual object data, deserializes both objects, and compares them. Depending on the result of the comparison, the sort algorithm decides whether to swap the compared elements or not. The sort buffer swaps two elements by moving their fixed-length keys and pointers; the actual data is not moved. Once the sort algorithm finishes, the pointers in the sort buffer are correctly ordered. The following figure shows how the sorted data is returned from the sort buffer.&lt;/p&gt;
+&lt;p&gt;The sort buffer compares two elements by comparing their binary fixed-length sort keys. The comparison is conclusive if it is either done on a full key (not a prefix key) or if the binary prefix keys are not equal. If the prefix keys are equal (or the sort key data type does not provide a binary prefix key), the sort buffer follows the pointers to the actual object data, deserializes both objects, and compares them. Depending on the result of the comparison, the sort algorithm decides whether to swap the compared elements or not. The sort buffer swaps two elements by moving their fixed-length keys and pointers; the actual data is not moved. Once the sort algorithm finishes, the pointers in the sort buffer are correctly ordered. The following figure shows how the sorted data is returned from the sort buffer. &lt;/p&gt;
 
 &lt;center&gt;
 &lt;img src=&quot;/img/blog/sorting-binary-data-3.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
@@ -2847,7 +3036,7 @@ The following figure shows how two objects are compared.&lt;/p&gt;
   &lt;li&gt;&lt;strong&gt;Kryo-serialized.&lt;/strong&gt; The tuple fields are serialized into a sort buffer of 600 MB size using Kryo serialization and sorted without binary sort keys. This means that each pair-wise comparison requires two objects to be deserialized.&lt;/li&gt;
 &lt;/ol&gt;
 
-&lt;p&gt;All sort methods are implemented using a single thread. The reported times are averaged over ten runs. After each run, we call &lt;code&gt;System.gc()&lt;/code&gt; to request a garbage collection run, which is not included in the measured execution time. The following figure shows the time to store the input data in memory, sort it, and read it back as objects.&lt;/p&gt;
+&lt;p&gt;All sort methods are implemented using a single thread. The reported times are averaged over ten runs. After each run, we call &lt;code&gt;System.gc()&lt;/code&gt; to request a garbage collection run, which is not included in the measured execution time. The following figure shows the time to store the input data in memory, sort it, and read it back as objects. &lt;/p&gt;
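 
 &lt;p&gt;The measurement procedure can be sketched as follows; &lt;code&gt;runBenchmark&lt;/code&gt; is a hypothetical placeholder, not part of the benchmark code.&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;// Averages the wall-clock time of a benchmark over several runs.
 // runBenchmark is a hypothetical placeholder for one complete
 // store-sort-read cycle of one of the sort methods listed above.
 def measure(runs: Int)(runBenchmark: () =&amp;gt; Unit): Double = {
   val seconds = for (_ &amp;lt;- 1 to runs) yield {
     val start = System.nanoTime()
     runBenchmark()
     val elapsed = (System.nanoTime() - start) / 1e9
     System.gc()  // requested between runs; not part of the measured time
     elapsed
   }
   seconds.sum / runs
 }
 &lt;/code&gt;&lt;/pre&gt;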
 
 &lt;center&gt;
 &lt;img src=&quot;/img/blog/sort-benchmark.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
@@ -2905,13 +3094,13 @@ The following figure shows how two objects are compared.&lt;/p&gt;
 
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;p&gt;To summarize, the experiments verify the previously stated benefits of operating on binary data.&lt;/p&gt;
+&lt;p&gt;To summarize, the experiments verify the previously stated benefits of operating on binary data. &lt;/p&gt;
 
 &lt;h2 id=&quot;were-not-done-yet&quot;&gt;We’re not done yet!&lt;/h2&gt;
 
-&lt;p&gt;Apache Flink features quite a few advanced techniques to safely and efficiently process huge amounts of data with limited memory resources. However, there are a few points that could make Flink even more efficient. The Flink community is working on moving the managed memory to off-heap memory. This will allow for smaller JVMs, lower garbage collection overhead, and also easier system configuration. With Flink’s Table API, the semantics of all operations such as aggregations and projections are known (in contrast to black-box user-defined functions). Hence we can generate code for Table API operations that directly operates on binary data. Further improvements include serialization layouts which are tailored towards the operations that are applied on the binary data and code generation for serializers and comparators.&lt;/p&gt;
+&lt;p&gt;Apache Flink features quite a few advanced techniques to safely and efficiently process huge amounts of data with limited memory resources. However, there are a few points that could make Flink even more efficient. The Flink community is working on moving the managed memory to off-heap memory. This will allow for smaller JVMs, lower garbage collection overhead, and also easier system configuration. With Flink’s Table API, the semantics of all operations such as aggregations and projections are known (in contrast to black-box user-defined functions). Hence we can generate code for Table API operations that directly operates on binary data. Further improvements include serialization layouts which are tailored towards the operations that are applied on the binary data and code generation for serializers and comparators. &lt;/p&gt;
 
-&lt;p&gt;The groundwork (and a lot more) for operating on binary data is done, but there is still some room for making Flink even better and faster. If you are crazy about performance and like to juggle with lots of bits and bytes, join the Flink community!&lt;/p&gt;
+&lt;p&gt;The groundwork (and a lot more) for operating on binary data is done, but there is still some room for making Flink even better and faster. If you are crazy about performance and like to juggle with lots of bits and bytes, join the Flink community! &lt;/p&gt;
 
 &lt;h2 id=&quot;tldr-give-me-three-things-to-remember&quot;&gt;TL;DR; Give me three things to remember!&lt;/h2&gt;
 
@@ -3265,7 +3454,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 &lt;p&gt;In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;show how easy it is to join data sets using Flink’s fluent APIs,&lt;/li&gt;
+  &lt;li&gt;show how easy it is to join data sets using Flink’s fluent APIs, &lt;/li&gt;
   &lt;li&gt;discuss basic distributed join strategies, Flink’s join implementations, and its memory management,&lt;/li&gt;
   &lt;li&gt;talk about Flink’s optimizer that automatically chooses join strategies,&lt;/li&gt;
   &lt;li&gt;show some performance numbers for joining data sets of different sizes, and finally&lt;/li&gt;
@@ -3276,7 +3465,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 
 &lt;h3 id=&quot;how-do-i-join-with-flink&quot;&gt;How do I join with Flink?&lt;/h3&gt;
 
-&lt;p&gt;Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections, which are called data sets. Data sets are processed by applying transformations that compute new data sets. Flink’s transformations include Map and Reduce as known from MapReduce &lt;a href=&quot;http://research.google.com/archive/mapreduce.html&quot;&gt;[1]&lt;/a&gt;, but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html&quot;&gt;[2]&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections, which are called data sets. Data sets are processed by applying transformations that compute new data sets. Flink’s transformations include Map and Reduce as known from MapReduce &lt;a href=&quot;http://research.google.com/archive/mapreduce.html&quot;&gt;[1]&lt;/a&gt;, but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html&quot;&gt;[2]&lt;/a&gt;. &lt;/p&gt;
 
 &lt;p&gt;Joining two Scala case class data sets is very easy as the following example shows:&lt;/p&gt;
 
@@ -3313,7 +3502,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 
 &lt;ol&gt;
   &lt;li&gt;The data of both inputs is distributed across all parallel instances that participate in the join and&lt;/li&gt;
-  &lt;li&gt;each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data.&lt;/li&gt;
+  &lt;li&gt;each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data. &lt;/li&gt;
 &lt;/ol&gt;
 
 &lt;p&gt;The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be picked independently and that are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. In the following, I will describe Flink’s ship and local strategies to join two data sets &lt;em&gt;R&lt;/em&gt; and &lt;em&gt;S&lt;/em&gt;.&lt;/p&gt;
@@ -3332,7 +3521,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 &lt;img src=&quot;/img/blog/joins-repartition.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
 &lt;/center&gt;
 
-&lt;p&gt;The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work.&lt;/p&gt;
+&lt;p&gt;The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work. &lt;/p&gt;
 
 &lt;center&gt;
 &lt;img src=&quot;/img/blog/joins-broadcast.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
@@ -3341,7 +3530,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 &lt;p&gt;The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join are already distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and only ship one or no input at all.&lt;/p&gt;
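 
 &lt;p&gt;If you know the relative input sizes better than the optimizer does, you can hint at a ship strategy yourself. Below is a sketch in the Scala API with hypothetical example types: &lt;code&gt;joinWithTiny&lt;/code&gt; and &lt;code&gt;joinWithHuge&lt;/code&gt; declare one input as (much) smaller, which effectively selects the Broadcast-Forward strategy.&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;import org.apache.flink.api.scala._
 
 // Hypothetical example types, only for illustration.
 case class PageVisit(url: String, ip: String, userId: Long)
 case class User(id: Long, name: String, email: String, country: String)
 
 // Declaring the users input as tiny broadcasts it to all parallel
 // instances, while the (large) visits input is not shipped at all.
 def joinWithHint(visits: DataSet[PageVisit], users: DataSet[User]) =
   visits.joinWithTiny(users).where(&quot;userId&quot;).equalTo(&quot;id&quot;)
 &lt;/code&gt;&lt;/pre&gt;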
 
 &lt;h4 id=&quot;flinks-memory-management&quot;&gt;Flink’s Memory Management&lt;/h4&gt;
-&lt;p&gt;Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an &lt;code&gt;OutOfMemoryException&lt;/code&gt; and kill the JVM.&lt;/p&gt;
+&lt;p&gt;Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an &lt;code&gt;OutOfMemoryException&lt;/code&gt; and kill the JVM. &lt;/p&gt;
 
 &lt;p&gt;Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed portion (70% by default) of the JVM’s heap memory that is available after initialization as 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory.&lt;/p&gt;
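 
 &lt;p&gt;A quick back-of-the-envelope calculation illustrates the sizing (the numbers below are illustrative assumptions, not Flink’s exact accounting):&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;// With, say, 8 GB of heap left after initialization, the default
 // fraction of 0.7 yields roughly 183,500 segments of 32 KB each.
 val heapAfterInit = 8L * 1024 * 1024 * 1024  // assume 8 GB
 val managedFraction = 0.7                    // default portion
 val segmentSize = 32 * 1024                  // 32 KB byte arrays
 val numSegments = (heapAfterInit * managedFraction / segmentSize).toLong
 &lt;/code&gt;&lt;/pre&gt;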
 
@@ -3358,7 +3547,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 &lt;p&gt;After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition. Flink’s runtime features two common join strategies to perform these local joins:&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;the &lt;em&gt;Sort-Merge-Join&lt;/em&gt; strategy (SM) and&lt;/li&gt;
+  &lt;li&gt;the &lt;em&gt;Sort-Merge-Join&lt;/em&gt; strategy (SM) and &lt;/li&gt;
   &lt;li&gt;the &lt;em&gt;Hybrid-Hash-Join&lt;/em&gt; strategy (HH).&lt;/li&gt;
 &lt;/ul&gt;
 
@@ -3403,13 +3592,13 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 &lt;ul&gt;
   &lt;li&gt;1GB     : 1000GB&lt;/li&gt;
   &lt;li&gt;10GB    : 1000GB&lt;/li&gt;
-  &lt;li&gt;100GB   : 1000GB&lt;/li&gt;
+  &lt;li&gt;100GB   : 1000GB &lt;/li&gt;
   &lt;li&gt;1000GB  : 1000GB&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;p&gt;The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB of broadcast data in 5GB of working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread, since only about 5% of the build input fits into memory and hence roughly 95% of both inputs must go to disk, and would require more than 8TB of local disk storage on each machine.&lt;/p&gt;
 
-&lt;p&gt;As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80.&lt;/p&gt;
+&lt;p&gt;As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80. &lt;/p&gt;
 
 &lt;center&gt;
 &lt;img src=&quot;/img/blog/joins-dist-perf.png&quot; style=&quot;width:70%;margin:15px&quot; /&gt;
@@ -3426,7 +3615,7 @@ Tez as an execution backend instead of Flink’s own network stack. Learn more
 &lt;ul&gt;
   &lt;li&gt;Flink’s fluent Scala and Java APIs make joins and other data transformations a piece of cake.&lt;/li&gt;
   &lt;li&gt;The optimizer does the hard choices for you, but gives you control in case you know better.&lt;/li&gt;
-  &lt;li&gt;Flink’s join implementations perform very well in memory and degrade gracefully when going to disk.&lt;/li&gt;
+  &lt;li&gt;Flink’s join implementations perform very well in memory and degrade gracefully when going to disk. &lt;/li&gt;
   &lt;li&gt;Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty &lt;code&gt;OutOfMemoryException&lt;/code&gt;. It just runs out-of-the-box.&lt;/li&gt;
 &lt;/ul&gt;
 
@@ -3597,7 +3786,7 @@ found &lt;a href=&quot;https://github.com/mbalassi/flink/blob/stockprices/flink-
   &lt;li&gt;Read a socket stream of stock prices&lt;/li&gt;
   &lt;li&gt;Parse the text in the stream to create a stream of &lt;code&gt;StockPrice&lt;/code&gt; objects&lt;/li&gt;
   &lt;li&gt;Add four other sources tagged with the stock symbol.&lt;/li&gt;
-  &lt;li&gt;Finally, merge the streams to create a unified stream.&lt;/li&gt;
+  &lt;li&gt;Finally, merge the streams to create a unified stream. &lt;/li&gt;
 &lt;/ol&gt;
 
 &lt;p&gt;&lt;img alt=&quot;Reading from multiple inputs&quot; src=&quot;/img/blog/blog_multi_input.png&quot; width=&quot;70%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;
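 
 &lt;p&gt;The four steps above can be sketched in the Scala streaming API as follows. Host, port, and line format are hypothetical, the tagged sources are simplified to single-element streams, and the final operation is named &lt;code&gt;merge&lt;/code&gt; in the 0.9 streaming API and &lt;code&gt;union&lt;/code&gt; in later versions.&lt;/p&gt;
 
 &lt;pre&gt;&lt;code&gt;import org.apache.flink.streaming.api.scala._
 
 case class StockPrice(symbol: String, price: Double)
 
 val env = StreamExecutionEnvironment.getExecutionEnvironment
 
 // 1) Read a socket stream and 2) parse it into StockPrice objects
 // (hypothetical host, port, and line format).
 val socketStockStream = env
   .socketTextStream(&quot;localhost&quot;, 9999)
   .map { line =&amp;gt;
     val split = line.split(&quot;,&quot;)
     StockPrice(split(0), split(1).toDouble)
   }
 
 // 3) Four more sources tagged with a stock symbol, simplified here
 // to single-element streams instead of random generators.
 val generated = Seq(&quot;SPX&quot;, &quot;FTSE&quot;, &quot;DJI&quot;, &quot;BUX&quot;)
   .map(s =&amp;gt; env.fromElements(StockPrice(s, 1000.0)))
 
 // 4) Merge everything into one unified stream (&quot;merge&quot; in the
 // 0.9 streaming API, &quot;union&quot; in later versions).
 val stockStream = socketStockStream.union(generated: _*)
 &lt;/code&gt;&lt;/pre&gt;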
@@ -4069,7 +4258,7 @@ number of mentions of a given stock in the Twitter stream. As both of
 these data streams are potentially infinite, we apply the join on a
 30-second window.&lt;/p&gt;
 
-&lt;p&gt;&lt;img alt=&quot;Streaming joins&quot; src=&quot;/img/blog/blog_stream_join.png&quot; width=&quot;60%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;
+&lt;p&gt;&lt;img alt=&quot;Streaming joins&quot; src=&quot;/img/blog/blog_stream_join.png&quot; width=&quot;60%&quot; class=&quot;img-responsive center-block&quot; /&gt; &lt;/p&gt;
 
 &lt;div class=&quot;codetabs&quot;&gt;
 
@@ -4238,7 +4427,7 @@ internally, fault tolerance, and performance measurements!&lt;/p&gt;
 
 &lt;h3 id=&quot;using-off-heap-memoryhttpsgithubcomapacheflinkpull290&quot;&gt;&lt;a href=&quot;https://github.com/apache/flink/pull/290&quot;&gt;Using off-heap memory&lt;/a&gt;&lt;/h3&gt;
 
-&lt;p&gt;This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets).&lt;/p&gt;
+&lt;p&gt;This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets). &lt;/p&gt;
 
 &lt;h3 id=&quot;gelly-flinks-graph-apihttpsgithubcomapacheflinkpull335&quot;&gt;&lt;a href=&quot;https://github.com/apache/flink/pull/335&quot;&gt;Gelly, Flink’s Graph API&lt;/a&gt;&lt;/h3&gt;
 
@@ -4310,7 +4499,7 @@ internally, fault tolerance, and performance measurements!&lt;/p&gt;
   &lt;li&gt;Stefan Bunk&lt;/li&gt;
   &lt;li&gt;Paris Carbone&lt;/li&gt;
   &lt;li&gt;Ufuk Celebi&lt;/li&gt;
-  &lt;li&gt;Nils Engelbach&lt;/li&gt;
+  &lt;li&gt;Nils Engelbach &lt;/li&gt;
   &lt;li&gt;Stephan Ewen&lt;/li&gt;
   &lt;li&gt;Gyula Fora&lt;/li&gt;
   &lt;li&gt;Gabor Hermann&lt;/li&gt;
@@ -4415,7 +4604,7 @@ Flink serialization system improved a lot over time and by now surpasses the cap
 &lt;img src=&quot;/img/blog/hcompat-logos.png&quot; style=&quot;width:30%;margin:15px&quot; /&gt;
 &lt;/center&gt;
 
-&lt;p&gt;To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a &lt;a href=&quot;https://developers.google.com/open-source/soc/&quot;&gt;Google Summer of Code&lt;/a&gt; 2014 project.&lt;/p&gt;
+&lt;p&gt;To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a &lt;a href=&quot;https://developers.google.com/open-source/soc/&quot;&gt;Google Summer of Code&lt;/a&gt; 2014 project. &lt;/p&gt;
 
 &lt;p&gt;With the Hadoop Compatibility package, you can reuse all your Hadoop&lt;/p&gt;
 
@@ -4428,7 +4617,7 @@ Flink serialization system improved a lot over time and by now surpasses the cap
 
 &lt;p&gt;in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (&lt;code&gt;Writables&lt;/code&gt; and &lt;code&gt;WritableComparable&lt;/code&gt;).&lt;/p&gt;
 
-&lt;p&gt;The following code snippet shows a simple Flink WordCount prog

<TRUNCATED>
