flink-commits mailing list archives

From rmetz...@apache.org
Subject svn commit: r1667227 - in /flink: _posts/ site/blog/ site/blog/page2/ site/blog/page3/ site/img/blog/ site/news/2015/03/13/
Date Tue, 17 Mar 2015 10:05:20 GMT
Author: rmetzger
Date: Tue Mar 17 10:05:20 2015
New Revision: 1667227

URL: http://svn.apache.org/r1667227
Log:
Add join blog post to website

Added:
    flink/site/img/blog/joins-broadcast.png   (with props)
    flink/site/img/blog/joins-dist-perf.png   (with props)
    flink/site/img/blog/joins-hhj.png   (with props)
    flink/site/img/blog/joins-memmgmt.png   (with props)
    flink/site/img/blog/joins-repartition.png   (with props)
    flink/site/img/blog/joins-single-perf.png   (with props)
    flink/site/img/blog/joins-smj.png   (with props)
    flink/site/news/2015/03/13/
    flink/site/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
Modified:
    flink/_posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md
    flink/site/blog/feed.xml
    flink/site/blog/index.html
    flink/site/blog/page2/index.html
    flink/site/blog/page3/index.html

Modified: flink/_posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md
URL: http://svn.apache.org/viewvc/flink/_posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md?rev=1667227&r1=1667226&r2=1667227&view=diff
==============================================================================
--- flink/_posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md (original)
+++ flink/_posts/2015-03-13-peeking-into-Apache-Flinks-Engine-Room.md Tue Mar 17 10:05:20 2015
@@ -1,12 +1,11 @@
 ---
 layout: post
-title:  'Peeking into Apache Flinks Engine Room'
+title:  "Peeking into Apache Flink's Engine Room"
 date:   2015-03-13 10:00:00
 categories: news
 ---
 
-##Peeking into Apache Flink's Engine Room
-####Join Processing in Apache Flink
+###Join Processing in Apache Flink
 
 Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system.
 

Modified: flink/site/blog/feed.xml
URL: http://svn.apache.org/viewvc/flink/site/blog/feed.xml?rev=1667227&r1=1667226&r2=1667227&view=diff
==============================================================================
Binary files - no diff available.

Modified: flink/site/blog/index.html
URL: http://svn.apache.org/viewvc/flink/site/blog/index.html?rev=1667227&r1=1667226&r2=1667227&view=diff
==============================================================================
--- flink/site/blog/index.html (original)
+++ flink/site/blog/index.html Tue Mar 17 10:05:20 2015
@@ -134,6 +134,197 @@
 		<div class="col-md-8">
 			
 			<article>
+				<h2><a href="/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html">Peeking into Apache Flink's Engine Room</a></h2>
+				<p class="meta">13 Mar 2015</p>
+
+				<div><h3 id="join-processing-in-apache-flink">Join Processing in Apache Flink</h3>
+
+<p>Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system.</p>
+
+<p>In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will</p>
+
+<ul>
+<li>show how easy it is to join data sets using Flink’s fluent APIs, </li>
+<li>discuss basic distributed join strategies, Flink’s join implementations, and its memory management,</li>
+<li>talk about Flink’s optimizer that automatically chooses join strategies,</li>
+<li>show some performance numbers for joining data sets of different sizes, and finally</li>
+<li>briefly discuss joining of co-located and pre-sorted data sets.</li>
+</ul>
+
+<p><em>Disclaimer</em>: This blog post is exclusively about equi-joins. Whenever I say “join” in the following, I actually mean “equi-join”.</p>
+
+<h3 id="how-do-i-join-with-flink?">How do I join with Flink?</h3>
+
+<p>Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections which are called data sets. Data sets are processed by applying transformations that compute new data sets. Flink’s transformations include Map and Reduce as known from MapReduce <a href="http://research.google.com/archive/mapreduce.html">[1]</a> but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html">[2]</a>.</p>
+
+<p>Joining two Scala case class data sets is very easy as the following example shows:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// define your data types</span>
+<span class="k">case</span> <span class="k">class</span> <span class="nc">PageVisit</span><span class="o">(</span><span class="n">url</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">ip</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">userId</span><span class="k">:</span> <span class="kt">Long</span><span class="o">)</span>
+<span class="k">case</span> <span class="k">class</span> <span class="nc">User</span><span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">Long</span><span class="o">,</span> <span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">email</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">country</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span>
+
+<span class="c1">// get your data from somewhere</span>
+<span class="k">val</span> <span class="n">visits</span><span class="k">:</span> <span class="kt">DataSet</span><span class="o">[</span><span class="kt">PageVisit</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+<span class="k">val</span> <span class="n">users</span><span class="k">:</span> <span class="kt">DataSet</span><span class="o">[</span><span class="kt">User</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+
+<span class="c1">// filter the users data set</span>
+<span class="k">val</span> <span class="n">germanUsers</span> <span class="k">=</span> <span class="n">users</span><span class="o">.</span><span class="n">filter</span><span class="o">((</span><span class="n">u</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">u</span><span class="o">.</span><span class="n">country</span><span class="o">.</span><span class="n">equals</span><span class="o">(</span><span class="s">&quot;de&quot;</span><span class="o">))</span>
+<span class="c1">// join data sets</span>
+<span class="k">val</span> <span class="n">germanVisits</span><span class="k">:</span> <span class="kt">DataSet</span><span class="o">[(</span><span class="kt">PageVisit</span>, <span class="kt">User</span><span class="o">)]</span> <span class="k">=</span>
+      <span class="c1">// equi-join condition (PageVisit.userId = User.id)</span>
+     <span class="n">visits</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">germanUsers</span><span class="o">).</span><span class="n">where</span><span class="o">(</span><span class="s">&quot;userId&quot;</span><span class="o">).</span><span class="n">equalTo</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">)</span>
+</code></pre></div>
+<p>Flink’s APIs also allow you to:</p>
+
+<ul>
+<li>apply a user-defined join function to each pair of joined elements instead of returning a <code>($Left, $Right)</code> tuple,</li>
+<li>select fields of pairs of joined Tuple elements (projection), and</li>
+<li>define composite join keys such as <code>.where(&quot;orderDate&quot;, &quot;zipCode&quot;).equalTo(&quot;date&quot;, &quot;zip&quot;)</code>.</li>
+</ul>
+
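+<p>As a short illustration, a user-defined join function can be applied right after the key definition. This is a sketch building on the example above; the <code>visitNames</code> value and its output type are mine, not from the original program:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// join and apply a function to each matched pair, instead of
+// returning (PageVisit, User) tuples
+val visitNames: DataSet[(String, String)] =
+  visits.join(germanUsers).where(&quot;userId&quot;).equalTo(&quot;id&quot;) {
+    (visit, user) =&gt; (visit.url, user.name)
+  }
+</code></pre></div>
+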
+<p>See the documentation for more details on Flink’s join features <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html#join">[3]</a>.</p>
+
+<h3 id="how-does-flink-join-my-data?">How does Flink join my data?</h3>
+
+<p>Flink uses techniques which are well known from parallel database systems to efficiently execute parallel joins. A join operator must establish all pairs of elements from its input data sets for which the join condition evaluates to true. In a standalone system, the most straightforward implementation of a join is the so-called nested-loop join which builds the full Cartesian product and evaluates the join condition for each pair of elements. This strategy has quadratic complexity and obviously does not scale to large inputs.</p>
+
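+<p>To make the quadratic complexity concrete, here is a minimal stand-alone sketch of a nested-loop equi-join on local collections (my own illustration, not Flink code):</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// naive nested-loop equi-join: evaluates the condition for every
+// pair of elements, i.e. O(|R| * |S|) comparisons
+def nestedLoopJoin[L, R, K](lhs: Seq[L], rhs: Seq[R])
+                           (lKey: L =&gt; K)(rKey: R =&gt; K): Seq[(L, R)] =
+  for (l &lt;- lhs; r &lt;- rhs if lKey(l) == rKey(r)) yield (l, r)
+</code></pre></div>
+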
+<p>In a distributed system, joins are commonly processed in two steps:</p>
+
+<ol>
+<li>The data of both inputs is distributed across all parallel instances that participate in the join and</li>
+<li>each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data. </li>
+</ol>
+
+<p>The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be independently picked and which are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. In the following I will describe Flink’s ship and local strategies to join two data sets <em>R</em> and <em>S</em>.</p>
+
+<h4 id="ship-strategies">Ship Strategies</h4>
+
+<p>Flink features two ship strategies to establish a valid data partitioning for a join:</p>
+
+<ul>
+<li>the <em>Repartition-Repartition</em> strategy (RR) and</li>
+<li>the <em>Broadcast-Forward</em> strategy (BF).</li>
+</ul>
+
+<p>The Repartition-Repartition strategy partitions both inputs, R and S, on their join key attributes using the same partitioning function. Each partition is assigned to exactly one parallel join instance and all data of that partition is sent to its associated instance. This ensures that all elements that share the same join key are shipped to the same parallel instance and can be locally joined. The cost of the RR strategy is a full shuffle of both data sets over the network.</p>
+
+<p><center>
+<img src="/img/blog/joins-repartition.png" style="width:90%;margin:15px">
+</center></p>
+
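+<p>The key property of the repartitioning step can be captured in one function; a minimal sketch, assuming any deterministic hash-based assignment that is applied identically to both inputs:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// deterministic key-to-instance assignment; using the same function
+// for R and S guarantees that equal keys meet on the same instance
+def assignInstance[K](key: K, numInstances: Int): Int =
+  ((key.hashCode % numInstances) + numInstances) % numInstances
+</code></pre></div>
+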
+<p>The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figures above and below illustrate how the two ship strategies work.</p>
+
+<p><center>
+<img src="/img/blog/joins-broadcast.png" style="width:90%;margin:15px">
+</center></p>
+
+<p>The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join are already distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and only ship one or no input at all.</p>
+
+<h4 id="flink’s-memory-management">Flink’s Memory Management</h4>
+
+<p>Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an <code>OutOfMemoryException</code> and kill the JVM. </p>
+
+<p>Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed portion (70% by default) of the JVM’s heap memory that is available after initialization as 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory.</p>
+
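+<p>A toy model of this allocation scheme (my own sketch; Flink’s actual memory manager is considerably more involved) might look like this:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// pre-allocate a fixed pool of 32KB segments once at startup;
+// algorithms request segments as working memory and return them,
+// spilling filled segments to disk when the pool runs dry
+class SegmentPool(totalBytes: Long) {
+  private val SegmentSize = 32 * 1024
+  private val free = scala.collection.mutable.Queue.fill(
+    (totalBytes / SegmentSize).toInt)(new Array[Byte](SegmentSize))
+
+  def request(): Option[Array[Byte]] =
+    if (free.nonEmpty) Some(free.dequeue()) else None  // None: time to spill
+
+  def release(segment: Array[Byte]): Unit = free.enqueue(segment)
+}
+</code></pre></div>
+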
+<p>This design has several nice properties. First, the number of data objects on the JVM heap is much lower, resulting in less garbage collection pressure. Second, objects on the heap carry a certain space overhead, whereas the binary representation is more compact. Especially data sets of many small elements benefit from that. Third, an algorithm knows exactly when the input data exceeds its working memory and can react by writing some of its filled byte arrays to the worker’s local filesystem. After the content of a byte array is written to disk, it can be reused to process more data. Reading data back into memory is as simple as reading the binary data from the local filesystem. The following figure illustrates Flink’s memory management.</p>
+
+<p><center>
+<img src="/img/blog/joins-memmgmt.png" style="width:90%;margin:15px">
+</center></p>
+
+<p>This active memory management makes Flink extremely robust for processing very large data sets on limited memory resources while preserving all benefits of in-memory processing if data is small enough to fit in memory. De/serializing data into and from memory has a certain cost overhead compared to simply holding all data elements on the JVM’s heap. However, Flink features efficient custom de/serializers which also allow certain operations, such as comparisons, to be performed directly on serialized data without deserializing objects from memory.</p>
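+
+<p>Comparing serialized data can be as cheap as a lexicographic byte comparison, provided the binary encoding is order-preserving for the key type; a hedged sketch:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// compare two serialized keys without deserializing them
+// (assumes an order-preserving binary encoding of the key)
+def compareSerialized(a: Array[Byte], b: Array[Byte]): Int = {
+  var i = 0
+  while (i &lt; a.length &amp;&amp; i &lt; b.length) {
+    val d = (a(i) &amp; 0xFF) - (b(i) &amp; 0xFF)
+    if (d != 0) return d
+    i += 1
+  }
+  a.length - b.length
+}
+</code></pre></div>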
+
+<h4 id="local-strategies">Local Strategies</h4>
+
+<p>After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition. Flink’s runtime features two common join strategies to perform these local joins:</p>
+
+<ul>
+<li>the <em>Sort-Merge-Join</em> strategy (SM) and </li>
+<li>the <em>Hybrid-Hash-Join</em> strategy (HH).</li>
+</ul>
+
+<p>The Sort-Merge-Join works by first sorting both input data sets on their join key attributes (Sort Phase) and merging the sorted data sets as a second step (Merge Phase). The sort is done in-memory if the local partition of a data set is small enough. Otherwise, an external merge-sort is done by collecting data until the working memory is filled, sorting it, writing the sorted data to the local filesystem, and starting over by filling the working memory again with more incoming data. After all input data has been received, sorted, and written as sorted runs to the local file system, a fully sorted stream can be obtained. This is done by reading the partially sorted runs from the local filesystem and sort-merging the records on the fly. Once the sorted streams of both inputs are available, both streams are sequentially read and merge-joined in a zig-zag fashion by comparing the sorted join key attributes, building join element pairs for matching keys, and advancing the sorted stream with the lower join key. The figure below shows how the Sort-Merge-Join strategy works.</p>
+
+<p><center>
+<img src="/img/blog/joins-smj.png" style="width:90%;margin:15px">
+</center></p>
+
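+<p>The merge phase itself fits in a few lines, assuming both inputs are already sorted by key (this toy version also assumes unique keys on the left side, matching the 1:N joins used later in this post):</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// zig-zag merge of two key-sorted sequences (left keys unique)
+def mergeJoin[L, R](lhs: Vector[(Int, L)], rhs: Vector[(Int, R)]): Vector[(L, R)] = {
+  val out = Vector.newBuilder[(L, R)]
+  var i = 0; var j = 0
+  while (i &lt; lhs.length &amp;&amp; j &lt; rhs.length) {
+    val (kl, kr) = (lhs(i)._1, rhs(j)._1)
+    if (kl &lt; kr) i += 1           // advance the stream with the lower key
+    else if (kr &lt; kl) j += 1
+    else { out += ((lhs(i)._2, rhs(j)._2)); j += 1 }  // match: emit a pair
+  }
+  out.result()
+}
+</code></pre></div>
+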
+<p>The Hybrid-Hash-Join distinguishes its inputs as build-side and probe-side input and works in two phases, a build phase followed by a probe phase. In the build phase, the algorithm reads the build-side input and inserts all data elements into an in-memory hash table indexed by their join key attributes. If the hash table outgrows the algorithm&#39;s working memory, parts of the hash table (ranges of hash indexes) are written to the local filesystem. The build phase ends after the build-side input has been fully consumed. In the probe phase, the algorithm reads the probe-side input and probes the hash table for each element using its join key attribute. If the element falls into a hash index range that was spilled to disk, the element is also written to disk. Otherwise, the element is immediately joined with all matching elements from the hash table. If the hash table completely fits into the working memory, the join is finished after the probe-side input has been fully consumed. Otherwise, the current hash table is dropped and a new hash table is built using spilled parts of the build-side input. This hash table is probed by the corresponding parts of the spilled probe-side input. Eventually, all data is joined. Hybrid-Hash-Joins perform best if the hash table completely fits into the working memory because an arbitrarily large probe-side input can be processed on-the-fly without materializing it. However, even if the build-side input does not fit into memory, the Hybrid-Hash-Join has very nice properties. In this case, in-memory processing is partially preserved and only a fraction of the build-side and probe-side data needs to be written to and read from the local filesystem. The next figure illustrates how the Hybrid-Hash-Join works.</p>
+
+<p><center>
+<img src="/img/blog/joins-hhj.png" style="width:90%;margin:15px">
+</center></p>
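+
+<p>Stripped of all spilling logic, the in-memory core of the build and probe phases looks roughly as follows (a sketch, not Flink’s implementation):</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// build phase: index the (ideally smaller) build side by join key;
+// probe phase: stream the probe side and look up matching elements
+def hashJoin[L, R, K](build: Iterator[L], probe: Iterator[R])
+                     (bKey: L =&gt; K)(pKey: R =&gt; K): Iterator[(L, R)] = {
+  val table: Map[K, Seq[L]] = build.toSeq.groupBy(bKey)
+  probe.flatMap(r =&gt; table.getOrElse(pKey(r), Nil).map(l =&gt; (l, r)))
+}
+</code></pre></div>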
+
+<h3 id="how-does-flink-choose-join-strategies?">How does Flink choose join strategies?</h3>
+
+<p>Ship and local strategies do not depend on each other and can be independently chosen. Therefore, Flink can execute a join of two data sets R and S in nine different ways by combining any of the three ship strategies (RR, BF with R being broadcasted, BF with S being broadcasted) with any of the three local strategies (SM, HH with R being build-side, HH with S being build-side). Each of these strategy combinations results in different execution performance depending on the data sizes and the available amount of working memory. In case of a small data set R and a much larger data set S, broadcasting R and using it as build-side input of a Hybrid-Hash-Join is usually a good choice because the much larger data set S is not shipped and not materialized (given that the hash table completely fits into memory). If both data sets are rather large or the join is performed on many parallel instances, repartitioning both inputs is a robust choice.</p>
+
+<p>Flink features a cost-based optimizer which automatically chooses the execution strategies for all operators including joins. Without going into the details of cost-based optimization, this is done by computing cost estimates for execution plans with different strategies and picking the plan with the least estimated costs. Thereby, the optimizer estimates the amount of data which is shipped over the network and written to disk. If no reliable size estimates for the input data can be obtained, the optimizer falls back to robust default choices. A key feature of the optimizer is to reason about existing data properties. For example, if the data of one input is already partitioned in a suitable way, the generated candidate plans will not repartition this input. Hence, the choice of a RR ship strategy becomes more likely. The same applies for previously sorted data and the Sort-Merge-Join strategy. Flink programs can help the optimizer to reason about existing data properties by providing semantic information about user-defined functions <a href="http://ci.apache.org/projects/flink/flink-docs-master/programming_guide.html#semantic-annotations">[4]</a>. While the optimizer is a killer feature of Flink, it can happen that a user knows better than the optimizer how to execute a specific join. Similar to relational database systems, Flink offers optimizer hints to tell the optimizer which join strategies to pick <a href="http://ci.apache.org/projects/flink/flink-docs-master/dataset_transformations.html#join-algorithm-hints">[5]</a>.</p>
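+
+<p>For example, a size hint can be given directly in the program (a sketch; I assume the <code>joinWithTiny</code> variant here, see [5] for the full set of join algorithm hints):</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala">// hint that the second input is very small: the optimizer can then
+// broadcast it and use it as the build side of a hash join
+val hinted = visits.joinWithTiny(germanUsers).where(&quot;userId&quot;).equalTo(&quot;id&quot;)
+</code></pre></div>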
+
+<h3 id="how-is-flink’s-join-performance?">How is Flink’s join performance?</h3>
+
+<p>Alright, that sounds good, but how fast are joins in Flink? Let’s have a look. We start with a benchmark of the single-core performance of Flink’s Hybrid-Hash-Join implementation and run a Flink program that executes a Hybrid-Hash-Join with parallelism 1. We run the program on an n1-standard-2 Google Compute Engine instance (2 vCPUs, 7.5GB memory) with two locally attached SSDs. We give 4GB as working memory to the join. The join program generates 1KB records for both inputs on-the-fly, i.e., the data is not read from disk. We run 1:N (Primary-Key/Foreign-Key) joins and generate the smaller input with unique Integer join keys and the larger input with randomly chosen Integer join keys that fall into the key range of the smaller input. Hence, each tuple of the larger side joins with exactly one tuple of the smaller side. The result of the join is immediately discarded. We vary the size of the build-side input from 1 million to 12 million elements (1GB to 12GB). The probe-side input is kept constant at 64 million elements (64GB). The following chart shows the average execution time of three runs for each setup.</p>
+
+<p><center>
+<img src="/img/blog/joins-single-perf.png" style="width:85%;margin:15px">
+</center></p>
+
+<p>The joins with 1 to 3 GB build side (blue bars) are pure in-memory joins. The other joins partially spill data to disk (4 to 12GB, orange bars). The results show that the performance of Flink’s Hybrid-Hash-Join remains stable as long as the hash table completely fits into memory. As soon as the hash table becomes larger than the working memory, parts of the hash table and corresponding parts of the probe side are spilled to disk. The chart shows that the performance of the Hybrid-Hash-Join gracefully decreases in this situation, i.e., there is no sharp increase in runtime when the join starts spilling. In combination with Flink’s robust memory management, this execution behavior gives smooth performance without the need for fine-grained, data-dependent memory tuning.</p>
+
+<p>So, Flink’s Hybrid-Hash-Join implementation performs well on a single thread even for limited memory resources, but how good is Flink’s performance when joining larger data sets in a distributed setting? For the next experiment we compare the performance of the most common join strategy combinations, namely:</p>
+
+<ul>
+<li>Broadcast-Forward, Hybrid-Hash-Join (broadcasting and building with the smaller side),</li>
+<li>Repartition, Hybrid-Hash-Join (building with the smaller side), and</li>
+<li>Repartition, Sort-Merge-Join</li>
+</ul>
+
+<p>for different input size ratios:</p>
+
+<ul>
+<li>1GB     : 1000GB</li>
+<li>10GB    : 1000GB</li>
+<li>100GB   : 1000GB </li>
+<li>1000GB  : 1000GB</li>
+</ul>
+
+<p>The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB broadcasted data in 5GB working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread and require more than 8TB local disk storage on each machine.</p>
+
+<p>As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80. </p>
+
+<p><center>
+<img src="/img/blog/joins-dist-perf.png" style="width:70%;margin:15px">
+</center></p>
+
+<p>As expected, the Broadcast-Forward strategy performs best for very small inputs because the large probe side is not shipped over the network and is locally joined. However, when the size of the broadcasted side grows, two problems arise: first, the amount of data which is shipped increases, and second, each parallel instance has to process the full broadcasted data set. The performance of both Repartitioning strategies behaves similarly for growing input sizes, which indicates that these strategies are mainly limited by the cost of the data transfer (at max 2TB are shipped over the network and joined). Although the Sort-Merge-Join strategy shows the worst performance in all shown cases, it has a right to exist because it can nicely exploit sorted input data.</p>
+
+<h3 id="i’ve-got-sooo-much-data-to-join,-do-i-really-need-to-ship-it?">I’ve got sooo much data to join, do I really need to ship it?</h3>
+
+<p>We have seen that off-the-shelf distributed joins work really well in Flink. But what if your data is so huge that you do not want to shuffle it across your cluster? We recently added some features to Flink for specifying semantic properties (partitioning and sorting) on input splits and co-located reading of local input files. With these tools at hand, it is possible to join pre-partitioned data sets from your local filesystem without sending a single byte over your cluster’s network. If the input data is even pre-sorted, the join can be done as a Sort-Merge-Join without sorting, i.e., the join is essentially done on-the-fly. Exploiting co-location requires a very special setup though. Data needs to be stored on the local filesystem because HDFS does not feature data co-location and might move file blocks across data nodes. That means you need to take care of many things yourself which HDFS would have done for you, including replication to avoid data loss. On the other hand, the performance gains of joining co-located and pre-sorted data sets can be quite substantial.</p>
+
+<h3 id="tl;dr:-what-should-i-remember-from-all-of-this?">tl;dr: What should I remember from all of this?</h3>
+
+<ul>
+<li>Flink’s fluent Scala and Java APIs make joins and other data transformations a piece of cake.</li>
+<li>The optimizer does the hard choices for you, but gives you control in case you know better.</li>
+<li>Flink’s join implementations perform very well in-memory and gracefully degrade when going to disk.</li>
+<li>Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty <code>OutOfMemoryException</code>. It just runs out-of-the-box.</li>
+</ul>
+
+<h4 id="references">References</h4>
+
+<p>[1] <a href="http://research.google.com/archive/mapreduce.html">“MapReduce: Simplified data processing on large clusters”</a>, Dean, Ghemawat, 2004 <br>
+[2] <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html">Flink 0.8.1 documentation: Data Transformations</a> <br>
+[3] <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html#join">Flink 0.8.1 documentation: Joins</a> <br>
+[4] <a href="http://ci.apache.org/projects/flink/flink-docs-master/programming_guide.html#semantic-annotations">Flink 0.9-SNAPSHOT documentation: Semantic annotations</a> <br>
+[5] <a href="http://ci.apache.org/projects/flink/flink-docs-master/dataset_transformations.html#join-algorithm-hints">Flink 0.9-SNAPSHOT documentation: Optimizer join hints</a> <br></p>
+
+<p><br>
+<small>Written by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>).</small></p>
+</div>
+				<a href="/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html#disqus_thread">Peeking into Apache Flink's Engine Room</a>
+			</article>
+			
+			<article>
 				<h2><a href="/news/2015/03/02/february-2015-in-flink.html">February 2015 in the Flink community</a></h2>
 				<p class="meta">02 Mar 2015</p>
 
@@ -1351,89 +1542,6 @@ of the system. We suggest all users of F
 				<a href="/news/2014/09/26/release-0.6.1.html#disqus_thread">Apache Flink 0.6.1 available</a>
 			</article>
 			
-			<article>
-				<h2><a href="/news/2014/08/26/release-0.6.html">Apache Flink 0.6 available</a></h2>
-				<p class="meta">26 Aug 2014</p>
-
-				<div><p>We are happy to announce the availability of Flink 0.6. This is the
-first release of the system inside the Apache Incubator and under the
-name Flink. Releases up to 0.5 were under the name Stratosphere, the
-academic and open source project that Flink originates from.</p>
-
-<h2 id="what-is-flink?">What is Flink?</h2>
-
-<p>Apache Flink is a general-purpose data processing engine for
-clusters. It runs on YARN clusters on top of data stored in Hadoop, as
-well as stand-alone. Flink currently has programming APIs in Java and
-Scala. Jobs are executed via Flink&#39;s own runtime engine. Flink
-features:</p>
-
-<p><strong>Robust in-memory and out-of-core processing:</strong> once read, data stays
-  in memory as much as possible, and is gracefully de-staged to disk in
-  the presence of memory pressure from limited memory or other
-  applications. The runtime is designed to perform very well both in
-  setups with abundant memory and in setups where memory is scarce.</p>
-
-<p><strong>POJO-based APIs:</strong> when programming, you do not have to pack your
-  data into key-value pairs or some other framework-specific data
-  model. Rather, you can use arbitrary Java and Scala types to model
-  your data.</p>
-
-<p><strong>Efficient iterative processing:</strong> Flink contains explicit &quot;iterate&quot; operators
-  that enable very efficient loops over data sets, e.g., for machine
-  learning and graph applications.</p>
-
-<p><strong>A modular system stack:</strong> Flink is not a direct implementation of its
-  APIs but a layered system. All programming APIs are translated to an
-  intermediate program representation that is compiled and optimized
-  via a cost-based optimizer. Lower-level layers of Flink also expose
-  programming APIs for extending the system.</p>
-
-<p><strong>Data pipelining/streaming:</strong> Flink&#39;s runtime is designed as a
-  pipelined data processing engine rather than a batch processing
-  engine. Operators do not wait for their predecessors to finish in
-  order to start processing data. This results to very efficient
-  handling of large data sets.</p>
-
-<h2 id="release-0.6">Release 0.6</h2>
-
-<p>Flink 0.6 builds on the latest Stratosphere 0.5 release. It includes
-many bug fixes and improvements that make the system more stable and
-robust, as well as breaking API changes.</p>
-
-<p>The full release notes are available <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;version=12327101">here</a>.</p>
-
-<p>Download the release <a href="http://flink.incubator.apache.org/downloads.html">here</a>.</p>
-
-<h2 id="contributors">Contributors</h2>
-
-<ul>
-<li>Wilson Cao</li>
-<li>Ufuk Celebi</li>
-<li>Stephan Ewen</li>
-<li>Jonathan Hasenburg</li>
-<li>Markus Holzemer</li>
-<li>Fabian Hueske</li>
-<li>Sebastian Kunert</li>
-<li>Vikhyat Korrapati</li>
-<li>Aljoscha Krettek</li>
-<li>Sebastian Kruse</li>
-<li>Raymond Liu</li>
-<li>Robert Metzger</li>
-<li>Mingliang Qi</li>
-<li>Till Rohrmann</li>
-<li>Henry Saputra</li>
-<li>Chesnay Schepler</li>
-<li>Kostas Tzoumas</li>
-<li>Robert Waury</li>
-<li>Timo Walther</li>
-<li>Daniel Warneke</li>
-<li>Tobias Wiens</li>
-</ul>
-</div>
-				<a href="/news/2014/08/26/release-0.6.html#disqus_thread">Apache Flink 0.6 available</a>
-			</article>
-			
 		</div>
 		<div class="col-md-2"></div>
 	</div>

Modified: flink/site/blog/page2/index.html
URL: http://svn.apache.org/viewvc/flink/site/blog/page2/index.html?rev=1667227&r1=1667226&r2=1667227&view=diff
==============================================================================
--- flink/site/blog/page2/index.html (original)
+++ flink/site/blog/page2/index.html Tue Mar 17 10:05:20 2015
@@ -134,6 +134,89 @@
 		<div class="col-md-8">
 			
 			<article>
+				<h2><a href="/news/2014/08/26/release-0.6.html">Apache Flink 0.6 available</a></h2>
+				<p class="meta">26 Aug 2014</p>
+
+				<div><p>We are happy to announce the availability of Flink 0.6. This is the
+first release of the system inside the Apache Incubator and under the
+name Flink. Releases up to 0.5 were under the name Stratosphere, the
+academic and open source project that Flink originates from.</p>
+
+<h2 id="what-is-flink?">What is Flink?</h2>
+
+<p>Apache Flink is a general-purpose data processing engine for
+clusters. It runs on YARN clusters on top of data stored in Hadoop, as
+well as stand-alone. Flink currently has programming APIs in Java and
+Scala. Jobs are executed via Flink&#39;s own runtime engine. Flink
+features:</p>
+
+<p><strong>Robust in-memory and out-of-core processing:</strong> once read, data stays
+  in memory as much as possible, and is gracefully de-staged to disk in
+  the presence of memory pressure from limited memory or other
+  applications. The runtime is designed to perform very well both in
+  setups with abundant memory and in setups where memory is scarce.</p>
+
+<p><strong>POJO-based APIs:</strong> when programming, you do not have to pack your
+  data into key-value pairs or some other framework-specific data
+  model. Rather, you can use arbitrary Java and Scala types to model
+  your data.</p>
+
+<p><strong>Efficient iterative processing:</strong> Flink contains explicit &quot;iterate&quot; operators
+  that enable very efficient loops over data sets, e.g., for machine
+  learning and graph applications.</p>
+
+<p><strong>A modular system stack:</strong> Flink is not a direct implementation of its
+  APIs but a layered system. All programming APIs are translated to an
+  intermediate program representation that is compiled and optimized
+  via a cost-based optimizer. Lower-level layers of Flink also expose
+  programming APIs for extending the system.</p>
+
+<p><strong>Data pipelining/streaming:</strong> Flink&#39;s runtime is designed as a
+  pipelined data processing engine rather than a batch processing
+  engine. Operators do not wait for their predecessors to finish in
+  order to start processing data. This results to very efficient
+  handling of large data sets.</p>
+
+<h2 id="release-0.6">Release 0.6</h2>
+
+<p>Flink 0.6 builds on the latest Stratosphere 0.5 release. It includes
+many bug fixes and improvements that make the system more stable and
+robust, as well as breaking API changes.</p>
+
+<p>The full release notes are available <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;version=12327101">here</a>.</p>
+
+<p>Download the release <a href="http://flink.incubator.apache.org/downloads.html">here</a>.</p>
+
+<h2 id="contributors">Contributors</h2>
+
+<ul>
+<li>Wilson Cao</li>
+<li>Ufuk Celebi</li>
+<li>Stephan Ewen</li>
+<li>Jonathan Hasenburg</li>
+<li>Markus Holzemer</li>
+<li>Fabian Hueske</li>
+<li>Sebastian Kunert</li>
+<li>Vikhyat Korrapati</li>
+<li>Aljoscha Krettek</li>
+<li>Sebastian Kruse</li>
+<li>Raymond Liu</li>
+<li>Robert Metzger</li>
+<li>Mingliang Qi</li>
+<li>Till Rohrmann</li>
+<li>Henry Saputra</li>
+<li>Chesnay Schepler</li>
+<li>Kostas Tzoumas</li>
+<li>Robert Waury</li>
+<li>Timo Walther</li>
+<li>Daniel Warneke</li>
+<li>Tobias Wiens</li>
+</ul>
+</div>
+				<a href="/news/2014/08/26/release-0.6.html#disqus_thread">Apache Flink 0.6 available</a>
+			</article>
+			
+			<article>
 				<h2><a href="/news/2014/05/31/release-0.5.html">Stratosphere version 0.5 available</a></h2>
 				<p class="meta">31 May 2014</p>
 
@@ -749,26 +832,6 @@ For a complete overview of the renamings
 				<a href="/news/2014/01/10/stratosphere-hadoop-summit.html#disqus_thread">Stratosphere got accepted to the Hadoop Summit Europe in Amsterdam</a>
 			</article>
 			
-			<article>
-				<h2><a href="/news/2013/12/13/humboldt-innovation-award.html">Stratosphere wins award at Humboldt Innovation Competition "Big Data: Research meets Startups"</a></h2>
-				<p class="meta">13 Dec 2013</p>
-
-				<div>    <p> Stratosphere won the second place in
-    the <a href="http://www.humboldt-innovation.de/de/newsdetail/News/View/Forum%2BJunge%2BSpitzenforscher%2BBIG%2BData%2B%2BResearch%2Bmeets%2BStartups-123.html">competition</a>
-    organized by Humboldt Innovation on "Big Data: Research meets
-    Startups," where several research projects were evaluated by a
-    panel of experts from the Berlin startup ecosystem. The award
-    includes a monetary prize of 10,000 euros.
-    </p>
-
-    <p>We are extremely excited about this award, as it further
-    showcases the relevance of the Stratosphere platform and Big Data
-    technology in general for the technology startup world.
-    </p>
-</div>
-				<a href="/news/2013/12/13/humboldt-innovation-award.html#disqus_thread">Stratosphere wins award at Humboldt Innovation Competition "Big Data: Research meets Startups"</a>
-			</article>
-			
 		</div>
 		<div class="col-md-2"></div>
 	</div>

Modified: flink/site/blog/page3/index.html
URL: http://svn.apache.org/viewvc/flink/site/blog/page3/index.html?rev=1667227&r1=1667226&r2=1667227&view=diff
==============================================================================
--- flink/site/blog/page3/index.html (original)
+++ flink/site/blog/page3/index.html Tue Mar 17 10:05:20 2015
@@ -134,6 +134,26 @@
 		<div class="col-md-8">
 			
 			<article>
+				<h2><a href="/news/2013/12/13/humboldt-innovation-award.html">Stratosphere wins award at Humboldt Innovation Competition "Big Data: Research meets Startups"</a></h2>
+				<p class="meta">13 Dec 2013</p>
+
+				<div>    <p> Stratosphere won the second place in
+    the <a href="http://www.humboldt-innovation.de/de/newsdetail/News/View/Forum%2BJunge%2BSpitzenforscher%2BBIG%2BData%2B%2BResearch%2Bmeets%2BStartups-123.html">competition</a>
+    organized by Humboldt Innovation on "Big Data: Research meets
+    Startups," where several research projects were evaluated by a
+    panel of experts from the Berlin startup ecosystem. The award
+    includes a monetary prize of 10,000 euros.
+    </p>
+
+    <p>We are extremely excited about this award, as it further
+    showcases the relevance of the Stratosphere platform and Big Data
+    technology in general for the technology startup world.
+    </p>
+</div>
+				<a href="/news/2013/12/13/humboldt-innovation-award.html#disqus_thread">Stratosphere wins award at Humboldt Innovation Competition "Big Data: Research meets Startups"</a>
+			</article>
+			
+			<article>
 				<h2><a href="/news/2013/10/21/cikm2013-paper.html">Paper "“All Roads Lead to Rome:” Optimistic Recovery for Distributed Iterative Data Processing" accepted at CIKM 2013</a></h2>
 				<p class="meta">21 Oct 2013</p>
 

Added: flink/site/img/blog/joins-broadcast.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-broadcast.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-broadcast.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: flink/site/img/blog/joins-broadcast.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/img/blog/joins-dist-perf.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-dist-perf.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-dist-perf.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/img/blog/joins-hhj.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-hhj.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-hhj.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: flink/site/img/blog/joins-hhj.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/img/blog/joins-memmgmt.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-memmgmt.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-memmgmt.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: flink/site/img/blog/joins-memmgmt.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/img/blog/joins-repartition.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-repartition.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-repartition.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: flink/site/img/blog/joins-repartition.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/img/blog/joins-single-perf.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-single-perf.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-single-perf.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/img/blog/joins-smj.png
URL: http://svn.apache.org/viewvc/flink/site/img/blog/joins-smj.png?rev=1667227&view=auto
==============================================================================
Binary file - no diff available.

Propchange: flink/site/img/blog/joins-smj.png
------------------------------------------------------------------------------
    svn:executable = *

Propchange: flink/site/img/blog/joins-smj.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: flink/site/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
URL: http://svn.apache.org/viewvc/flink/site/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html?rev=1667227&view=auto
==============================================================================
--- flink/site/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html (added)
+++ flink/site/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html Tue Mar 17 10:05:20 2015
@@ -0,0 +1,475 @@
+<!DOCTYPE html>
+<html lang="en">
+    <head>
+	    <meta charset="utf-8">
+	    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+	    <meta name="viewport" content="width=device-width, initial-scale=1">
+
+	    <title>Apache Flink: Peeking into Apache Flink's Engine Room</title>
+	    <link rel="shortcut icon" href="favicon.ico" type="image/x-icon">
+	    <link rel="icon" href="favicon.ico" type="image/x-icon">
+	    <link rel="stylesheet" href="/css/bootstrap.css">
+	    <link rel="stylesheet" href="/css/bootstrap-lumen-custom.css">
+	    <link rel="stylesheet" href="/css/syntax.css">
+	    <link rel="stylesheet" href="/css/custom.css">
+	    <link href="/css/main/main.css" rel="stylesheet">
+            <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Flink Blog RSS feed" />
+	    <!-- <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css" rel="stylesheet"> -->
+	    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
+	    <script src="/js/bootstrap.min.js"></script>
+	    <script src="/js/codetabs.js"></script>
+    </head>
+    <body>
+    <div class="af-header-container af-inner-pages-navigation">
+	<header>
+		<div class="container">
+			<div class="row">
+				<div class="col-md-1 af-mobile-nav-bar">
+					<a href="/" title="Home">
+					<img class="hidden-xs hidden-sm img-responsive"
+						src="/img/main/logo.png" alt="Apache Flink Logo">
+					</a>
+					<div class="row visible-xs">
+						<div class="col-xs-3">
+						    <a href="/" title="Home">
+							<img class="hidden-x hidden-sm img-responsive"
+								src="/img/main/logo.png" alt="Apache Flink Logo">
+							</a>
+						</div>
+						<div class="col-xs-5"></div>
+						<div class="col-xs-4">
+							<div class="af-mobile-btn">
+								<span class="glyphicon glyphicon-plus"></span>
+							</div>
+						</div>
+					</div>
+				</div>
+				<!-- Navigation -->
+				<div class="col-md-11">
+					<nav class="af-main-nav" role="navigation">
+						<ul>
+							<li><a href="#" class="af-nav-links">Quickstart
+									<b class="caret"></b>
+							</a>
+								<ul class="af-dropdown-menu">
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/setup_quickstart.html">Setup
+											Flink</a></li>
+									<li><a
+										href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/java_api_quickstart.html">Java
+											API</a></li>
+									<li><a
+										href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/scala_api_quickstart.html">Scala
+											API</a></li>
+								</ul></li>
+							<li><a href="/downloads.html">Download</a></li>
+							<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/faq.html">FAQ</a></li>
+							<li><a href="#" class="af-nav-links">Documentation <b
+									class="caret"></b></a>
+							  <ul class="af-dropdown-menu">
+                                                            		<li class="af-separator">Current Snapshot:</li>
+									<li></li>
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-master/">0.9</a></li>
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-master/api/java">0.9 Javadocs</a></li>
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-master/api/scala/index.html#org.apache.flink.api.scala.package">0.9 Scaladocs</a></li>
+									<li class="divider"></li>
+									<li class="af-separator">Current Stable:</li>
+									<li></li>
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/">0.8.1</a></li>
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/api/java">0.8.1 Javadocs</a></li>
+									<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/api/scala/index.html#org.apache.flink.api.scala.package">0.8.1 Scaladocs</a></li>
+									<li class="divider"></li>
+									<li></li>
+									<li><a href="/archive.html">Archive</a></li>
+									<li></li>
+								</ul></li>
+							<li><a href="#" class="af-nav-links">Community <b
+									class="caret"></b></a>
+								<ul class="af-dropdown-menu">
+									<li><a href="/community.html#mailing-lists">Mailing
+											Lists</a></li>
+									<li><a href="/community.html#issues">Issues</a></li>
+									<li><a href="/community.html#team">Team</a></li>
+									<li class="divider"></li>
+									<li><a href="/how-to-contribute.html">How To
+											Contribute</a></li>
+									<li><a href="/coding_guidelines.html">Coding
+											Guidelines</a></li>
+								</ul></li>
+							<li><a href="#" class="af-nav-links">Project <b
+									class="caret"></b></a>
+								<ul class="af-dropdown-menu">
+									<li><a href="/material.html">Material</a></li>
+									<li><a href="http://www.apache.org/">Apache Software
+											Foundation <span class="glyphicon glyphicon-new-window"></span>
+									</a></li>
+									<li><a
+										href="https://cwiki.apache.org/confluence/display/FLINK">Wiki
+											<span class="glyphicon glyphicon-new-window"></span>
+									</a></li>
+									<li><a
+										href="https://wiki.apache.org/incubator/StratosphereProposal">Incubator
+											Proposal <span class="glyphicon glyphicon-new-window"></span>
+									</a></li>
+									<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License
+											<span class="glyphicon glyphicon-new-window"></span>
+									</a></li>
+									<li><a href="https://github.com/apache/incubator-flink">Source
+											Code <span class="glyphicon glyphicon-new-window"></span>
+									</a></li>
+								</ul></li>
+							<li><a href="/blog/index.html" class="">Blog</a></li>
+						</ul>
+					</nav>
+				</div>
+			</div>
+		</div>
+	</header>
+</div>
+
+
+    <div style="padding-top:120px" class="container">
+        <div class="container">
+    <div class="row">
+		<div class="col-md-2"></div>
+		<div class="col-md-8">
+			<article>
+				<h2>Peeking into Apache Flink's Engine Room</h2>
+				    <p class="meta">13 Mar 2015</p>
+				<div>
+				    <h3 id="join-processing-in-apache-flink">Join Processing in Apache Flink</h3>
+
+<p>Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system.</p>
+
+<p>In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will</p>
+
+<ul>
+<li>show how easy it is to join data sets using Flink’s fluent APIs, </li>
+<li>discuss basic distributed join strategies, Flink’s join implementations, and its memory management,</li>
+<li>talk about Flink’s optimizer that automatically chooses join strategies,</li>
+<li>show some performance numbers for joining data sets of different sizes, and finally</li>
+<li>briefly discuss joining of co-located and pre-sorted data sets.</li>
+</ul>
+
+<p><em>Disclaimer</em>: This blog post is exclusively about equi-joins. Whenever I say “join” in the following, I actually mean “equi-join”.</p>
+
+<h3 id="how-do-i-join-with-flink?">How do I join with Flink?</h3>
+
+<p>Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections which are called data sets. Data sets are processed by applying transformations that compute new data sets. Flink’s transformations include Map and Reduce as known from MapReduce <a href="http://research.google.com/archive/mapreduce.html">[1]</a> but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html">[2]</a>.</p>
+
+<p>Joining two Scala case class data sets is very easy as the following example shows:</p>
+<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// define your data types</span>
+<span class="k">case</span> <span class="k">class</span> <span class="nc">PageVisit</span><span class="o">(</span><span class="n">url</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">ip</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">userId</span><span class="k">:</span> <span class="kt">Long</span><span class="o">)</span>
+<span class="k">case</span> <span class="k">class</span> <span class="nc">User</span><span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">Long</span><span class="o">,</span> <span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">email</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">country</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span>
+
+<span class="c1">// get your data from somewhere</span>
+<span class="k">val</span> <span class="n">visits</span><span class="k">:</span> <span class="kt">DataSet</span><span class="o">[</span><span class="kt">PageVisit</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+<span class="k">val</span> <span class="n">users</span><span class="k">:</span> <span class="kt">DataSet</span><span class="o">[</span><span class="kt">User</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+
+<span class="c1">// filter the users data set</span>
+<span class="k">val</span> <span class="n">germanUsers</span> <span class="k">=</span> <span class="n">users</span><span class="o">.</span><span class="n">filter</span><span class="o">((</span><span class="n">u</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">u</span><span class="o">.</span><span class="n">country</span><span class="o">.</span><span class="n">equals</span><span class="o">(</span><span class="s">&quot;de&quot;</span><span class="o">))</span>
+<span class="c1">// join data sets</span>
+<span class="k">val</span> <span class="n">germanVisits</span><span class="k">:</span> <span class="kt">DataSet</span><span class="o">[(</span><span class="kt">PageVisit</span>, <span class="kt">User</span><span class="o">)]</span> <span class="k">=</span>
+      <span class="c1">// equi-join condition (PageVisit.userId = User.id)</span>
+     <span class="n">visits</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">germanUsers</span><span class="o">).</span><span class="n">where</span><span class="o">(</span><span class="s">&quot;userId&quot;</span><span class="o">).</span><span class="n">equalTo</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">)</span>
+</code></pre></div>
+<p>Flink’s APIs also allow you to:</p>
+
+<ul>
+<li>apply a user-defined join function to each pair of joined elements instead of returning a <code>($Left, $Right)</code> tuple,</li>
+<li>select fields of pairs of joined Tuple elements (projection), and</li>
+<li>define composite join keys such as <code>.where(&quot;orderDate&quot;, &quot;zipCode&quot;).equalTo(&quot;date&quot;, &quot;zip&quot;)</code>.</li>
+</ul>
+
+<p>See the documentation for more details on Flink’s join features <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html#join">[3]</a>.</p>
+
+<h3 id="how-does-flink-join-my-data?">How does Flink join my data?</h3>
+
+<p>Flink uses techniques which are well known from parallel database systems to efficiently execute parallel joins. A join operator must establish all pairs of elements from its input data sets for which the join condition evaluates to true. In a standalone system, the most straightforward implementation of a join is the so-called nested-loop join which builds the full Cartesian product and evaluates the join condition for each pair of elements. This strategy has quadratic complexity and obviously does not scale to large inputs.</p>
+
+<p>In a distributed system, joins are commonly processed in two steps:</p>
+
+<ol>
+<li>The data of both inputs is distributed across all parallel instances that participate in the join and</li>
+<li>each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data. </li>
+</ol>
+
+<p>The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be independently picked and which are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. In the following I will describe Flink’s ship and local strategies to join two data sets <em>R</em> and <em>S</em>.</p>
+
+<h4 id="ship-strategies">Ship Strategies</h4>
+
+<p>Flink features two ship strategies to establish a valid data partitioning for a join:</p>
+
+<ul>
+<li>the <em>Repartition-Repartition</em> strategy (RR) and</li>
+<li>the <em>Broadcast-Forward</em> strategy (BF).</li>
+</ul>
+
+<p>The Repartition-Repartition strategy partitions both inputs, R and S, on their join key attributes using the same partitioning function. Each partition is assigned to exactly one parallel join instance and all data of that partition is sent to its associated instance. This ensures that all elements that share the same join key are shipped to the same parallel instance and can be locally joined. The cost of the RR strategy is a full shuffle of both data sets over the network.</p>
+
+<p><center>
+<img src="/img/blog/joins-broadcast.png" style="width:90%;margin:15px">
+</center></p>
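+<p>The key invariant behind the RR strategy is that both inputs use the same partitioning function. A minimal hash-partitioning sketch:</p>
+<div class="highlight"><pre><code>// equal join keys always map to the same parallel instance
+// because both inputs are partitioned with this same function
+def targetInstance[K](joinKey: K, parallelism: Int): Int =
+  (joinKey.hashCode &amp; Int.MaxValue) % parallelism
+</code></pre></div>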
+
+<p>The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how the BF strategy works.</p>
+
+<p><center>
+<img src="/img/blog/joins-repartition.png" style="width:90%;margin:15px">
+</center></p>
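+<p>As a rough back-of-the-envelope model (a simplification, not Flink’s actual cost model), the network costs of the two ship strategies compare as follows:</p>
+<div class="highlight"><pre><code>// shipped bytes, ignoring pre-existing distributions:
+// RR moves (almost) everything once; BF moves R once per instance
+def rrShippedBytes(sizeR: Long, sizeS: Long): Long = sizeR + sizeS
+def bfShippedBytes(sizeR: Long, parallelism: Int): Long = sizeR * parallelism
+// =&gt; broadcasting pays off roughly when sizeR * parallelism &lt; sizeR + sizeS,
+//    i.e., for a small R, a large S, or a low degree of parallelism
+</code></pre></div>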
+
+<p>The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join may already be distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and ship only one input or no input at all.</p>
+
+<h4 id="flink’s-memory-management">Flink’s Memory Management</h4>
+
+<p>Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an <code>OutOfMemoryException</code> and kill the JVM. </p>
+
+<p>Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed fraction (70% by default) of the JVM heap memory that remains available after initialization and slices it into 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory.</p>
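+<p>Conceptually, the working memory behaves like a pool of fixed-size segments. The following sketch is a simplification, not Flink’s actual memory manager:</p>
+<div class="highlight"><pre><code>// conceptual sketch of a segment pool; the real memory manager is more
+// involved, but the contract towards the algorithms is the same
+class SegmentPool(numSegments: Int, segmentSize: Int = 32 * 1024) {
+  private val free =
+    scala.collection.mutable.Queue.fill(numSegments)(new Array[Byte](segmentSize))
+  // None signals &quot;out of working memory&quot; =&gt; the algorithm spills to disk
+  def request(): Option[Array[Byte]] =
+    if (free.nonEmpty) Some(free.dequeue()) else None
+  def release(segment: Array[Byte]): Unit = free.enqueue(segment)
+}
+</code></pre></div>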
+
+<p>This design has several nice properties. First, the number of data objects on the JVM heap is much lower resulting in less garbage collection pressure. Second, objects on the heap have a certain space overhead and the binary representation is more compact. Especially data sets of many small elements benefit from that. Third, an algorithm knows exactly when the input data exceeds its working memory and can react by writing some of its filled byte arrays to the worker’s local filesystem. After the content of a byte array is written to disk, it can be reused to process more data. Reading data back into memory is as simple as reading the binary data from the local filesystem. The following figure illustrates Flink’s memory management.</p>
+
+<p><center>
+<img src="/img/blog/joins-memmgmt.png" style="width:90%;margin:15px">
+</center></p>
+
+<p>This active memory management makes Flink extremely robust for processing very large data sets on limited memory resources while preserving all benefits of in-memory processing if data is small enough to fit in memory. De/serializing data into and from memory has a certain cost overhead compared to simply holding all data elements on the JVM’s heap. However, Flink features efficient custom de/serializers which also allow certain operations, such as comparisons, to be performed directly on serialized data without deserializing data objects from memory.</p>
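+<p>For example, an order-preserving binary encoding allows keys to be compared byte-wise without deserialization. The following sketch shows the idea for <code>Int</code> keys; Flink’s actual serializers generalize this:</p>
+<div class="highlight"><pre><code>// flip the sign bit so that unsigned byte order equals signed Int order
+def serializeKey(key: Int): Array[Byte] = {
+  val v = key ^ Int.MinValue
+  Array((v &gt;&gt;&gt; 24).toByte, (v &gt;&gt;&gt; 16).toByte, (v &gt;&gt;&gt; 8).toByte, v.toByte)
+}
+
+// compare directly on the serialized form, no deserialization needed
+def compareSerialized(a: Array[Byte], b: Array[Byte]): Int = {
+  var i = 0
+  while (i &lt; a.length) {
+    val diff = (a(i) &amp; 0xFF) - (b(i) &amp; 0xFF)
+    if (diff != 0) return diff
+    i += 1
+  }
+  0
+}
+</code></pre></div>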
+
+<h4 id="local-strategies">Local Strategies</h4>
+
+<p>After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition. Flink’s runtime features two common join strategies to perform these local joins:</p>
+
+<ul>
+<li>the <em>Sort-Merge-Join</em> strategy (SM) and </li>
+<li>the <em>Hybrid-Hash-Join</em> strategy (HH).</li>
+</ul>
+
+<p>The Sort-Merge-Join works by first sorting both input data sets on their join key attributes (Sort Phase) and merging the sorted data sets as a second step (Merge Phase). The sort is done in-memory if the local partition of a data set is small enough. Otherwise, an external merge-sort is done by collecting data until the working memory is filled, sorting it, writing the sorted data to the local filesystem, and starting over by filling the working memory again with more incoming data. After all input data has been received, sorted, and written as sorted runs to the local filesystem, a fully sorted stream can be obtained. This is done by reading the partially sorted runs from the local filesystem and sort-merging the records on the fly. Once the sorted streams of both inputs are available, both streams are sequentially read and merge-joined in a zig-zag fashion by comparing the sorted join key attributes, building join element pairs for matching keys, and advancing the sorted stream with the lower join key. The figure below shows how the Sort-Merge-Join strategy works.</p>
+
+<p><center>
+<img src="/img/blog/joins-smj.png" style="width:90%;margin:15px">
+</center></p>
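+<p>For illustration, the merge phase of the algorithm just described can be sketched compactly for the 1:N case (unique keys on one side), assuming both inputs are already sorted by an <code>Int</code> key:</p>
+<div class="highlight"><pre><code>// zig-zag merge of two sorted inputs; left keys assumed unique (1:N join)
+def mergeJoin[L, R](left: IndexedSeq[(Int, L)],
+                    right: IndexedSeq[(Int, R)]): Seq[(L, R)] = {
+  val out = scala.collection.mutable.ArrayBuffer.empty[(L, R)]
+  var i = 0; var j = 0
+  while (i &lt; left.length &amp;&amp; j &lt; right.length) {
+    val lk = left(i)._1; val rk = right(j)._1
+    if (lk &lt; rk) i += 1        // advance the stream with the lower key
+    else if (lk &gt; rk) j += 1
+    else { out += ((left(i)._2, right(j)._2)); j += 1 }  // match: emit pair
+  }
+  out.toSeq
+}
+</code></pre></div>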
+
+<p>The Hybrid-Hash-Join distinguishes its inputs as build-side and probe-side input and works in two phases, a build phase followed by a probe phase. In the build phase, the algorithm reads the build-side input and inserts all data elements into an in-memory hash table indexed by their join key attributes. If the hash table outgrows the algorithm&#39;s working memory, parts of the hash table (ranges of hash indexes) are written to the local filesystem. The build phase ends after the build-side input has been fully consumed. In the probe phase, the algorithm reads the probe-side input and probes the hash table for each element using its join key attribute. If the element falls into a hash index range that was spilled to disk, the element is also written to disk. Otherwise, the element is immediately joined with all matching elements from the hash table. If the hash table completely fits into the working memory, the join is finished after the probe-side input has been fully consumed. Otherwise, the current hash table is dropped and a new hash table is built using spilled parts of the build-side input. This hash table is probed by the corresponding parts of the spilled probe-side input. Eventually, all data is joined. Hybrid-Hash-Joins perform best if the hash table completely fits into the working memory because an arbitrarily large probe-side input can be processed on-the-fly without materializing it. However, even if the build-side input does not fit into memory, the Hybrid-Hash-Join still has very nice properties. In this case, in-memory processing is partially preserved and only a fraction of the build-side and probe-side data needs to be written to and read from the local filesystem. The next figure illustrates how the Hybrid-Hash-Join works.</p>
+
+<p><center>
+<img src="/img/blog/joins-hhj.png" style="width:90%;margin:15px">
+</center></p>
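+<p>Stripped of the spilling logic, the two phases of the algorithm reduce to a few lines; this sketch keeps the whole hash table in memory:</p>
+<div class="highlight"><pre><code>// hybrid hash join, in-memory case only (spilled partitions omitted)
+def hashJoin[L, R](buildSide: Seq[(Int, L)], probeSide: Seq[(Int, R)]): Seq[(L, R)] = {
+  val table: Map[Int, Seq[(Int, L)]] = buildSide.groupBy(_._1)   // build phase
+  probeSide.flatMap { case (key, r) =&gt;                           // probe phase
+    table.getOrElse(key, Seq.empty).map { case (_, l) =&gt; (l, r) }
+  }
+}
+</code></pre></div>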
+
+<h3 id="how-does-flink-choose-join-strategies?">How does Flink choose join strategies?</h3>
+
+<p>Ship and local strategies do not depend on each other and can be independently chosen. Therefore, Flink can execute a join of two data sets R and S in nine different ways by combining any of the three ship strategies (RR, BF with R being broadcasted, BF with S being broadcasted) with any of the three local strategies (SM, HH with R being build-side, HH with S being build-side). Each of these strategy combinations results in different execution performance depending on the data sizes and the available amount of working memory. In the case of a small data set R and a much larger data set S, broadcasting R and using it as the build-side input of a Hybrid-Hash-Join is usually a good choice because the much larger data set S is not shipped and not materialized (given that the hash table completely fits into memory). If both data sets are rather large or the join is performed on many parallel instances, repartitioning both inputs is a robust choice.</p>
+
+<p>Flink features a cost-based optimizer which automatically chooses the execution strategies for all operators including joins. Without going into the details of cost-based optimization, this is done by computing cost estimates for execution plans with different strategies and picking the plan with the least estimated costs. Thereby, the optimizer estimates the amount of data which is shipped over the network and written to disk. If no reliable size estimates for the input data can be obtained, the optimizer falls back to robust default choices. A key feature of the optimizer is to reason about existing data properties. For example, if the data of one input is already partitioned in a suitable way, the generated candidate plans will not repartition this input. Hence, the choice of a RR ship strategy becomes more likely. The same applies for previously sorted data and the Sort-Merge-Join strategy. Flink programs can help the optimizer to reason about existing data properties by providing semantic information about user-defined functions <a href="http://ci.apache.org/projects/flink/flink-docs-master/programming_guide.html#semantic-annotations">[4]</a>. While the optimizer is a killer feature of Flink, it can happen that a user knows better than the optimizer how to execute a specific join. Similar to relational database systems, Flink offers optimizer hints to tell the optimizer which join strategies to pick <a href="http://ci.apache.org/projects/flink/flink-docs-master/dataset_transformations.html#join-algorithm-hints">[5]</a>.</p>
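+<p>For example, following the join-hints documentation linked above [5], a hint that broadcasts the (small) second input and uses it as the hash-table build side could look like this:</p>
+<div class="highlight"><pre><code>import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint
+
+// force a Broadcast-Forward ship strategy with the second input
+// (germanUsers) as the build side of a Hybrid-Hash-Join
+val hinted = visits.join(germanUsers, JoinHint.BROADCAST_HASH_SECOND)
+  .where(&quot;userId&quot;).equalTo(&quot;id&quot;)
+</code></pre></div>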
+
+<h3 id="how-is-flink’s-join-performance?">How is Flink’s join performance?</h3>
+
+<p>Alright, that sounds good, but how fast are joins in Flink? Let’s have a look. We start with a benchmark of the single-core performance of Flink’s Hybrid-Hash-Join implementation and run a Flink program that executes a Hybrid-Hash-Join with parallelism 1. We run the program on an n1-standard-2 Google Compute Engine instance (2 vCPUs, 7.5GB memory) with two locally attached SSDs. We give 4GB as working memory to the join. The join program generates 1KB records for both inputs on-the-fly, i.e., the data is not read from disk. We run 1:N (Primary-Key/Foreign-Key) joins and generate the smaller input with unique Integer join keys and the larger input with randomly chosen Integer join keys that fall into the key range of the smaller input. Hence, each tuple of the larger side joins with exactly one tuple of the smaller side. The result of the join is immediately discarded. We vary the size of the build-side input from 1 million to 12 million elements (1GB to 12GB). The probe-side input is kept constant at 64 million elements (64GB). The following chart shows the average execution time of three runs for each setup.</p>
+
+<p><center>
+<img src="/img/blog/joins-single-perf.png" style="width:85%;margin:15px">
+</center></p>
+
+<p>The joins with 1 to 3 GB build side (blue bars) are pure in-memory joins. The other joins partially spill data to disk (4 to 12GB, orange bars). The results show that the performance of Flink’s Hybrid-Hash-Join remains stable as long as the hash table completely fits into memory. As soon as the hash table becomes larger than the working memory, parts of the hash table and corresponding parts of the probe side are spilled to disk. The chart shows that the performance of the Hybrid-Hash-Join gracefully decreases in this situation, i.e., there is no sharp increase in runtime when the join starts spilling. In combination with Flink’s robust memory management, this execution behavior gives smooth performance without the need for fine-grained, data-dependent memory tuning.</p>
+
+<p>So, Flink’s Hybrid-Hash-Join implementation performs well on a single thread even for limited memory resources, but how good is Flink’s performance when joining larger data sets in a distributed setting? For the next experiment we compare the performance of the most common join strategy combinations, namely:</p>
+
+<ul>
+<li>Broadcast-Forward, Hybrid-Hash-Join (broadcasting and building with the smaller side),</li>
+<li>Repartition, Hybrid-Hash-Join (building with the smaller side), and</li>
+<li>Repartition, Sort-Merge-Join</li>
+</ul>
+
+<p>for different input size ratios:</p>
+
+<ul>
+<li>1GB : 1000GB</li>
+<li>10GB : 1000GB</li>
+<li>100GB : 1000GB</li>
+<li>1000GB : 1000GB</li>
+</ul>
+
+<p>The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB broadcasted data in 5GB working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread and require more than 8TB local disk storage on each machine.</p>
+
+<p>As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80. </p>
+
+<p><center>
+<img src="/img/blog/joins-dist-perf.png" style="width:70%;margin:15px">
+</center></p>
+
+<p>As expected, the Broadcast-Forward strategy performs best for very small inputs because the large probe side is not shipped over the network but joined locally. However, when the size of the broadcasted side grows, two problems arise: the amount of shipped data increases, and each parallel instance additionally has to process the full broadcasted data set. The performance of both repartitioning strategies behaves similarly for growing input sizes, which indicates that these strategies are mainly limited by the cost of the data transfer (at most 2TB are shipped over the network and joined). Although the Sort-Merge-Join strategy shows the worst performance in all shown cases, it has a right to exist because it can nicely exploit sorted input data.</p>
+
+<h3 id="i’ve-got-sooo-much-data-to-join,-do-i-really-need-to-ship-it?">I’ve got sooo much data to join, do I really need to ship it?</h3>
+
+<p>We have seen that off-the-shelf distributed joins work really well in Flink. But what if your data is so huge that you do not want to shuffle it across your cluster? We recently added some features to Flink for specifying semantic properties (partitioning and sorting) on input splits and co-located reading of local input files. With these tools at hand, it is possible to join pre-partitioned data sets from your local filesystem without sending a single byte over your cluster’s network. If the input data is even pre-sorted, the join can be done as a Sort-Merge-Join without sorting, i.e., the join is essentially done on-the-fly. Exploiting co-location requires a very special setup though. Data needs to be stored on the local filesystem because HDFS does not feature data co-location and might move file blocks across data nodes. That means you need to take care of many things yourself which HDFS would otherwise do for you, including replication to avoid data loss. On the other hand, the performance gains of joining co-located and pre-sorted data can be quite substantial.</p>
+
+<h3 id="tl;dr:-what-should-i-remember-from-all-of-this?">tl;dr: What should I remember from all of this?</h3>
+
+<ul>
+<li>Flink’s fluent Scala and Java APIs make joins and other data transformations a piece of cake.</li>
+<li>The optimizer does the hard choices for you, but gives you control in case you know better.</li>
+<li>Flink’s join implementations perform very well in-memory and gracefully degrade when going to disk.</li>
+<li>Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty <code>OutOfMemoryException</code>. It just runs out-of-the-box.</li>
+</ul>
+
+<h4 id="references">References</h4>
+
+<p>[1] <a href="">“MapReduce: Simplified data processing on large clusters”</a>, Dean, Ghemawat, 2004 <br>
+[2] <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html">Flink 0.8.1 documentation: Data Transformations</a> <br>
+[3] <a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html#join">Flink 0.8.1 documentation: Joins</a> <br>
+[4] <a href="http://ci.apache.org/projects/flink/flink-docs-master/programming_guide.html#semantic-annotations">Flink 0.9-SNAPSHOT documentation: Semantic annotations</a> <br>
+[5] <a href="http://ci.apache.org/projects/flink/flink-docs-master/dataset_transformations.html#join-algorithm-hints">Flink 0.9-SNAPSHOT documentation: Optimizer join hints</a> <br></p>
+
+<p><br>
+<small>Written by Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>).</small></p>
+
+				</div>
+			</article>
+		</div>
+		<div class="col-md-2"></div>
+	</div>
+	<div class="row" style="padding-top:30px">
+		<div class="col-md-2"></div>
+		<div class="col-md-8">
+		    <div id="disqus_thread"></div>
+		    <script type="text/javascript">
+		        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+		        var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname
+
+		        /* * * DON'T EDIT BELOW THIS LINE * * */
+		        (function() {
+		            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+		            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+		            (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+		        })();
+		    </script>
+		    <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
+		    <a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>			    
+		</div>
+		<div class="col-md-2"></div>
+	</div>
+</div>
+
+    </div>
+    <!--<section id="af-upfooter" class="af-section">
+	<div class="container">
+		<p>Apache Flink is an effort undergoing incubation at The Apache
+			Software Foundation (ASF), sponsored by the Apache Incubator PMC.
+			Incubation is required of all newly accepted projects until a further
+			review indicates that the infrastructure, communications, and
+			decision making process have stabilized in a manner consistent with
+			other successful ASF projects. While incubation status is not
+			necessarily a reflection of the completeness or stability of the
+			code, it does indicate that the project has yet to be fully endorsed
+			by the ASF.</p>
+		<a href="http://incubator.apache.org"> <img class="img-responsive"
+			src="/img/main/apache-incubator-logo.png" alt="Apache Flink" />
+		</a>
+		<p class="text-center">
+			<a href="/privacy-policy.html" title="Privacy Policy"
+				class="af-privacy-policy">Privacy Policy</a>
+		</p>
+	</div>
+</section>-->
+
+<footer id="af-footer">
+	<div class="container">
+		<div class="row">
+			<div class="col-md-3">
+				<h3>Documentation</h3>
+				<ul class="af-footer-menu">
+
+					<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8//">0.8.1</a></li>
+					<li><a href="http://ci.apache.org/projects/flink/flink-docs-release-0.8//api/java/">0.8.1 Javadocs</a></li>
+					<li><a
+						href="http://ci.apache.org/projects/flink/flink-docs-release-0.8//api/scala/index.html#org.apache.flink.api.scala.package">0.8.1 Scaladocs</a></li>
+				</ul>
+			</div>
+			<div class="col-md-3">
+				<h3>Community</h3>
+				<ul class="af-footer-menu">
+					<li><a href="/community.html#mailing-lists">Mailing Lists</a></li>
+					<li><a href="https://issues.apache.org/jira/browse/FLINK"
+						target="blank">Issues <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a href="/community.html#team">Team</a></li>
+					<li><a href="/how-to-contribute.html">How to contribute</a></li>
+					<li><a href="/coding_guidelines.html">Coding Guidelines</a></li>
+				</ul>
+			</div>
+			<div class="col-md-3">
+				<h3>ASF</h3>
+				<ul class="af-footer-menu">
+					<li><a href="http://www.apache.org/" target="blank">Apache
+							Software foundation <span class="glyphicon glyphicon-new-window"></span>
+					</a></li>
+					<li><a
+						href="http://www.apache.org/foundation/how-it-works.html"
+						target="blank">How it works <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a href="http://www.apache.org/foundation/thanks.html"
+						target="blank">Thanks <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a
+						href="http://www.apache.org/foundation/sponsorship.html"
+						target="blank">Become a sponsor <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a href="http://incubator.apache.org/projects/flink.html"
+						target="blank">Incubation status page <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+				</ul>
+			</div>
+			<div class="col-md-3">
+				<h3>Project</h3>
+				<ul class="af-footer-menu">
+					<li><a href="/material.html" target="blank">Material <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a
+						href="https://cwiki.apache.org/confluence/display/FLINK"
+						target="blank">Wiki <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a
+						href="https://wiki.apache.org/incubator/StratosphereProposal"
+						target="blank">Incubator proposal <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a href="http://www.apache.org/licenses/LICENSE-2.0"
+						target="blank">License <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+					<li><a href="https://github.com/apache/incubator-flink"
+						target="blank">Source code <span
+							class="glyphicon glyphicon-new-window"></span></a></li>
+				</ul>
+			</div>
+		</div>
+	</div>
+	<div class="af-footer-bar">
+		<div class="container">
+		  <p>Copyright &copy; 2014-2015, <a href="http://www.apache.org">The Apache Software Foundation</a>. All Rights Reserved. Apache and the Apache feather logo are trademarks of the Apache Software Foundation.
+                  </p>
+                  <div>
+                    <div style="float:left">
+                      <p>
+                        <a href="/privacy-policy.html" title="Privacy Policy" class="af-privacy-policy">Privacy Policy</a>
+                    </p>
+                    </div>
+                    <div style="float:right">
+                    <p>
+                      <a href="/blog/feed.xml" class="af-privacy-policy">RSS Feed</a>
+                    </p>
+                    </div>
+                   </div>
+    		</div>
+	</div>
+</footer>
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-52545728-1', 'auto');
+      ga('send', 'pageview');
+    </script>
+    <script src="/js/main/jquery.mobile.events.min.js"></script>
+    <script src="/js/main/main.js"></script>
+  </body>
+</html>


