flink-dev mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: svn commit: r1650029 - in /flink: _posts/2015-01-06-december-in-flink.md site/blog/index.html site/blog/page2/index.html site/blog/page3/index.html site/news/2015/ site/news/2015/01/ site/news/2015/01/06/ site/news/2015/01/06/december-in-flink.html
Date Wed, 07 Jan 2015 10:43:46 GMT
Just FYI, the svnpubsub for the website is currently not working.
This is the corresponding issue for the website migration:
https://issues.apache.org/jira/browse/INFRA-8915

On Wed, Jan 7, 2015 at 11:40 AM, <ktzoumas@apache.org> wrote:

> Author: ktzoumas
> Date: Wed Jan  7 10:40:31 2015
> New Revision: 1650029
>
> URL: http://svn.apache.org/r1650029
> Log:
> Added blog post - December 2014 in the Flink community
>
> Added:
>     flink/_posts/2015-01-06-december-in-flink.md
>     flink/site/news/2015/
>     flink/site/news/2015/01/
>     flink/site/news/2015/01/06/
>     flink/site/news/2015/01/06/december-in-flink.html
> Modified:
>     flink/site/blog/index.html
>     flink/site/blog/page2/index.html
>     flink/site/blog/page3/index.html
>
> Added: flink/_posts/2015-01-06-december-in-flink.md
> URL:
> http://svn.apache.org/viewvc/flink/_posts/2015-01-06-december-in-flink.md?rev=1650029&view=auto
>
> ==============================================================================
> --- flink/_posts/2015-01-06-december-in-flink.md (added)
> +++ flink/_posts/2015-01-06-december-in-flink.md Wed Jan  7 10:40:31 2015
> @@ -0,0 +1,62 @@
> +---
> +layout: post
> +title:  'December 2014 in the Flink community'
> +date:   2015-01-06 10:00:00
> +categories: news
> +---
> +
> +This is the first blog post of a “newsletter”-like series where we
> give a summary of the monthly activity in the Flink community. As the Flink
> project grows, this can serve as a "tl;dr" for people who are not
> following the Flink dev and user mailing lists, or those who are simply
> overwhelmed by the traffic.
> +
> +
> +### Flink graduation
> +
> +The biggest news is that the Apache board approved Flink as a top-level
> Apache project! The Flink team is working closely with the Apache press
> team for an official announcement, so stay tuned for details!
> +
> +### New Flink website
> +
> +The [Flink website](http://flink.apache.org) got a total make-over, both
> in terms of appearance and content.
> +
> +### Flink IRC channel
> +
> +A new IRC channel called #flink was created at irc.freenode.org. An easy
> way to access the IRC channel is through the [web client](
> http://webchat.freenode.net/).  Feel free to stop by to ask anything or
> share your ideas about Apache Flink!
> +
> +### Meetups and Talks
> +
> +Apache Flink was presented in the [Amsterdam Hadoop User Group](
> http://www.meetup.com/Netherlands-Hadoop-User-Group/events/218635152).
> +
> +## Notable code contributions
> +
> +**Note:** Code contributions listed here may not be part of a release or
> even the current snapshot yet.
> +
> +### [Streaming Scala API](
> https://github.com/apache/incubator-flink/pull/275)
> +
> +The Flink Streaming Java API recently got its Scala counterpart. Once
> merged, Flink Streaming users will be able to use both Scala and Java for
> their development. The Flink Streaming Scala API is built as a thin layer
> on top of the Java API, which keeps the two APIs easily in sync.
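> +
> +As a rough, hypothetical sketch (not code from the pull request), this is
> +the shape of a streaming job in the existing Java API that the new Scala
> +API wraps as a thin layer; class and method names assume the 0.8-era
> +streaming API and may differ between versions:
> +
> +```java
> +import org.apache.flink.api.common.functions.MapFunction;
> +import org.apache.flink.streaming.api.datastream.DataStream;
> +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> +
> +public class StreamingSketch {
> +    public static void main(String[] args) throws Exception {
> +        StreamExecutionEnvironment env =
> +                StreamExecutionEnvironment.getExecutionEnvironment();
> +
> +        // A toy source and a simple map transformation; the Scala API
> +        // exposes the same operators through its wrapper layer.
> +        DataStream<String> words = env.fromElements("flink", "streaming");
> +        words.map(new MapFunction<String, String>() {
> +            @Override
> +            public String map(String value) {
> +                return value.toUpperCase();
> +            }
> +        }).print();
> +
> +        env.execute("Streaming Java API sketch");
> +    }
> +}
> +```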
> +
> +### [Intermediate datasets](
> https://github.com/apache/incubator-flink/pull/254)
> +
> +This pull request introduces a major change in the Flink runtime.
> Currently, the Flink runtime is based on the notion of operators that
> exchange data through channels. With the PR, intermediate data sets that
> are produced by operators become first-class citizens in the runtime. While
> this does not have any user-facing impact yet, it lays the groundwork for a
> slew of future features such as blocking execution, fine-grained
> fault-tolerance, and more efficient data sharing between cluster and client.
> +
> +### [Configurable execution mode](
> https://github.com/apache/incubator-flink/pull/259)
> +
> +This pull request allows the user to change the object-reuse behaviour.
> Before this pull request, some operations would reuse objects passed to the
> user function while others would always create new objects. The change
> introduces a system-wide switch and makes all operators consistently either
> reuse objects or create new ones.
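> +
> +For illustration, later Flink releases expose this switch on the execution
> +config; a minimal sketch, assuming that API shape:
> +
> +```java
> +import org.apache.flink.api.java.ExecutionEnvironment;
> +
> +public class ObjectReuseSketch {
> +    public static void main(String[] args) {
> +        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> +        // With object reuse enabled, operators may hand the same instance
> +        // to the user function repeatedly, so user code must not hold on
> +        // to input objects across invocations.
> +        env.getConfig().enableObjectReuse();
> +    }
> +}
> +```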
> +
> +### [Distributed Coordination via Akka](
> https://github.com/apache/incubator-flink/pull/149)
> +
> +Another major change is a complete rewrite of the JobManager /
> TaskManager components in Scala. In addition to that, the old RPC service
> was replaced by Actors, using the Akka framework.
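> +
> +For illustration only (not actual Flink code), message passing in the
> +actor style that replaces the old RPC service looks roughly like this
> +with Akka's Java API; the message types here are hypothetical:
> +
> +```java
> +import akka.actor.UntypedActor;
> +
> +public class TaskManagerActor extends UntypedActor {
> +    // Hypothetical messages standing in for Flink's real protocol.
> +    public static class Heartbeat {}
> +    public static class HeartbeatAck {}
> +
> +    @Override
> +    public void onReceive(Object message) {
> +        if (message instanceof Heartbeat) {
> +            // Instead of a blocking RPC call, the reply is sent as an
> +            // asynchronous message back to the sender.
> +            getSender().tell(new HeartbeatAck(), getSelf());
> +        } else {
> +            unhandled(message);
> +        }
> +    }
> +}
> +```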
> +
> +### [Sorting of very large records](
> https://github.com/apache/incubator-flink/pull/249)
> +
> +Flink's internal sort algorithms were improved to better handle large
> records (multiple hundreds of megabytes or larger). Previously, the system
> in some cases held instances of multiple large records in memory at once,
> resulting in high memory consumption and JVM heap thrashing. With this fix,
> large records are streamed through the operators, reducing memory
> consumption and GC pressure. The system now requires much less memory to
> support algorithms that work on such large records.
> +
> +### [Kryo Serialization as the new default fallback](
> https://github.com/apache/incubator-flink/pull/271)
> +
> +Flink’s built-in type serialization framework handles all common
> types very efficiently. Prior versions used Avro to serialize types that
> the built-in framework could not handle.
> +Flink’s serialization system has improved a lot over time and by now
> surpasses the capabilities of Avro in many cases. Kryo now serves as the
> default fallback serialization framework, supporting a much broader range
> of types.
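> +
> +A minimal sketch of what the fallback means in practice, assuming the
> +Kryo type-registration hook on the execution config and a hypothetical
> +`MyCustomType` that the built-in framework cannot analyze:
> +
> +```java
> +import org.apache.flink.api.java.ExecutionEnvironment;
> +
> +public class KryoFallbackSketch {
> +    // No default constructor, so the built-in POJO serializer cannot
> +    // handle this class and the runtime falls back to Kryo.
> +    public static class MyCustomType {
> +        public final int id;
> +        public MyCustomType(int id) { this.id = id; }
> +    }
> +
> +    public static void main(String[] args) {
> +        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> +        // Optional: pre-registering the class lets Kryo write compact
> +        // tags instead of full class names.
> +        env.getConfig().registerKryoType(MyCustomType.class);
> +    }
> +}
> +```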
> +
> +### [Hadoop FileSystem support](
> https://github.com/apache/incubator-flink/pull/268)
> +
> +This change permits users to use all file systems supported by Hadoop
> with Flink. In practice, this means that users can use Flink with Tachyon,
> Google Cloud Storage (including out-of-the-box Flink YARN support on Google
> Compute Cloud), FTP, and all the other file system implementations for
> Hadoop.
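> +
> +In practice, the file system is selected by the URI scheme of the path; a
> +sketch with a placeholder Tachyon URI (host and path are hypothetical, and
> +the matching Hadoop file system classes must be on the classpath):
> +
> +```java
> +import org.apache.flink.api.java.DataSet;
> +import org.apache.flink.api.java.ExecutionEnvironment;
> +
> +public class HadoopFsSketch {
> +    public static void main(String[] args) throws Exception {
> +        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> +        // The "tachyon://" scheme resolves to Tachyon's Hadoop FileSystem
> +        // implementation; "gs://", "ftp://" and others work the same way.
> +        DataSet<String> lines =
> +                env.readTextFile("tachyon://host:19998/path/to/input");
> +        // In the 0.8-era API, print() registers a sink that runs when
> +        // execute() is called.
> +        lines.print();
> +        env.execute("Hadoop FS sketch");
> +    }
> +}
> +```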
> +
> +## Heading to the 0.8.0 release
> +
> +The community is working hard together with the Apache infra team to
> migrate the Flink infrastructure to a top-level project. At the same time,
> the Flink community is working on the Flink 0.8.0 release, which should be
> out very soon.
> \ No newline at end of file
>
> Modified: flink/site/blog/index.html
> URL:
> http://svn.apache.org/viewvc/flink/site/blog/index.html?rev=1650029&r1=1650028&r2=1650029&view=diff
>
> ==============================================================================
> --- flink/site/blog/index.html (original)
> +++ flink/site/blog/index.html Wed Jan  7 10:40:31 2015
> @@ -131,6 +131,68 @@
>                 <div class="col-md-8">
>
>                         <article>
> +                               <h2><a
> href="/news/2015/01/06/december-in-flink.html">December 2014 in the Flink
> community</a></h2>
> +                               <p class="meta">06 Jan 2015</p>
> +
> +                               <div><p>This is the first blog post of a
> “newsletter”-like series where we give a summary of the monthly
> activity in the Flink community. As the Flink project grows, this can serve
> as a &quot;tl;dr&quot; for people who are not following the Flink dev and
> user mailing lists, or those who are simply overwhelmed by the traffic.</p>
> +
> +<h3 id="flink-graduation">Flink graduation</h3>
> +
> +<p>The biggest news is that the Apache board approved Flink as a
> top-level Apache project! The Flink team is working closely with the Apache
> press team for an official announcement, so stay tuned for details!</p>
> +
> +<h3 id="new-flink-website">New Flink website</h3>
> +
> +<p>The <a href="http://flink.apache.org">Flink website</a> got a total
> make-over, both in terms of appearance and content.</p>
> +
> +<h3 id="flink-irc-channel">Flink IRC channel</h3>
> +
> +<p>A new IRC channel called #flink was created at irc.freenode.org. An
> easy way to access the IRC channel is through the <a href="
> http://webchat.freenode.net/">web client</a>.  Feel free to stop by to
> ask anything or share your ideas about Apache Flink!</p>
> +
> +<h3 id="meetups-and-talks">Meetups and Talks</h3>
> +
> +<p>Apache Flink was presented in the <a href="
> http://www.meetup.com/Netherlands-Hadoop-User-Group/events/218635152">Amsterdam
> Hadoop User Group</a>.</p>
> +
> +<h2 id="notable-code-contributions">Notable code contributions</h2>
> +
> +<p><strong>Note:</strong> Code contributions listed here may not be part
> of a release or even the current snapshot yet.</p>
> +
> +<h3 id="streaming-scala-api"><a href="
> https://github.com/apache/incubator-flink/pull/275">Streaming Scala
> API</a></h3>
> +
> +<p>The Flink Streaming Java API recently got its Scala counterpart. Once
> merged, Flink Streaming users will be able to use both Scala and Java for
> their development. The Flink Streaming Scala API is built as a thin layer
> on top of the Java API, which keeps the two APIs easily in sync.</p>
> +
> +<h3 id="intermediate-datasets"><a href="
> https://github.com/apache/incubator-flink/pull/254">Intermediate
> datasets</a></h3>
> +
> +<p>This pull request introduces a major change in the Flink runtime.
> Currently, the Flink runtime is based on the notion of operators that
> exchange data through channels. With the PR, intermediate data sets that
> are produced by operators become first-class citizens in the runtime. While
> this does not have any user-facing impact yet, it lays the groundwork for a
> slew of future features such as blocking execution, fine-grained
> fault-tolerance, and more efficient data sharing between cluster and
> client.</p>
> +
> +<h3 id="configurable-execution-mode"><a href="
> https://github.com/apache/incubator-flink/pull/259">Configurable
> execution mode</a></h3>
> +
> +<p>This pull request allows the user to change the object-reuse
> behaviour. Before this pull request, some operations would reuse objects
> passed to the user function while others would always create new objects.
> The change introduces a system-wide switch and makes all operators
> consistently either reuse objects or create new ones.</p>
> +
> +<h3 id="distributed-coordination-via-akka"><a href="
> https://github.com/apache/incubator-flink/pull/149">Distributed
> Coordination via Akka</a></h3>
> +
> +<p>Another major change is a complete rewrite of the JobManager /
> TaskManager components in Scala. In addition to that, the old RPC service
> was replaced by Actors, using the Akka framework.</p>
> +
> +<h3 id="sorting-of-very-large-records"><a href="
> https://github.com/apache/incubator-flink/pull/249">Sorting of very large
> records</a></h3>
> +
> +<p>Flink&#39;s internal sort algorithms were improved to better handle
> large records (multiple hundreds of megabytes or larger). Previously, the
> system in some cases held instances of multiple large records in memory at
> once, resulting in high memory consumption and JVM heap thrashing. With
> this fix, large records are streamed through the operators, reducing memory
> consumption and GC pressure. The system now requires much less memory to
> support algorithms that work on such large records.</p>
> +
> +<h3 id="kryo-serialization-as-the-new-default-fallback"><a href="
> https://github.com/apache/incubator-flink/pull/271">Kryo Serialization as
> the new default fallback</a></h3>
> +
> +<p>Flink’s built-in type serialization framework handles all common
> types very efficiently. Prior versions used Avro to serialize types that
> the built-in framework could not handle.
> +Flink’s serialization system has improved a lot over time and by now
> surpasses the capabilities of Avro in many cases. Kryo now serves as the
> default fallback serialization framework, supporting a much broader range
> of types.</p>
> +
> +<h3 id="hadoop-filesystem-support"><a href="
> https://github.com/apache/incubator-flink/pull/268">Hadoop FileSystem
> support</a></h3>
> +
> +<p>This change permits users to use all file systems supported by Hadoop
> with Flink. In practice, this means that users can use Flink with Tachyon,
> Google Cloud Storage (including out-of-the-box Flink YARN support on Google
> Compute Cloud), FTP, and all the other file system implementations for
> Hadoop.</p>
> +
> +<h2 id="heading-to-the-0.8.0-release">Heading to the 0.8.0 release</h2>
> +
> +<p>The community is working hard together with the Apache infra team to
> migrate the Flink infrastructure to a top-level project. At the same time,
> the Flink community is working on the Flink 0.8.0 release, which should be
> out very soon.</p>
> +</div>
> +                               <a
> href="/news/2015/01/06/december-in-flink.html#disqus_thread">December 2014
> in the Flink community</a>
> +                       </article>
> +
> +                       <article>
>                                 <h2><a
> href="/news/2014/11/18/hadoop-compatibility.html">Hadoop Compatibility in
> Flink</a></h2>
>                                 <p class="meta">18 Nov 2014</p>
>
> @@ -786,98 +848,6 @@ Inspect the result in HDFS using:</p>
>                                 <a
> href="/news/2014/02/18/amazon-elastic-mapreduce-cloud-yarn.html#disqus_thread">Use
> Stratosphere with Amazon Elastic MapReduce</a>
>                         </article>
>
> -                       <article>
> -                               <h2><a
> href="/news/2014/01/28/querying_mongodb.html">Accessing Data Stored in
> MongoDB with Stratosphere</a></h2>
> -                               <p class="meta">28 Jan 2014</p>
> -
> -                               <div><p>We recently merged a <a href="
> https://github.com/stratosphere/stratosphere/pull/437">pull request</a>
> that allows you to use any existing Hadoop <a href="
> http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat">InputFormat</a>
> with Stratosphere. So you can now (in the <code>0.5-SNAPSHOT</code> and
> upwards versions) define a Hadoop-based data source:</p>
> -<div class="highlight"><pre><code class="language-java"
> data-lang="java"><span class="n">HadoopDataSource</span> <span
> class="n">source</span> <span class="o">=</span> <span class="k">new</span>
> <span class="nf">HadoopDataSource</span><span class="o">(</span><span
> class="k">new</span> <span class="nf">TextInputFormat</span><span
> class="o">(),</span> <span class="k">new</span> <span
> class="nf">JobConf</span><span class="o">(),</span> <span
> class="s">&quot;Input Lines&quot;</span><span class="o">);</span>
> -<span class="n">TextInputFormat</span><span class="o">.</span><span
> class="na">addInputPath</span><span class="o">(</span><span
> class="n">source</span><span class="o">.</span><span
> class="na">getJobConf</span><span class="o">(),</span> <span
> class="k">new</span> <span class="nf">Path</span><span
> class="o">(</span><span class="n">dataInput</span><span class="o">));</span>
> -</code></pre></div>
> -<p>We describe in the following article how to access data stored in <a
> href="http://www.mongodb.org/">MongoDB</a> with Stratosphere. This allows
> users to join data from multiple sources (e.g. MonogDB and HDFS) or perform
> machine learning with the documents stored in MongoDB.</p>
> -
> -<p>The approach here is to use the <code>MongoInputFormat</code> that was
> developed for Apache Hadoop but now also runs with Stratosphere.</p>
> -<div class="highlight"><pre><code class="language-java"
> data-lang="java"><span class="n">JobConf</span> <span class="n">conf</span>
> <span class="o">=</span> <span class="k">new</span> <span
> class="nf">JobConf</span><span class="o">();</span>
> -<span class="n">conf</span><span class="o">.</span><span
> class="na">set</span><span class="o">(</span><span
> class="s">&quot;mongo.input.uri&quot;</span><span class="o">,</span><span
> class="s">&quot;mongodb://localhost:27017/enron_mail.messages&quot;</span><span
> class="o">);</span>
> -<span class="n">HadoopDataSource</span> <span class="n">src</span> <span
> class="o">=</span> <span class="k">new</span> <span
> class="nf">HadoopDataSource</span><span class="o">(</span><span
> class="k">new</span> <span class="nf">MongoInputFormat</span><span
> class="o">(),</span> <span class="n">conf</span><span class="o">,</span>
> <span class="s">&quot;Read from Mongodb&quot;</span><span
> class="o">,</span> <span class="k">new</span> <span
> class="nf">WritableWrapperConverter</span><span class="o">());</span>
> -</code></pre></div>
> -<h3 id="example-program">Example Program</h3>
> -
> -<p>The example program reads data from the <a href="
> http://www.cs.cmu.edu/%7Eenron/">enron dataset</a> that contains about
> 500k internal e-mails. The data is stored in MongoDB and the Stratosphere
> program counts the number of e-mails per day.</p>
> -
> -<p>The complete code of this sample program is available on <a href="
> https://github.com/stratosphere/stratosphere-mongodb-example
> ">GitHub</a>.</p>
> -
> -<h4 id="prepare-mongodb-and-the-data">Prepare MongoDB and the Data</h4>
> -
> -<ul>
> -<li>Install MongoDB</li>
> -<li>Download the enron dataset from <a href="
> http://mongodb-enron-email.s3-website-us-east-1.amazonaws.com/">their
> website</a>.</li>
> -<li>Unpack and load it</li>
> -</ul>
> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">
> bunzip2 enron_mongo.tar.bz2
> - tar xvf enron_mongo.tar
> - mongorestore dump/enron_mail/messages.bson
> -</code></pre></div>
> -<p>We used <a href="http://robomongo.org/">Robomongo</a> to visually
> examine the dataset stored in MongoDB.</p>
> -
> -<p><img src="/img/blog/robomongo.png" style="width:90%;margin:15px"></p>
> -
> -<h4 id="build-mongoinputformat">Build <code>MongoInputFormat</code></h4>
> -
> -<p>MongoDB offers an InputFormat for Hadoop on their <a href="
> https://github.com/mongodb/mongo-hadoop">GitHub page</a>. The code is not
> available in any Maven repository, so we have to build the jar file on our
> own.</p>
> -
> -<ul>
> -<li>Check out the repository</li>
> -</ul>
> -<div class="highlight"><pre><code class="language-text"
> data-lang="text">git clone https://github.com/mongodb/mongo-hadoop.git
> -cd mongo-hadoop
> -</code></pre></div>
> -<ul>
> -<li>Set the appropriate Hadoop version in the <code>build.sbt</code>, we
> used <code>1.1</code>.</li>
> -</ul>
> -<div class="highlight"><pre><code class="language-bash"
> data-lang="bash">hadoopRelease in ThisBuild :<span class="o">=</span> <span
> class="s2">&quot;1.1&quot;</span>
> -</code></pre></div>
> -<ul>
> -<li>Build the input format</li>
> -</ul>
> -<div class="highlight"><pre><code class="language-bash"
> data-lang="bash">./sbt package
> -</code></pre></div>
> -<p>The jar-file is now located in <code>core/target</code>.</p>
> -
> -<h4 id="the-stratosphere-program">The Stratosphere Program</h4>
> -
> -<p>Now we have everything prepared to run the Stratosphere program. I
> only ran it on my local computer, out of Eclipse. To do that, check out the
> code ...</p>
> -<div class="highlight"><pre><code class="language-bash"
> data-lang="bash">git clone
> https://github.com/stratosphere/stratosphere-mongodb-example.git
> -</code></pre></div>
> -<p>... and import it as a Maven project into your Eclipse. You have to
> manually add the previously built mongo-hadoop jar-file as a dependency.
> -You can now press the &quot;Run&quot; button and see how Stratosphere
> executes the little program. It was running for about 8 seconds on the 1.5
> GB dataset.</p>
> -
> -<p>The result (located in <code>/tmp/enronCountByDay</code>) now looks
> like this.</p>
> -<div class="highlight"><pre><code class="language-text"
> data-lang="text">11,Fri Sep 26 10:00:00 CEST 1997
> -154,Tue Jun 29 10:56:00 CEST 1999
> -292,Tue Aug 10 12:11:00 CEST 1999
> -185,Thu Aug 12 18:35:00 CEST 1999
> -26,Fri Mar 19 12:33:00 CET 1999
> -</code></pre></div>
> -<p>There is one thing left I want to point out here. MongoDB represents
> objects stored in the database as JSON-documents. Since Stratosphere&#39;s
> standard types do not support JSON documents, I was using the
> <code>WritableWrapper</code> here. This wrapper allows to use any Hadoop
> datatype with Stratosphere.</p>
> -
> -<p>The following code example shows how the JSON-documents are accessed
> in Stratosphere.</p>
> -<div class="highlight"><pre><code class="language-java"
> data-lang="java"><span class="kd">public</span> <span
> class="kt">void</span> <span class="nf">map</span><span
> class="o">(</span><span class="n">Record</span> <span
> class="n">record</span><span class="o">,</span> <span
> class="n">Collector</span><span class="o">&lt;</span><span
> class="n">Record</span><span class="o">&gt;</span> <span
> class="n">out</span><span class="o">)</span> <span class="kd">throws</span>
> <span class="n">Exception</span> <span class="o">{</span>
> -    <span class="n">Writable</span> <span class="n">valWr</span> <span
> class="o">=</span> <span class="n">record</span><span
> class="o">.</span><span class="na">getField</span><span
> class="o">(</span><span class="mi">1</span><span class="o">,</span> <span
> class="n">WritableWrapper</span><span class="o">.</span><span
> class="na">class</span><span class="o">).</span><span
> class="na">value</span><span class="o">();</span>
> -    <span class="n">BSONWritable</span> <span class="n">value</span>
> <span class="o">=</span> <span class="o">(</span><span
> class="n">BSONWritable</span><span class="o">)</span> <span
> class="n">valWr</span><span class="o">;</span>
> -    <span class="n">Object</span> <span class="n">headers</span> <span
> class="o">=</span> <span class="n">value</span><span
> class="o">.</span><span class="na">getDoc</span><span
> class="o">().</span><span class="na">get</span><span
> class="o">(</span><span class="s">&quot;headers&quot;</span><span
> class="o">);</span>
> -    <span class="n">BasicDBObject</span> <span class="n">headerOb</span>
> <span class="o">=</span> <span class="o">(</span><span
> class="n">BasicDBObject</span><span class="o">)</span> <span
> class="n">headers</span><span class="o">;</span>
> -    <span class="n">String</span> <span class="n">date</span> <span
> class="o">=</span> <span class="o">(</span><span
> class="n">String</span><span class="o">)</span> <span
> class="n">headerOb</span><span class="o">.</span><span
> class="na">get</span><span class="o">(</span><span
> class="s">&quot;Date&quot;</span><span class="o">);</span>
> -    <span class="c1">// further date processing</span>
> -<span class="o">}</span>
> -</code></pre></div>
> -<p>Please use the comments if you have questions or if you want to
> showcase your own MongoDB-Stratosphere integration.
> -<br><br>
> -<small>Written by Robert Metzger (<a href="https://twitter.com/rmetzger_
> ">@rmetzger_</a>).</small></p>
> -</div>
> -                               <a
> href="/news/2014/01/28/querying_mongodb.html#disqus_thread">Accessing Data
> Stored in MongoDB with Stratosphere</a>
> -                       </article>
> -
>                 </div>
>                 <div class="col-md-2"></div>
>         </div>
>
> Modified: flink/site/blog/page2/index.html
> URL:
> http://svn.apache.org/viewvc/flink/site/blog/page2/index.html?rev=1650029&r1=1650028&r2=1650029&view=diff
>
> ==============================================================================
> --- flink/site/blog/page2/index.html (original)
> +++ flink/site/blog/page2/index.html Wed Jan  7 10:40:31 2015
> @@ -131,6 +131,98 @@
>                 <div class="col-md-8">
>
>                         <article>
> +                               <h2><a
> href="/news/2014/01/28/querying_mongodb.html">Accessing Data Stored in
> MongoDB with Stratosphere</a></h2>
> +                               <p class="meta">28 Jan 2014</p>
> +
> +                               <div><p>We recently merged a <a href="
> https://github.com/stratosphere/stratosphere/pull/437">pull request</a>
> that allows you to use any existing Hadoop <a href="
> http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat">InputFormat</a>
> with Stratosphere. So you can now (in the <code>0.5-SNAPSHOT</code> and
> upwards versions) define a Hadoop-based data source:</p>
> +<div class="highlight"><pre><code class="language-java"
> data-lang="java"><span class="n">HadoopDataSource</span> <span
> class="n">source</span> <span class="o">=</span> <span class="k">new</span>
> <span class="nf">HadoopDataSource</span><span class="o">(</span><span
> class="k">new</span> <span class="nf">TextInputFormat</span><span
> class="o">(),</span> <span class="k">new</span> <span
> class="nf">JobConf</span><span class="o">(),</span> <span
> class="s">&quot;Input Lines&quot;</span><span class="o">);</span>
> +<span class="n">TextInputFormat</span><span class="o">.</span><span
> class="na">addInputPath</span><span class="o">(</span><span
> class="n">source</span><span class="o">.</span><span
> class="na">getJobConf</span><span class="o">(),</span> <span
> class="k">new</span> <span class="nf">Path</span><span
> class="o">(</span><span class="n">dataInput</span><span class="o">));</span>
> +</code></pre></div>
> +<p>We describe in the following article how to access data stored in <a
> href="http://www.mongodb.org/">MongoDB</a> with Stratosphere. This allows
> users to join data from multiple sources (e.g. MongoDB and HDFS) or perform
> machine learning with the documents stored in MongoDB.</p>
> +
> +<p>The approach here is to use the <code>MongoInputFormat</code> that was
> developed for Apache Hadoop but now also runs with Stratosphere.</p>
> +<div class="highlight"><pre><code class="language-java"
> data-lang="java"><span class="n">JobConf</span> <span class="n">conf</span>
> <span class="o">=</span> <span class="k">new</span> <span
> class="nf">JobConf</span><span class="o">();</span>
> +<span class="n">conf</span><span class="o">.</span><span
> class="na">set</span><span class="o">(</span><span
> class="s">&quot;mongo.input.uri&quot;</span><span class="o">,</span><span
> class="s">&quot;mongodb://localhost:27017/enron_mail.messages&quot;</span><span
> class="o">);</span>
> +<span class="n">HadoopDataSource</span> <span class="n">src</span> <span
> class="o">=</span> <span class="k">new</span> <span
> class="nf">HadoopDataSource</span><span class="o">(</span><span
> class="k">new</span> <span class="nf">MongoInputFormat</span><span
> class="o">(),</span> <span class="n">conf</span><span class="o">,</span>
> <span class="s">&quot;Read from Mongodb&quot;</span><span
> class="o">,</span> <span class="k">new</span> <span
> class="nf">WritableWrapperConverter</span><span class="o">());</span>
> +</code></pre></div>
> +<h3 id="example-program">Example Program</h3>
> +
> +<p>The example program reads data from the <a href="
> http://www.cs.cmu.edu/%7Eenron/">Enron dataset</a> that contains about
> 500k internal e-mails. The data is stored in MongoDB and the Stratosphere
> program counts the number of e-mails per day.</p>
> +
> +<p>The complete code of this sample program is available on <a href="
> https://github.com/stratosphere/stratosphere-mongodb-example
> ">GitHub</a>.</p>
> +
> +<h4 id="prepare-mongodb-and-the-data">Prepare MongoDB and the Data</h4>
> +
> +<ul>
> +<li>Install MongoDB</li>
> +<li>Download the Enron dataset from <a href="
> http://mongodb-enron-email.s3-website-us-east-1.amazonaws.com/">their
> website</a>.</li>
> +<li>Unpack and load it</li>
> +</ul>
> +<div class="highlight"><pre><code class="language-bash" data-lang="bash">
> bunzip2 enron_mongo.tar.bz2
> + tar xvf enron_mongo.tar
> + mongorestore dump/enron_mail/messages.bson
> +</code></pre></div>
> +<p>We used <a href="http://robomongo.org/">Robomongo</a> to visually
> examine the dataset stored in MongoDB.</p>
> +
> +<p><img src="/img/blog/robomongo.png" style="width:90%;margin:15px"></p>
> +
> +<h4 id="build-mongoinputformat">Build <code>MongoInputFormat</code></h4>
> +
> +<p>MongoDB offers an InputFormat for Hadoop on their <a href="
> https://github.com/mongodb/mongo-hadoop">GitHub page</a>. The code is not
> available in any Maven repository, so we have to build the jar file on our
> own.</p>
> +
> +<ul>
> +<li>Check out the repository</li>
> +</ul>
> +<div class="highlight"><pre><code class="language-text"
> data-lang="text">git clone https://github.com/mongodb/mongo-hadoop.git
> +cd mongo-hadoop
> +</code></pre></div>
> +<ul>
> +<li>Set the appropriate Hadoop version in the <code>build.sbt</code>; we
> used <code>1.1</code>.</li>
> +</ul>
> +<div class="highlight"><pre><code class="language-bash"
> data-lang="bash">hadoopRelease in ThisBuild :<span class="o">=</span> <span
> class="s2">&quot;1.1&quot;</span>
> +</code></pre></div>
> +<ul>
> +<li>Build the input format</li>
> +</ul>
> +<div class="highlight"><pre><code class="language-bash"
> data-lang="bash">./sbt package
> +</code></pre></div>
> +<p>The jar-file is now located in <code>core/target</code>.</p>
> +
> +<h4 id="the-stratosphere-program">The Stratosphere Program</h4>
> +
> +<p>Now we have everything prepared to run the Stratosphere program. I
> only ran it on my local computer, out of Eclipse. To do that, check out the
> code ...</p>
> +<div class="highlight"><pre><code class="language-bash"
> data-lang="bash">git clone
> https://github.com/stratosphere/stratosphere-mongodb-example.git
> +</code></pre></div>
> +<p>... and import it as a Maven project into Eclipse. You have to
> manually add the previously built mongo-hadoop jar-file as a dependency.
> +You can now press the &quot;Run&quot; button and see how Stratosphere
> executes the little program. It ran for about 8 seconds on the 1.5
> GB dataset.</p>
> +
> +<p>The result (located in <code>/tmp/enronCountByDay</code>) now looks
> like this.</p>
> +<div class="highlight"><pre><code class="language-text"
> data-lang="text">11,Fri Sep 26 10:00:00 CEST 1997
> +154,Tue Jun 29 10:56:00 CEST 1999
> +292,Tue Aug 10 12:11:00 CEST 1999
> +185,Thu Aug 12 18:35:00 CEST 1999
> +26,Fri Mar 19 12:33:00 CET 1999
> +</code></pre></div>
> +<p>There is one thing left I want to point out here. MongoDB represents
> objects stored in the database as JSON-documents. Since Stratosphere&#39;s
> standard types do not support JSON documents, I was using the
> <code>WritableWrapper</code> here. This wrapper allows you to use any Hadoop
> datatype with Stratosphere.</p>
> +
> +<p>The following code example shows how the JSON-documents are accessed
> in Stratosphere.</p>
> +<div class="highlight"><pre><code class="language-java"
> data-lang="java"><span class="kd">public</span> <span
> class="kt">void</span> <span class="nf">map</span><span
> class="o">(</span><span class="n">Record</span> <span
> class="n">record</span><span class="o">,</span> <span
> class="n">Collector</span><span class="o">&lt;</span><span
> class="n">Record</span><span class="o">&gt;</span> <span
> class="n">out</span><span class="o">)</span> <span class="kd">throws</span>
> <span class="n">Exception</span> <span class="o">{</span>
> +    <span class="n">Writable</span> <span class="n">valWr</span> <span
> class="o">=</span> <span class="n">record</span><span
> class="o">.</span><span class="na">getField</span><span
> class="o">(</span><span class="mi">1</span><span class="o">,</span> <span
> class="n">WritableWrapper</span><span class="o">.</span><span
> class="na">class</span><span class="o">).</span><span
> class="na">value</span><span class="o">();</span>
> +    <span class="n">BSONWritable</span> <span class="n">value</span>
> <span class="o">=</span> <span class="o">(</span><span
> class="n">BSONWritable</span><span class="o">)</span> <span
> class="n">valWr</span><span class="o">;</span>
> +    <span class="n">Object</span> <span class="n">headers</span> <span
> class="o">=</span> <span class="n">value</span><span
> class="o">.</span><span class="na">getDoc</span><span
> class="o">().</span><span class="na">get</span><span
> class="o">(</span><span class="s">&quot;headers&quot;</span><span
> class="o">);</span>
> +    <span class="n">BasicDBObject</span> <span class="n">headerOb</span>
> <span class="o">=</span> <span class="o">(</span><span
> class="n">BasicDBObject</span><span class="o">)</span> <span
> class="n">headers</span><span class="o">;</span>
> +    <span class="n">String</span> <span class="n">date</span> <span
> class="o">=</span> <span class="o">(</span><span
> class="n">String</span><span class="o">)</span> <span
> class="n">headerOb</span><span class="o">.</span><span
> class="na">get</span><span class="o">(</span><span
> class="s">&quot;Date&quot;</span><span class="o">);</span>
> +    <span class="c1">// further date processing</span>
> +<span class="o">}</span>
> +</code></pre></div>
> +<p>Please use the comments if you have questions or if you want to
> showcase your own MongoDB-Stratosphere integration.
> +<br><br>
> +<small>Written by Robert Metzger (<a href="https://twitter.com/rmetzger_
> ">@rmetzger_</a>).</small></p>
> +</div>
> +                               <a
> href="/news/2014/01/28/querying_mongodb.html#disqus_thread">Accessing Data
> Stored in MongoDB with Stratosphere</a>
> +                       </article>
> +
> +                       <article>
>                                 <h2><a
> href="/news/2014/01/26/optimizer_plan_visualization_tool.html">Optimizer
> Plan Visualization Tool</a></h2>
>                                 <p class="meta">26 Jan 2014</p>
>
> @@ -448,24 +540,6 @@ Analyzing big data sets as they occur in
>                                 <a
> href="/news/2012/11/12/btw2013demo.html#disqus_thread">Stratosphere Demo
> Paper Accepted for BTW 2013</a>
>                         </article>
>
> -                       <article>
> -                               <h2><a
> href="/news/2012/10/15/icde2013.html">Stratosphere Demo Accepted for ICDE
> 2013</a></h2>
> -                               <p class="meta">15 Oct 2012</p>
> -
> -                               <div> <p>Our demo submission<br />
> -<strong><cite>"Peeking into the Optimization of Data Flow Programs with
> MapReduce-style UDFs"</cite></strong><br />
> -has been accepted for ICDE 2013 in Brisbane, Australia.<br />
> -The demo illustrates the contributions of our VLDB 2012 paper
> <cite>"Opening the Black Boxes in Data Flow Optimization"</cite> <a
> href="/assets/papers/optimizationOfDataFlowsWithUDFs_13.pdf">[PDF]</a> and
> <a
> href="/assets/papers/optimizationOfDataFlowsWithUDFs_poster_13.pdf">[Poster
> PDF]</a>.</p>
> -<p>Visit our poster, enjoy the demo, and talk to us if you are going to
> attend ICDE 2013.</p>
> -<p><strong>Abstract:</strong><br />
> -Data flows are a popular abstraction to define data-intensive processing
> tasks. In order to support a wide range of use cases, many data processing
> systems feature MapReduce-style user-defined functions (UDFs). In contrast
> to UDFs as known from relational DBMS, MapReduce-style UDFs have less
> strict templates. These templates do not alone provide all the information
> needed to decide whether they can be reordered with relational operators
> and other UDFs. However, it is well-known that reordering operators such as
> filters, joins, and aggregations can yield runtime improvements by orders
> of magnitude.<br />
> -We demonstrate an optimizer for data flows that is able to reorder
> operators with MapReduce-style UDFs written in an imperative language. Our
> approach leverages static code analysis to extract information from UDFs
> which is used to reason about the reorderbility of UDF operators. This
> information is sufficient to enumerate a large fraction of the search space
> covered by conventional RDBMS optimizers including filter and aggregation
> push-down, bushy join orders, and choice of physical execution strategies
> based on interesting properties.<br />
> -We demonstrate our optimizer and a job submission client that allows
> users to peek step-by-step into each phase of the optimization process: the
> static code analysis of UDFs, the enumeration of reordered candidate data
> flows, the generation of physical execution plans, and their parallel
> execution. For the demonstration, we provide a selection of relational and
> non-relational data flow programs which highlight the salient features of
> our approach.</p>
> -
> -</div>
> -                               <a
> href="/news/2012/10/15/icde2013.html#disqus_thread">Stratosphere Demo
> Accepted for ICDE 2013</a>
> -                       </article>
> -
>                 </div>
>                 <div class="col-md-2"></div>
>         </div>
>
> Modified: flink/site/blog/page3/index.html
> URL:
> http://svn.apache.org/viewvc/flink/site/blog/page3/index.html?rev=1650029&r1=1650028&r2=1650029&view=diff
>
> ==============================================================================
> --- flink/site/blog/page3/index.html (original)
> +++ flink/site/blog/page3/index.html Wed Jan  7 10:40:31 2015
> @@ -131,6 +131,24 @@
>                 <div class="col-md-8">
>
>                         <article>
> +                               <h2><a
> href="/news/2012/10/15/icde2013.html">Stratosphere Demo Accepted for ICDE
> 2013</a></h2>
> +                               <p class="meta">15 Oct 2012</p>
> +
> +                               <div> <p>Our demo submission<br />
> +<strong><cite>"Peeking into the Optimization of Data Flow Programs with
> MapReduce-style UDFs"</cite></strong><br />
> +has been accepted for ICDE 2013 in Brisbane, Australia.<br />
> +The demo illustrates the contributions of our VLDB 2012 paper
> <cite>"Opening the Black Boxes in Data Flow Optimization"</cite> <a
> href="/assets/papers/optimizationOfDataFlowsWithUDFs_13.pdf">[PDF]</a> and
> <a
> href="/assets/papers/optimizationOfDataFlowsWithUDFs_poster_13.pdf">[Poster
> PDF]</a>.</p>
> +<p>Visit our poster, enjoy the demo, and talk to us if you are going to
> attend ICDE 2013.</p>
> +<p><strong>Abstract:</strong><br />
> +Data flows are a popular abstraction to define data-intensive processing
> tasks. In order to support a wide range of use cases, many data processing
> systems feature MapReduce-style user-defined functions (UDFs). In contrast
> to UDFs as known from relational DBMS, MapReduce-style UDFs have less
> strict templates. These templates do not alone provide all the information
> needed to decide whether they can be reordered with relational operators
> and other UDFs. However, it is well-known that reordering operators such as
> filters, joins, and aggregations can yield runtime improvements by orders
> of magnitude.<br />
> +We demonstrate an optimizer for data flows that is able to reorder
> operators with MapReduce-style UDFs written in an imperative language. Our
> approach leverages static code analysis to extract information from UDFs
> which is used to reason about the reorderability of UDF operators. This
> information is sufficient to enumerate a large fraction of the search space
> covered by conventional RDBMS optimizers including filter and aggregation
> push-down, bushy join orders, and choice of physical execution strategies
> based on interesting properties.<br />
> +We demonstrate our optimizer and a job submission client that allows
> users to peek step-by-step into each phase of the optimization process: the
> static code analysis of UDFs, the enumeration of reordered candidate data
> flows, the generation of physical execution plans, and their parallel
> execution. For the demonstration, we provide a selection of relational and
> non-relational data flow programs which highlight the salient features of
> our approach.</p>
> +
> +</div>
> +                               <a
> href="/news/2012/10/15/icde2013.html#disqus_thread">Stratosphere Demo
> Accepted for ICDE 2013</a>
> +                       </article>
> +
> +                       <article>
>                                 <h2><a
> href="/news/2012/08/21/release02.html">Version 0.2 Released</a></h2>
>                                 <p class="meta">21 Aug 2012</p>
>
>
> Added: flink/site/news/2015/01/06/december-in-flink.html
> URL:
> http://svn.apache.org/viewvc/flink/site/news/2015/01/06/december-in-flink.html?rev=1650029&view=auto
>
> ==============================================================================
> --- flink/site/news/2015/01/06/december-in-flink.html (added)
> +++ flink/site/news/2015/01/06/december-in-flink.html Wed Jan  7 10:40:31
> 2015
> @@ -0,0 +1,339 @@
> +<!DOCTYPE html>
> +<html lang="en">
> +    <head>
> +           <meta charset="utf-8">
> +           <meta http-equiv="X-UA-Compatible" content="IE=edge">
> +           <meta name="viewport" content="width=device-width,
> initial-scale=1">
> +
> +           <title>Apache Flink (incubating): December 2014 in the Flink
> community</title>
> +           <link rel="shortcut icon" href="favicon.ico"
> type="image/x-icon">
> +           <link rel="icon" href="favicon.ico" type="image/x-icon">
> +           <link rel="stylesheet" href="/css/bootstrap.css">
> +           <link rel="stylesheet" href="/css/bootstrap-lumen-custom.css">
> +           <link rel="stylesheet" href="/css/syntax.css">
> +           <link rel="stylesheet" href="/css/custom.css">
> +           <link href="/css/main/main.css" rel="stylesheet">
> +           <!-- <link href="//
> maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css"
> rel="stylesheet"> -->
> +           <script src="
> https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js
> "></script>
> +           <script src="/js/bootstrap.min.js"></script>
> +    </head>
> +    <body>
> +    <div class="af-header-container af-inner-pages-navigation">
> +       <header>
> +               <div class="container">
> +                       <div class="row">
> +                               <div class="col-md-1 af-mobile-nav-bar">
> +                                       <a href="/" title="Home">
> +                                       <img class="hidden-xs hidden-sm
> img-responsive"
> +                                               src="/img/main/logo.png"
> alt="Apache Flink Logo">
> +                                       </a>
> +                                       <div class="row visible-xs">
> +                                               <div class="col-xs-3">
> +                                                   <a href="/"
> title="Home">
> +                                                       <img
> class="hidden-x hidden-sm img-responsive"
> +
>  src="/img/main/logo.png" alt="Apache Flink Logo">
> +                                                       </a>
> +                                               </div>
> +                                               <div
> class="col-xs-5"></div>
> +                                               <div class="col-xs-4">
> +                                                       <div
> class="af-mobile-btn">
> +                                                               <span
> class="glyphicon glyphicon-plus"></span>
> +                                                       </div>
> +                                               </div>
> +                                       </div>
> +                               </div>
> +                               <!-- Navigation -->
> +                               <div class="col-md-11">
> +                                       <nav class="af-main-nav"
> role="navigation">
> +                                               <ul>
> +                                                       <li><a href="#"
> class="af-nav-links">Quickstart
> +                                                                       <b
> class="caret"></b>
> +                                                       </a>
> +                                                               <ul
> class="af-dropdown-menu">
> +
>  <li><a href="/docs/0.7-incubating/setup_quickstart.html">Setup
> +
>              Flink</a></li>
> +
>  <li><a
> +
>      href="/docs/0.7-incubating/java_api_quickstart.html">Java
> +
>              API</a></li>
> +
>  <li><a
> +
>      href="/docs/0.7-incubating/scala_api_quickstart.html">Scala
> +
>              API</a></li>
> +                                                               </ul></li>
> +                                                       <li><a
> href="/downloads.html">Download</a></li>
> +                                                       <li><a
> href="/docs/0.7-incubating/faq.html">FAQ</a></li>
> +                                                       <li><a href="#"
> class="af-nav-links">Documentation <b
> +
>  class="caret"></b></a>
> +                                                               <ul
> class="af-dropdown-menu">
> +
>  <li class="af-separator">Current Stable:</li>
> +
>  <li></li>
> +
>  <li><a href="/docs/0.7-incubating/">0.7.0-incubating</a></li>
> +
>  <li><a href="/docs/0.7-incubating/api/java">0.7.0-incubating
> +
>              Javadocs</a></li>
> +
>  <li><a
> +
>
>  href="/docs/0.7-incubating/api/scala/index.html#org.apache.flink.api.scala.package">0.7.0-incubating
> +
>              Scaladocs</a></li>
> +
>  <li class="divider"></li>
> +
>  <li class="af-separator">Previous:</li>
> +
>  <li></li>
> +
>  <li><a href="/docs/0.6-incubating/">0.6-incubating</a></li>
> +
>  <li><a href="/docs/0.6-incubating/api/java">0.6-incubating
> +
>              Javadocs</a></li>
> +                                                               </ul></li>
> +                                                       <li><a href="#"
> class="af-nav-links">Community <b
> +
>  class="caret"></b></a>
> +                                                               <ul
> class="af-dropdown-menu">
> +
>  <li><a href="/community.html#mailing-lists">Mailing
> +
>              Lists</a></li>
> +
>  <li><a href="/community.html#issues">Issues</a></li>
> +
>  <li><a href="/community.html#team">Team</a></li>
> +
>  <li class="divider"></li>
> +
>  <li><a href="/how-to-contribute.html">How To
> +
>              Contribute</a></li>
> +
>  <li><a href="/coding_guidelines.html">Coding
> +
>              Guidelines</a></li>
> +                                                               </ul></li>
> +                                                       <li><a href="#"
> class="af-nav-links">Project <b
> +
>  class="caret"></b></a>
> +                                                               <ul
> class="af-dropdown-menu">
> +
>  <li><a href="/material.html">Material</a></li>
> +
>  <li><a href="http://www.apache.org/">Apache Software
> +
>              Foundation <span class="glyphicon glyphicon-new-window"></span>
> +
>  </a></li>
> +
>  <li><a
> +
>      href="https://cwiki.apache.org/confluence/display/FLINK">Wiki
> +
>              <span class="glyphicon glyphicon-new-window"></span>
> +
>  </a></li>
> +
>  <li><a
> +
>      href="https://wiki.apache.org/incubator/StratosphereProposal
> ">Incubator
> +
>              Proposal <span class="glyphicon glyphicon-new-window"></span>
> +
>  </a></li>
> +
>  <li><a href="http://www.apache.org/licenses/LICENSE-2.0">License
> +
>              <span class="glyphicon glyphicon-new-window"></span>
> +
>  </a></li>
> +
>  <li><a href="https://github.com/apache/incubator-flink">Source
> +
>              Code <span class="glyphicon glyphicon-new-window"></span>
> +
>  </a></li>
> +                                                               </ul></li>
> +                                                       <li><a
> href="/blog/index.html" class="">Blog</a></li>
> +                                               </ul>
> +                                       </nav>
> +                               </div>
> +                       </div>
> +               </div>
> +       </header>
> +</div>
> +
> +
> +    <div style="padding-top:120px" class="container">
> +        <div class="container">
> +    <div class="row">
> +               <div class="col-md-2"></div>
> +               <div class="col-md-8">
> +                       <article>
> +                               <h2>December 2014 in the Flink
> community</h2>
> +                                   <p class="meta">06 Jan 2015</p>
> +                               <div>
> +                                   <p>This is the first blog post of a
> “newsletter”-like series where we give a summary of the monthly
> activity in the Flink community. As the Flink project grows, this can serve
> as a &quot;tl;dr&quot; for people who are not following the Flink dev and
> user mailing lists, or those who are simply overwhelmed by the traffic.</p>
> +
> +<h3 id="flink-graduation">Flink graduation</h3>
> +
> +<p>The biggest news is that the Apache board approved Flink as a
> top-level Apache project! The Flink team is working closely with the Apache
> press team for an official announcement, so stay tuned for details!</p>
> +
> +<h3 id="new-flink-website">New Flink website</h3>
> +
> +<p>The <a href="http://flink.apache.org">Flink website</a> got a total
> make-over, both in terms of appearance and content.</p>
> +
> +<h3 id="flink-irc-channel">Flink IRC channel</h3>
> +
> +<p>A new IRC channel called #flink was created at irc.freenode.org. An
> easy way to access the IRC channel is through the <a href="
> http://webchat.freenode.net/">web client</a>.  Feel free to stop by to
> ask anything or share your ideas about Apache Flink!</p>
> +
> +<h3 id="meetups-and-talks">Meetups and Talks</h3>
> +
> +<p>Apache Flink was presented in the <a href="
> http://www.meetup.com/Netherlands-Hadoop-User-Group/events/218635152">Amsterdam
> Hadoop User Group</a>.</p>
> +
> +<h2 id="notable-code-contributions">Notable code contributions</h2>
> +
> +<p><strong>Note:</strong> Code contributions listed here may not be part
> of a release or even the current snapshot yet.</p>
> +
> +<h3 id="streaming-scala-api"><a href="
> https://github.com/apache/incubator-flink/pull/275">Streaming Scala
> API</a></h3>
> +
> +<p>The Flink Streaming Java API recently got its Scala counterpart. Once
> merged, Flink Streaming users will be able to use both Scala and Java for
> their development. The Flink Streaming Scala API is built as a thin layer
> on top of the Java API, which keeps the two APIs easily in sync.</p>
> +
> +<h3 id="intermediate-datasets"><a href="
> https://github.com/apache/incubator-flink/pull/254">Intermediate
> datasets</a></h3>
> +
> +<p>This pull request introduces a major change in the Flink runtime.
> Currently, the Flink runtime is based on the notion of operators that
> exchange data through channels. With the PR, intermediate data sets that
> are produced by operators become first-class citizens in the runtime. While
> this does not have any user-facing impact yet, it lays the groundwork for a
> slew of future features such as blocking execution, fine-grained
> fault-tolerance, and more efficient data sharing between cluster and
> client.</p>
> +
> +<h3 id="configurable-execution-mode"><a href="
> https://github.com/apache/incubator-flink/pull/259">Configurable
> execution mode</a></h3>
> +
> +<p>This pull request allows the user to change the object-reuse
> behaviour. Before this pull request, some operations would reuse objects
> passed to the user function while others would always create new objects.
> The change introduces a system-wide switch and makes all operators
> consistently either reuse objects or create new ones.</p>
> +
> +<h3 id="distributed-coordination-via-akka"><a href="
> https://github.com/apache/incubator-flink/pull/149">Distributed
> Coordination via Akka</a></h3>
> +
> +<p>Another major change is a complete rewrite of the JobManager /
> TaskManager components in Scala. In addition to that, the old RPC service
> was replaced by Actors, using the Akka framework.</p>
> +
> +<h3 id="sorting-of-very-large-records"><a href="
> https://github.com/apache/incubator-flink/pull/249">Sorting of very large
> records</a></h3>
> +
> +<p>Flink&#39;s internal sort algorithms were improved to better handle
> large records (multiple hundreds of megabytes or larger). Previously, the
> system in some cases held instances of multiple large records in memory at
> once, resulting in high memory consumption and JVM heap thrashing. With
> this fix, large records are streamed through the operators, reducing memory
> consumption and GC pressure. The system now requires much less memory to
> support algorithms that work on such large records.</p>
> +
> +<h3 id="kryo-serialization-as-the-new-default-fallback"><a href="
> https://github.com/apache/incubator-flink/pull/271">Kryo Serialization as
> the new default fallback</a></h3>
> +
> +<p>Flink’s built-in type serialization framework handles all common
> types very efficiently. Prior versions used Avro to serialize types that
> the built-in framework could not handle.
> +Flink’s serialization system has improved a lot over time and by now
> surpasses the capabilities of Avro in many cases. Kryo now serves as the
> default fallback serialization framework, supporting a much broader range
> of types.</p>
> +
> +<h3 id="hadoop-filesystem-support"><a href="
> https://github.com/apache/incubator-flink/pull/268">Hadoop FileSystem
> support</a></h3>
> +
> +<p>This change permits users to use all file systems supported by Hadoop
> with Flink. In practice, this means that users can use Flink with Tachyon,
> Google Cloud Storage (including out-of-the-box Flink YARN support on Google
> Compute Cloud), FTP, and all the other file system implementations for
> Hadoop.</p>
> +
> +<h2 id="heading-to-the-0.8.0-release">Heading to the 0.8.0 release</h2>
> +
> +<p>The community is working hard together with the Apache infra team to
> migrate the Flink infrastructure to a top-level project. At the same time,
> the Flink community is working on the Flink 0.8.0 release, which should be
> out very soon.</p>
> +
> +                               </div>
> +                       </article>
> +               </div>
> +               <div class="col-md-2"></div>
> +       </div>
> +       <div class="row" style="padding-top:30px">
> +               <div class="col-md-2"></div>
> +               <div class="col-md-8">
> +                   <div id="disqus_thread"></div>
> +                   <script type="text/javascript">
> +                       /* * * CONFIGURATION VARIABLES: EDIT BEFORE
> PASTING INTO YOUR WEBPAGE * * */
> +                       var disqus_shortname = 'stratosphere-eu'; //
> required: replace example with your forum shortname
> +
> +                       /* * * DON'T EDIT BELOW THIS LINE * * */
> +                       (function() {
> +                           var dsq = document.createElement('script');
> dsq.type = 'text/javascript'; dsq.async = true;
> +                           dsq.src = '//' + disqus_shortname + '.
> disqus.com/embed.js';
> +                           (document.getElementsByTagName('head')[0] ||
> document.getElementsByTagName('body')[0]).appendChild(dsq);
> +                       })();
> +                   </script>
> +                   <noscript>Please enable JavaScript to view the <a
> href="http://disqus.com/?ref_noscript">comments powered by
> Disqus.</a></noscript>
> +                   <a href="http://disqus.com"
> class="dsq-brlink">comments powered by <span
> class="logo-disqus">Disqus</span></a>
> +               </div>
> +               <div class="col-md-2"></div>
> +       </div>
> +</div>
> +
> +    </div>
> +    <section id="af-upfooter" class="af-section">
> +       <div class="container">
> +               <p>Apache Flink is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.</p>
> +               <a href="http://incubator.apache.org"> <img class="img-responsive" src="/img/main/apache-incubator-logo.png" alt="Apache Flink" />
> +               </a>
> +               <p class="text-center">
> +                       <a href="/privacy-policy.html" title="Privacy Policy" class="af-privacy-policy">Privacy Policy</a>
> +               </p>
> +       </div>
> +</section>
> +
> +<footer id="af-footer">
> +       <div class="container">
> +               <div class="row">
> +                       <div class="col-md-3">
> +                               <h3>Documentation</h3>
> +                               <ul class="af-footer-menu">
> +                                       <li><a href="/docs/0.6-incubating/">0.6 Incubating</a></li>
> +                                       <li><a href="/docs/0.6-incubating/api/java/">0.6 Incubating Javadocs</a></li>
> +                                       <li><a href="/docs/0.7-incubating/">0.7 Incubating</a></li>
> +                                       <li><a href="/docs/0.7-incubating/api/java/">0.7 Incubating Javadocs</a></li>
> +                                       <li><a href="/docs/0.7-incubating/api/scala/index.html#org.apache.flink.api.scala.package">0.7 Incubating Scaladocs</a></li>
> +                               </ul>
> +                       </div>
> +                       <div class="col-md-3">
> +                               <h3>Community</h3>
> +                               <ul class="af-footer-menu">
> +                                       <li><a href="/community.html#mailing-lists">Mailing Lists</a></li>
> +                                       <li><a href="https://issues.apache.org/jira/browse/FLINK" target="blank">Issues <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="/community.html#team">Team</a></li>
> +                                       <li><a href="/how-to-contribute.html">How to contribute</a></li>
> +                                       <li><a href="/coding_guidelines.html">Coding Guidelines</a></li>
> +                               </ul>
> +                       </div>
> +                       <div class="col-md-3">
> +                               <h3>ASF</h3>
> +                               <ul class="af-footer-menu">
> +                                       <li><a href="http://www.apache.org/" target="blank">Apache Software Foundation <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="http://www.apache.org/foundation/how-it-works.html" target="blank">How it works <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="http://www.apache.org/foundation/thanks.html" target="blank">Thanks <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="http://www.apache.org/foundation/sponsorship.html" target="blank">Become a sponsor <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="http://incubator.apache.org/projects/flink.html" target="blank">Incubation status page <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                               </ul>
> +                       </div>
> +                       <div class="col-md-3">
> +                               <h3>Project</h3>
> +                               <ul class="af-footer-menu">
> +                                       <li><a href="/material.html" target="blank">Material <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="https://cwiki.apache.org/confluence/display/FLINK" target="blank">Wiki <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="https://wiki.apache.org/incubator/StratosphereProposal" target="blank">Incubator proposal <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="http://www.apache.org/licenses/LICENSE-2.0" target="blank">License <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                                       <li><a href="https://github.com/apache/incubator-flink" target="blank">Source code <span class="glyphicon glyphicon-new-window"></span></a></li>
> +                               </ul>
> +                       </div>
> +               </div>
> +       </div>
> +       <div class="af-footer-bar">
> +               <div class="container">
> +                       <div class="row">
> +                               <div class="col-md-6">
> +                                 Copyright &copy; 2014-2015, <a href="http://www.apache.org">The Apache Software Foundation</a>. All Rights Reserved.
> +                               </div>
> +                               <div class="col-md-5 text-right"></div>
> +                       </div>
> +               </div>
> +       </div>
> +</footer>
> +
> +    <!-- Google Analytics -->
> +    <script>
> +      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
> +      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
> +      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
> +      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
> +
> +      ga('create', 'UA-52545728-1', 'auto');
> +      ga('send', 'pageview');
> +    </script>
> +    <script src="/js/main/jquery.mobile.events.min.js"></script>
> +    <script src="/js/main/main.js"></script>
> +  </body>
> +</html>