From: rmetzger@apache.org
To: commits@flink.apache.org
Date: Wed, 18 Jan 2017 14:04:35 -0000
Subject: [13/39] flink-web git commit: Rebuild site

http://git-wip-us.apache.org/repos/asf/flink-web/blob/9ec0a879/content/news/2015/12/04/Introducing-windows.html
----------------------------------------------------------------------
diff --git a/content/news/2015/12/04/Introducing-windows.html b/content/news/2015/12/04/Introducing-windows.html
Introducing Stream Windows in Apache Flink

04 Dec 2015 by Fabian Hueske (@fhueske)
The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/“exactly-once” processing). This shift and the new terminology can be quite confusing for people who are new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams, which let it stand out among other open source stream processors.

In this blog post, we discuss the concept of windows for stream processing, present Flink’s built-in windows, and explain its support for custom windowing semantics.
What are windows and what are they good for?

Consider the example of a traffic sensor that counts every 15 seconds the number of vehicles passing a certain location. The resulting stream could look like:

If you would like to know how many vehicles passed that location, you would simply sum the individual counts. However, the nature of a sensor stream is that it continuously produces data. Such a stream never ends, and it is not possible to compute a final sum that can be returned. Instead, it is possible to compute rolling sums, i.e., to return an updated sum record for each input event. This would yield a new stream of partial sums.
However, a stream of partial sums might not be what we are looking for, because it constantly updates the count and, even more importantly, some information such as variation over time is lost. Hence, we might want to rephrase our question and ask for the number of cars that pass the location every minute. This requires us to group the elements of the stream into finite sets, each set corresponding to sixty seconds. This operation is called a tumbling window operation.

Tumbling windows discretize a stream into non-overlapping windows. For certain applications it is important that windows are not disjoint, because an application might require smoothed aggregates. For example, we can compute every thirty seconds the number of cars that passed in the last minute. Such windows are called sliding windows.
Defining windows on a data stream as discussed before is a non-parallel operation. This is because each element of a stream must be processed by the same window operator that decides which windows the element should be added to. Windows on a full stream are called AllWindows in Flink. For many applications, a data stream needs to be grouped into multiple logical streams on each of which a window operator can be applied. Think for example about a stream of vehicle counts from multiple traffic sensors (instead of only one sensor as in our previous example), where each sensor monitors a different location. By grouping the stream by sensor id, we can compute windowed traffic statistics for each location in parallel. In Flink, we call such partitioned windows simply Windows, as they are the common case for distributed streams. The following figure shows tumbling windows that collect two elements over a stream of (sensorId, count) pair elements.

Generally speaking, a window defines a finite set of elements on an unbounded stream. This set can be based on time (as in our previous examples), element counts, a combination of counts and time, or some custom logic to assign elements to windows. Flink’s DataStream API provides concise operators for the most common window operations as well as a generic windowing mechanism that allows users to define very custom windowing logic. In the following we present Flink’s time and count windows before discussing its windowing mechanism in detail.
Time Windows

As their name suggests, time windows group stream elements by time. For example, a tumbling time window of one minute collects elements for one minute and applies a function on all elements in the window after one minute has passed.

Defining tumbling and sliding time windows in Apache Flink is very easy:
// Stream of (sensorId, carCnt)
val vehicleCnts: DataStream[(Int, Int)] = ...

val tumblingCnts: DataStream[(Int, Int)] = vehicleCnts
  // key stream by sensorId
  .keyBy(0)
  // tumbling time window of 1 minute length
  .timeWindow(Time.minutes(1))
  // compute sum over carCnt
  .sum(1)

val slidingCnts: DataStream[(Int, Int)] = vehicleCnts
  .keyBy(0)
  // sliding time window of 1 minute length and 30 secs trigger interval
  .timeWindow(Time.minutes(1), Time.seconds(30))
  .sum(1)
There is one aspect that we haven’t discussed yet, namely the exact meaning of “collects elements for one minute”, which boils down to the question, “How does the stream processor interpret time?”.

Apache Flink features three different notions of time, namely processing time, event time, and ingestion time.
  1. In processing time, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one-minute processing time window collects elements for exactly one minute.
  2. In event time, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc., where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First of all, it decouples the program semantics from the actual serving speed of the source and the processing performance of the system. Hence, you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results even if events arrive out of order with respect to their timestamps, which is common if a data stream gathers events from distributed sources.
  3. Ingestion time is a hybrid of processing and event time. It assigns wall clock timestamps to records as soon as they arrive in the system (at the source) and continues processing with event time semantics based on the attached timestamps. How a program selects one of these notions is sketched below.
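The time notion is chosen on the execution environment. The following is a minimal sketch, assuming the Scala DataStream API and the TimeCharacteristic setting available in this Flink release (processing time is the default if nothing is configured):

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// switch the whole job from the default (processing time) to event time;
// TimeCharacteristic.IngestionTime would select the hybrid mode instead
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)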

Count Windows

Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added.

In Flink’s DataStream API, tumbling and sliding count windows are defined as follows:
// Stream of (sensorId, carCnt)
val vehicleCnts: DataStream[(Int, Int)] = ...

val tumblingCnts: DataStream[(Int, Int)] = vehicleCnts
  // key stream by sensorId
  .keyBy(0)
  // tumbling count window of 100 elements size
  .countWindow(100)
  // compute the carCnt sum
  .sum(1)

val slidingCnts: DataStream[(Int, Int)] = vehicleCnts
  .keyBy(0)
  // sliding count window of 100 elements size and 10 elements trigger interval
  .countWindow(100, 10)
  .sum(1)
Dissecting Flink’s windowing mechanics

Flink’s built-in time and count windows cover a wide range of common window use cases. However, there are of course applications that require custom windowing logic that cannot be addressed by Flink’s built-in windows. In order to also support applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control over the way that windows are built and evaluated.

The following figure depicts Flink’s windowing mechanism and introduces the components involved.
Elements that arrive at a window operator are handed to a WindowAssigner. The WindowAssigner assigns elements to one or more windows, possibly creating new windows. A Window itself is just an identifier for a list of elements and may provide some optional meta information, such as begin and end time in case of a TimeWindow. Note that an element can be added to multiple windows, which also means that multiple windows can exist at the same time.
Each window owns a Trigger that decides when the window is evaluated or purged. The trigger is called for each element that is inserted into the window and when a previously registered timer times out. On each event, a trigger can decide to fire (i.e., evaluate), purge (remove the window and discard its content), or fire and then purge the window. A trigger that just fires evaluates the window and keeps it as it is, i.e., all elements remain in the window and are evaluated again when the trigger fires the next time. A window can be evaluated several times and exists until it is purged. Note that a window consumes memory until it is purged.
When a Trigger fires, the list of window elements can be given to an optional Evictor. The evictor can iterate through the list and decide to cut off some elements from the start of the list, i.e., remove some of the elements that entered the window first. The remaining elements are given to an evaluation function. If no Evictor was defined, the Trigger hands all the window elements directly to the evaluation function.
The evaluation function receives the elements of a window (possibly filtered by an Evictor) and computes one or more result elements for the window. The DataStream API accepts different types of evaluation functions, including predefined aggregation functions such as sum(), min(), max(), as well as a ReduceFunction, FoldFunction, or WindowFunction. A WindowFunction is the most generic evaluation function and receives the window object (i.e., the metadata of the window), the list of window elements, and the window key (in case of a keyed window) as parameters.
These are the components that constitute Flink’s windowing mechanics. We now show step-by-step how to implement custom windowing logic with the DataStream API. We start with a stream of type DataStream[IN] and key it using a key selector function that extracts a key of type KEY to obtain a KeyedStream[IN, KEY].
val input: DataStream[IN] = ...

// create a keyed stream using a key selector function
val keyed: KeyedStream[IN, KEY] = input
  .keyBy(myKeySel: (IN) => KEY)
We apply a WindowAssigner[IN, WINDOW] that creates windows of type WINDOW, resulting in a WindowedStream[IN, KEY, WINDOW]. In addition, a WindowAssigner also provides a default Trigger implementation.
// create windowed stream using a WindowAssigner
var windowed: WindowedStream[IN, KEY, WINDOW] = keyed
  .window(myAssigner: WindowAssigner[IN, WINDOW])
We can explicitly specify a Trigger to overwrite the default Trigger provided by the WindowAssigner. Note that specifying a trigger does not add an additional trigger condition but replaces the current trigger.
// override the default trigger of the WindowAssigner
windowed = windowed
  .trigger(myTrigger: Trigger[IN, WINDOW])
We may want to specify an optional Evictor as follows.
// specify an optional evictor
windowed = windowed
  .evictor(myEvictor: Evictor[IN, WINDOW])
Finally, we apply a WindowFunction that returns elements of type OUT to obtain a DataStream[OUT].
// apply window function to windowed stream
val output: DataStream[OUT] = windowed
  .apply(myWinFunc: WindowFunction[IN, OUT, KEY, WINDOW])
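Putting these pieces together, the built-in count windows shown earlier can also be assembled from these components. The following is a minimal sketch, assuming the built-in GlobalWindows assigner and the CountTrigger and CountEvictor classes that ship with this Flink release (later releases renamed some of these classes) and reusing the vehicleCnts stream from the earlier examples; it reproduces the sliding count window of size 100 with a slide of 10 elements.

// a sliding count window built from an assigner, a trigger, and an evictor
val slidingCnts: DataStream[(Int, Int)] = vehicleCnts
  .keyBy(0)
  // all elements of a key are collected in a single global window
  .window(GlobalWindows.create())
  // evaluate the window whenever 10 new elements have arrived
  .trigger(CountTrigger.of(10))
  // keep only the 100 most recently added elements for evaluation
  .evictor(CountEvictor.of(100))
  .sum(1)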

With Flink’s internal windowing mechanics and its exposure through the DataStream API it is possible to implement very custom windowing logic such as session windows or windows that emit early results if the values exceed a certain threshold.
Conclusion

Support for various types of windows over continuous data streams is a must-have for modern stream processors. Apache Flink is a stream processor with a very strong feature set, including a very flexible mechanism to build and evaluate windows over continuous data streams. Flink provides pre-defined window operators for common use cases as well as a toolbox that allows users to define very custom windowing logic. The Flink community will add more pre-defined window operators as we learn the requirements from our users.
http://git-wip-us.apache.org/repos/asf/flink-web/blob/9ec0a879/content/news/2015/12/11/storm-compatibility.html
----------------------------------------------------------------------
diff --git a/content/news/2015/12/11/storm-compatibility.html b/content/news/2015/12/11/storm-compatibility.html
Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink

11 Dec 2015 by Matthias J. Sax (@MatthiasJSax)
Apache Storm was one of the first distributed and scalable stream processing systems available in the open source space, offering (near) real-time tuple-by-tuple processing semantics. Initially released by the developers at Backtype in 2011 under the Eclipse open-source license, it became popular very quickly. Shortly afterwards, Twitter acquired Backtype. Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de-facto industry standard for big data stream processing. In 2013, Storm entered the Apache incubator program, followed by its graduation to a top-level project in 2014.
Apache Flink is a stream processing engine that improves upon older technologies like Storm in several dimensions, including strong consistency guarantees (“exactly once”), a higher level DataStream API, support for event time and a rich windowing system, as well as superior throughput with competitive low latency.
While Flink offers several technical benefits over Storm, an existing investment in a codebase of applications developed for Storm often makes it difficult to switch engines. For these reasons, as part of the Flink 0.10 release, Flink ships with a Storm compatibility package that allows users to:
  • Run unmodified Storm topologies using Apache Flink, benefiting from superior performance.
  • Embed Storm code (spouts and bolts) as operators inside Flink DataStream programs.
Only minor code changes are required in order to submit the program to Flink instead of Storm. This minimizes the work for developers to run existing Storm topologies while leveraging Apache Flink’s fast and robust execution engine.
We note that the Storm compatibility package is continuously improving and does not cover the full spectrum of Storm’s API. However, it is powerful enough to cover many use cases.

Executing Storm topologies with Flink
The easiest way to use the Storm compatibility package is by executing a whole Storm topology in Flink. For this, you only need to replace the dependency storm-core with flink-storm in your Storm project and change two lines of code in your original Storm program.
The following example shows a simple Storm word count program that can be executed in Flink. First, the program is assembled the Storm way without any code change to Spouts, Bolts, or the topology itself.
// assemble topology, the Storm way
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("source", new StormFileSpout(inputFilePath));
builder.setBolt("tokenizer", new StormBoltTokenizer())
       .shuffleGrouping("source");
builder.setBolt("counter", new StormBoltCounter())
       .fieldsGrouping("tokenizer", new Fields("word"));
builder.setBolt("sink", new StormBoltFileSink(outputFilePath))
       .shuffleGrouping("counter");
In order to execute the topology, we need to translate it to a FlinkTopology and submit it to a local or remote Flink cluster, very similar to submitting the application to a Storm cluster.[1]
// transform Storm topology to Flink program
// replaces: StormTopology topology = builder.createTopology();
FlinkTopology topology = FlinkTopology.createTopology(builder);

Config conf = new Config();
if (runLocal) {
    // use FlinkLocalCluster instead of LocalCluster
    FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
    cluster.submitTopology("WordCount", conf, topology);
} else {
    // use FlinkSubmitter instead of StormSubmitter
    FlinkSubmitter.submitTopology("WordCount", conf, topology);
}
As a shorter Flink-style alternative that replaces the Storm-style submission code, you can also use context-based job execution:
// transform Storm topology to Flink program (as above)
FlinkTopology topology = FlinkTopology.createTopology(builder);

// executes locally by default or remotely if submitted with Flink's command-line client
topology.execute();
After the code is packaged in a jar file (e.g., StormWordCount.jar), it can be easily submitted to Flink via
bin/flink run StormWordCount.jar
The Spouts and Bolts, as well as the topology assembly code, are not changed at all! Only the translation and submission steps have to be changed to their Storm-API-compatible Flink counterparts. This allows for minimal code changes and easy adaptation to Flink.
Embedding Spouts and Bolts in Flink programs

It is also possible to use Spouts and Bolts within a regular Flink DataStream program. The compatibility package provides wrapper classes for Spouts and Bolts, which are implemented as a Flink SourceFunction and StreamOperator, respectively. Those wrappers automatically translate incoming Flink POJO and TupleXX records into Storm’s Tuple type and emitted Values back into either POJOs or TupleXX types for further processing by Flink operators. As Storm is type agnostic, it is required to specify the output type of embedded Spouts/Bolts manually to get a fully typed Flink streaming program.
// use regular Flink streaming environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// use Spout as source
DataStream<Tuple1<String>> source =
  env.addSource(// Flink provided wrapper including original Spout
                new SpoutWrapper<String>(new FileSpout(localFilePath)),
                // specify output type manually
                TypeExtractor.getForObject(new Tuple1<String>("")));
// FileSpout cannot be parallelized
DataStream<Tuple1<String>> text = source.setParallelism(1);

// further processing with Flink
DataStream<Tuple2<String,Integer>> tokens = text.flatMap(new Tokenizer()).keyBy(0);

// use Bolt for counting
DataStream<Tuple2<String,Integer>> counts =
  tokens.transform("Counter",
                   // specify output type manually
                   TypeExtractor.getForObject(new Tuple2<String,Integer>("",0)),
                   // Flink provided wrapper including original Bolt
                   new BoltWrapper<String,Tuple2<String,Integer>>(new BoltCounter()));

// write result to file via Flink sink
counts.writeAsText(outputPath);

// start Flink job
env.execute("WordCount with Spout source and Bolt counter");
Although some boilerplate code is needed (we plan to address this soon!), the actual embedded Spout and Bolt code can be used unmodified. We also note that the resulting program is fully typed, and type errors will be found by Flink’s type extractor even if the original Spouts and Bolts are not.
Outlook

The Storm compatibility package is currently in beta and undergoes continuous development. We are currently working on providing consistency guarantees for stateful Bolts. Furthermore, we want to provide a better API integration for embedded Spouts and Bolts by providing a “StormExecutionEnvironment” as a special extension of Flink’s StreamExecutionEnvironment. We are also investigating the integration of Storm’s higher-level programming API Trident.
Summary

Flink’s compatibility package for Storm allows using unmodified Spouts and Bolts within Flink. This even enables you to embed third-party Spouts and Bolts whose source code is not available. While you can embed Spouts/Bolts in a Flink program and mix-and-match them with Flink operators, running whole topologies is the easiest way to get started and can be achieved with almost no code changes.
If you want to try out Flink’s Storm compatibility package, check out our documentation.

[1] We confess, there are three lines changed compared to a Storm project, because the example covers both local and remote execution.
http://git-wip-us.apache.org/repos/asf/flink-web/blob/9ec0a879/content/news/2015/12/18/a-year-in-review.html
----------------------------------------------------------------------
diff --git a/content/news/2015/12/18/a-year-in-review.html b/content/news/2015/12/18/a-year-in-review.html
Flink 2015: A year in review, and a lookout to 2016

18 Dec 2015 by Robert Metzger (@rmetzger_)
With 2015 ending, we thought that this would be a good time to reflect on the amazing work done by the Flink community over this past year, and how much this community has grown.

Overall, we have seen Flink grow in terms of functionality from an engine to one of the most complete open-source stream processing frameworks available. The community grew from a relatively small and geographically focused team into a truly global community, and one of the largest big data communities in the Apache Software Foundation.

We will also look at some interesting stats, including that the busiest days for Flink are Mondays (who would have thought :-).
Community growth

Let us start with some simple statistics from Flink’s github repository. During 2015, the Flink community doubled in size, from about 75 contributors to over 150. Forks of the repository more than tripled from 160 in February 2015 to 544 in December 2015, and the number of stars of the repository almost tripled from 289 to 813.

Although Flink started out geographically in Berlin, Germany, the community is by now spread all around the globe, with many contributors from North America, Europe, and Asia. A simple search at meetup.com for groups that mention Flink as a focus area reveals 16 meetups around the globe.
Flink Forward 2015

One of the highlights of the year for Flink was undoubtedly the Flink Forward conference, the first conference on Apache Flink, which was held in October in Berlin. More than 250 participants (roughly half based outside Germany, where the conference was held) attended more than 33 technical talks from organizations including Google, MongoDB, Bouygues Telecom, NFLabs, Euranova, RedHat, IBM, Huawei, Intel, Ericsson, Capital One, Zalando, Amadeus, the Otto Group, and ResearchGate. If you have not yet watched their talks, check out the slides and videos from Flink Forward.
Media coverage

And of course, interest in Flink was picked up by the tech media. During 2015, articles about Flink appeared in InfoQ, ZDNet, Datanami, Infoworld (including being one of the best open source big data tools of 2015), the Gartner blog, Dataconomy, SDTimes, the MapR blog, KDnuggets, and HadoopSphere.
It is interesting to see that Hadoop Summit EMEA 2016 had a whopping 17 (!) talks submitted that mention Flink in their title and abstract.

Fun with stats: when do committers commit?
To get some deeper insight into what is happening in the Flink community, let us do some analytics on the git log of the project :-) The easiest thing we can do is count the number of commits to the repository in 2015. Running

git log --pretty=oneline --after=1/1/2015 | wc -l
on the Flink repository yields a total of 2203 commits in 2015.

To dig deeper, we will use an open source tool called gitstats that will give us some interesting statistics on committer behavior. You can create these statistics yourself, and see many more, by following four easy steps:
  1. Download gitstats from the project homepage. E.g., on OS X with homebrew, type

     brew install --HEAD homebrew/head-only/gitstats

  2. Clone the Apache Flink git repository:

     git clone git@github.com:apache/flink.git

  3. Generate the statistics:

     gitstats flink/ flink-stats/

  4. View all the statistics as an html page using your favorite browser (e.g., chrome):

     chrome flink-stats/index.html
First, we can see a steady growth of lines of code in Flink since the initial Apache incubator project. During 2015, the codebase almost doubled from 500,000 LOC to 900,000 LOC.

It is interesting to see when committers commit. For Flink, Monday afternoons are by far the most popular times to commit to the repository.
Feature timeline

So, what were the major features added to Flink and the Flink ecosystem during 2015? Here is a (non-exhaustive) chronological list:
Roadmap for 2016

With 2015 coming to a close, the Flink community has already started discussing Flink’s roadmap for the future. Some highlights are:
  • Runtime scaling of streaming jobs: streaming jobs are running forever, and need to react to a changing environment. Runtime scaling means dynamically increasing and decreasing the parallelism of a job to sustain certain SLAs, or react to changing input throughput.
  • SQL queries for static data sets and streams: building on top of Flink’s Table API, users should be able to write SQL queries for static data sets, as well as SQL queries on data streams that continuously produce new results.
  • Streaming operators backed by managed memory: currently, streaming operators like user-defined state and windows are backed by JVM heap objects. Moving those to Flink managed memory will add the ability to spill to disk, GC efficiency, as well as better control over memory utilization.
  • Library for detecting temporal event patterns: a common use case for stream processing is detecting patterns in an event stream with timestamps. Flink makes this possible with its support for event time, so many of these operators can be surfaced in the form of a library.
  • Support for Apache Mesos, and resource-dynamic YARN support: support for both Mesos and YARN, including dynamic allocation and release of resources for more resource elasticity (for both batch and stream processing).
  • Security: encrypt both the messages exchanged between TaskManagers and JobManager, as well as the connections for data exchange between workers.
  • More streaming connectors, more runtime metrics, and continuous DataStream API enhancements: add support for more sources and sinks (e.g., Amazon Kinesis, Cassandra, Flume, etc.), expose more metrics to the user, and provide continuous improvements to the DataStream API.
If you are interested in these features, we highly encourage you to take a look at the current draft, and join the discussion on the Flink mailing lists.
http://git-wip-us.apache.org/repos/asf/flink-web/blob/9ec0a879/content/news/2016/02/11/release-0.10.2.html
----------------------------------------------------------------------
diff --git a/content/news/2016/02/11/release-0.10.2.html b/content/news/2016/02/11/release-0.10.2.html
Flink 0.10.2 Released

11 Feb 2016
Today, the Flink community released Flink version 0.10.2, the second bugfix release of the 0.10 series.

We recommend that all users update to this release by bumping the version of their Flink dependencies to 0.10.2 and updating the binaries on the server.

Issues fixed
  • FLINK-3242: Adjust StateBackendITCase for 0.10 signatures of state backends
  • FLINK-3236: Flink user code classloader as parent classloader from Flink core classes
  • FLINK-2962: Cluster startup script refers to unused variable
  • FLINK-3151: Downgrade to Netty version 4.0.27.Final
  • FLINK-3224: Call setInputType() on output formats that implement InputTypeConfigurable
  • FLINK-3218: Fix overriding of user parameters when merging Hadoop configurations
  • FLINK-3189: Fix argument parsing of CLI client INFO action
  • FLINK-3176: Improve documentation for window apply
  • FLINK-3185: Log error on failure during recovery
  • FLINK-3185: Don’t swallow test failure Exception
  • FLINK-3147: Expose HadoopOutputFormatBase fields as protected
  • FLINK-3145: Pin Kryo version of transitive dependencies
  • FLINK-3143: Update Closure Cleaner’s ASM references to ASM5
  • FLINK-3136: Fix shaded imports in ClosureCleaner.scala
  • FLINK-3108: JoinOperator’s with() calls the wrong TypeExtractor method
  • FLINK-3125: Web server starts also when JobManager log files cannot be accessed
  • FLINK-3080: Relax restrictions of DataStream.union()
  • FLINK-3081: Properly stop periodic Kafka committer
  • FLINK-3082: Fixed confusing error about an interface that no longer exists
  • FLINK-3067: Enforce zkclient 0.7 for Kafka
  • FLINK-3020: Set number of task slots to maximum parallelism in local execution