flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink-web] dawidwys commented on a change in pull request #436: Add Apache Flink release 1.13.0
Date Mon, 26 Apr 2021 10:25:48 GMT

dawidwys commented on a change in pull request #436:
URL: https://github.com/apache/flink-web/pull/436#discussion_r620168175



##########
File path: _posts/2021-04-22-release-1.13.0.md
##########
@@ -0,0 +1,374 @@
+---
+layout: post 
+title:  "Apache Flink 1.13.0 Release Announcement"
+date: 2021-04-22T08:00:00.000Z 
+categories: news 
+authors:
+- stephan:
+  name: "Stephan Ewen"
+  twitter: "StephanEwen"
+- dwysakowicz:
+  name: "Dawid Wysakowicz"
+  twitter: "dwysakowicz"
+
+excerpt: The Apache Flink community is excited to announce the release of Flink 1.13.0! Close to xxx contributors worked on over xxx threads to bring significant improvements to usability and observability as well as new features that improve elasticity of Flink’s Application-style deployments.
+---
+
+
+The Apache Flink community is excited to announce the release of Flink 1.13.0! Close to xxx
+contributors worked on over xxx threads to bring significant improvements to usability and
+observability as well as new features that improve elasticity of Flink’s Application-style
+deployments.
+
+This release brings us a big step forward in one of our major efforts: Making Stream Processing
+Applications as natural and as simple to manage as any other application. The new reactive scaling
+mode means that scaling streaming applications in and out now works like in any other application,
+by just changing the number of parallel processes.
+
+We also added a series of improvements that help users better understand the performance of
+applications. When the streams don't flow as fast as you’d hope, these can help you understand
+why: load and backpressure visualization to identify bottlenecks, CPU flame graphs to identify hot
+code paths in your application, and state access latencies to see how the state backends are keeping
+up.
+
+This blog post describes all major new features and improvements, important changes to be aware of,
+and what to expect moving forward.
+
+{% toc %}
+
+We encourage you to [download the release](https://flink.apache.org/downloads.html) and share your
+feedback with the community through
+the [Flink mailing lists](https://flink.apache.org/community.html#mailing-lists)
+or [JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## Notable Features and Improvements
+
+### Reactive mode
+
+The Reactive Mode is the latest piece in Flink's initiative for making Stream Processing
+Applications as natural and as simple to manage as any other application.
+
+Flink has a dual nature when it comes to resource management and deployments: You can deploy
+clusters onto resource managers like Kubernetes or Yarn in such a way that Flink actively manages
+the resources, and allocates and releases workers as needed. That is especially useful for jobs and
+applications that rapidly change their required resources, like batch applications and ad-hoc SQL
+queries. The application parallelism rules, and the number of workers follows. We call this active
+scaling.
+
+For long-running streaming applications, it is often a nicer model to just deploy them like any
+other long-running application: The application doesn't really need to know that it runs on K8s,
+EKS, Yarn, etc. and doesn't try to acquire a specific number of workers; instead, it just uses the
+number of workers that is given to it. The number of workers rules, and the application parallelism
+adjusts to that. We call that reactive scaling.
+
+The Application Deployment Mode started this effort, making deployments application-like (avoiding
+two separate deployment steps to (1) start the cluster and (2) submit the application). The reactive
+scheduler completes this, and you no longer need extra tools (scripts or a K8s operator)
+to keep the number of workers and the application parallelism settings in sync.
+
+You can now put an autoscaler around Flink applications like around other typical applications, as
+long as you are mindful when configuring the autoscaler that stateful applications still spend
+effort in moving state around when scaling.
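+
+As a minimal sketch, and assuming a standalone application cluster configured through
+`conf/flink-conf.yaml`, Reactive Mode is switched on through the scheduler mode setting. Please check
+the Flink documentation for the current requirements and limitations:
+
+```yaml
+# conf/flink-conf.yaml (illustrative fragment)
+# Let the job use all available TaskManager slots and rescale when workers join or leave.
+scheduler-mode: reactive
+```
+
+With this in place, adding or removing TaskManagers is what changes the parallelism of the job.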
+
+
+### Bottleneck detection, Backpressure and Idleness Monitoring
+
+One of the most important metrics to investigate when a job does not consume records as fast as you
+would expect is the backpressure ratio. It lets you track down bottlenecks in your pipelines. The
+previous mechanism had two limitations: it was heavy, because it worked by repeatedly taking stack
+trace samples of your running tasks, and it was difficult to find out which vertex was the source of
+backpressure. In Flink 1.13, we reworked the mechanism to include new metrics for the time tasks
+spend being backpressured, along with a reworked graphical representation of the job (including the
+percentage of time particular vertices are backpressured).
+
+
+<figure style="align-content: center">
+  <img src="{{ site.baseurl }}/img/blog/2021-04-xx-release-1.13.0/bottleneck.png" style="width: 900px"/>
+</figure>
+
+### Support for CPU flame graphs in Web UI
+
+It is desirable to provide better visibility into the distribution of CPU resources while executing
+user code. Flame Graphs are one of the most visually effective means to do that. They make it easy
+to answer questions like: Which methods are currently consuming CPU resources? How does consumption
+by one method compare to the others? Which series of calls on the stack led to executing a
+particular method?
+
+Flame Graphs are constructed by sampling stack traces a number of times. Every method call is
+represented by a bar, where the length of the bar is proportional to the number of times it is
+present in the samples. To prevent unintended impacts on production environments, Flame Graphs are
+currently available as an opt-in feature that needs to be enabled in the configuration. Once enabled,
+they are accessible via a new component in the UI at the level of the selected operator:
+
+<figure style="align-content: center">
+  <img src="{{ site.baseurl }}/img/blog/2021-04-xx-release-1.13.0/7.png" style="display: block; margin-left: auto; margin-right: auto; width: 600px"/>
+</figure>
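+
+A hedged sketch of the opt-in, assuming the setting is placed in `conf/flink-conf.yaml` and that the
+option is named `rest.flamegraph.enabled` as in the Flink 1.13 configuration reference:
+
+```yaml
+# conf/flink-conf.yaml (illustrative fragment)
+# Flame Graphs are off by default to avoid unintended impact on production environments.
+rest.flamegraph.enabled: true
+```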
+
+### Access Latency Metrics for State
+
+State interactions are a crucial part of the majority of data
+pipelines. Especially when using RocksDB, they can be rather I/O intensive and therefore
+play an important role in the overall performance of the pipeline. It is thus important to be
+able to get insights into what is going on under the hood. To provide more insights, we exposed
+latency tracking metrics.
+
+The metrics are disabled by default, but you can enable them using the
+`state.backend.rocksdb.latency-track-enabled` option.
+
+### Unified binary savepoint format
+
+All available state backends now produce a common unified binary format for their
+savepoints. This means that savepoints are now mutually interchangeable. You are no longer locked
+into the state backend you chose when starting your application for the first time. This makes
+it easier to start with the Heap Backend and switch later on to RocksDB if the JVM heap becomes too
+full (which you usually see when the GC times start to go up too much).
+
+### Support user-specified pod templates for Active Kubernetes Deployments
+
+The native Kubernetes deployment received an important update: it now supports custom pod templates.
+Flink from now on allows users to define the JobManager and TaskManager pods via template files.
+This makes it possible to use advanced features that are not supported by Flink's Kubernetes config
+options directly.
+
+Major Observability Improvements
+
+What runs on Flink are often critical workloads with SLAs, so it is important to have the right
+tools to understand what is happening inside the applications.
+
+If your application does not progress as expected, or the latency is higher or the throughput lower
+than you would expect, these features help you figure out what is going on.
+
+### Unaligned Checkpoints - Production Ready

Review comment:
       I'll go with Arvid's proposal. I hope that's fine @pnowojski ;)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


