flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Back Pressure details
Date Wed, 06 Apr 2016 10:35:40 GMT
Hey Zach,

I just added some documentation, which will be available in ~30 mins
here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/back_pressure_monitoring.html

If you think that something is missing there, I would appreciate some
feedback. :-)

Back pressure is determined by repeatedly calling getStackTrace() on
the task threads executing the job. By default, this is done 100 times
with a 50 ms delay between calls. If a task thread is stuck in an
internal method call that requests buffers from the network stack,
this indicates back pressure.
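
To make the sampling idea concrete, here is a rough sketch in plain
Java. It is not the actual Flink code: StackSampler, sampleRatio and
isInside are names I made up for illustration, and the class/method to
look for is passed in as a parameter rather than hard-coded. It also
scans the whole stack trace for the method, which is a simplification.

```java
// Hypothetical sketch, not Flink's implementation.
class StackSampler {

    /**
     * Takes up to numSamples stack traces of the given thread, waiting
     * delayMs between samples, and returns the fraction of samples in
     * which some frame matched className#methodName.
     */
    static double sampleRatio(Thread task, int numSamples, long delayMs,
                              String className, String methodName) {
        int hits = 0;
        int taken = 0;
        for (int i = 0; i < numSamples; i++) {
            if (isInside(task.getStackTrace(), className, methodName)) {
                hits++;
            }
            taken++;
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return taken == 0 ? 0.0 : (double) hits / taken;
    }

    /** True if any frame of the trace is className#methodName. */
    static boolean isInside(StackTraceElement[] trace,
                            String className, String methodName) {
        for (StackTraceElement frame : trace) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }
}
```

With the defaults mentioned above you would call something like
sampleRatio(taskThread, 100, 50, someClassName, someMethodName), where
the last two arguments name the buffer request method to look for.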

The ratio you see tells you what fraction of the stack traces were
stuck in that method (e.g. 1 out of 100 gives 0.01), and the status
codes just group those ratios in a (hopefully) reasonable way
(<= 0.10 is OK, <= 0.5 is LOW, > 0.5 is HIGH).
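
As a small illustration of the grouping, the thresholds above could be
written like this. Again, just a sketch: BackPressureLevel and of() are
hypothetical names, only the three thresholds come from the actual
behaviour.

```java
// Hypothetical sketch of the ratio-to-status grouping.
enum BackPressureLevel {
    OK, LOW, HIGH;

    /** Maps a sampled ratio in [0, 1] to a status. */
    static BackPressureLevel of(double ratio) {
        if (ratio <= 0.10) {
            return OK;
        } else if (ratio <= 0.5) {
            return LOW;
        } else {
            return HIGH;
        }
    }
}
```

So 1 hit out of 100 samples (0.01) is OK, 30 out of 100 (0.30) is LOW,
and 80 out of 100 (0.80) is HIGH.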

If you have a task with back pressure, this means that it is producing
data faster than it can be consumed downstream, for example because the
downstream operator is slow or the network can't keep up. Your
Source => A => B => Sink example suggests that the sink is slowing
down/back pressuring B, which is in turn slowing down/back pressuring
A.

Does this help?

Keep in mind though that it is not a rock solid approach and there is
a chance that we miss the back pressure indicators or always sample
when the task is requesting buffers (which is happening all the
time). It often works better at the extremes, e.g. when there is no
back pressure at all or very high back pressure.

– Ufuk


On Tue, Apr 5, 2016 at 10:47 PM, Zach Cox <zcox522@gmail.com> wrote:
> Hi - I'm trying to identify bottlenecks in my Flink streaming job, and am
> curious about the Back Pressure view in the job manager web UI. If there are
> already docs for Back Pressure please feel free to just point me to those.
> :)
>
> When "Sampling in progress..." is displayed, what exactly is happening?
>
> What do the values in the Ratio column for each Subtask mean exactly?
>
> What does Status such as OK, High, etc mean? Are these determined from the
> Ratio values?
>
> If my job graph looks like Source => A => B => Sink, with Back Pressure OK
> for Source and Sink, but High for A and B, what does that suggest?
>
> Thanks,
> Zach
>
