flink-user mailing list archives

From "Zhijiang" <wangzhijiang...@aliyun.com>
Subject Re: Flink Streaming Job Tuning help
Date Wed, 13 May 2020 04:10:42 GMT
Hi Kumar,

I can give some general ideas for further analysis. 

> We are finding that flink lags seriously behind when we introduce the keyBy (presumably because of shuffle across the network)
The `keyBy` breaks operator chaining, so in practice it can have a noticeable performance impact.
I guess your previous pipeline without keyBy could make use of the chaining mechanism: the
downstream operator consumes the records emitted by the upstream operator directly, without
going through buffer serialization -> network shuffle -> buffer deserialization. That matters
especially here, since your 10K record size is fairly large.
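To make the chaining point concrete, here is a toy Python model (not Flink code: `pickle` stands in for Flink's own serializers, and the 10,000-byte payload is an assumption based on the "10K" in the thread). The chained path hands the record over as an in-memory object, while the shuffled path turns every record into bytes and rebuilds it on the receiving side.

```python
import pickle

# Hypothetical record; the 10,000-byte payload is an assumption from the thread.
RECORD = {"key": "device-42", "payload": b"x" * 10_000}


def map_op(record):
    """Stand-in for an upstream operator."""
    return record


def chained(record):
    # Chained operators: the downstream operator receives the record as an
    # in-memory object; no bytes are produced and no buffer is involved.
    return map_op(record)


def shuffled(record):
    # Across a keyBy boundary every record is serialized into bytes on the
    # sender side and deserialized on the receiver side (network omitted).
    wire_bytes = pickle.dumps(map_op(record))
    return pickle.loads(wire_bytes), len(wire_bytes)


out_chained = chained(RECORD)
out_shuffled, size = shuffled(RECORD)
print(out_chained is RECORD)    # same object handed over
print(out_shuffled is RECORD)   # a fresh copy rebuilt from bytes
print(size)                     # bytes written per record
```

In real Flink, chained operators may still copy records between them unless object reuse is enabled via `ExecutionConfig#enableObjectReuse()`; the toy ignores that, but even then the record does not pass through network buffers.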

If the keyBy is necessary in your case, then you can check further where the current bottleneck is,
e.g. whether there is back pressure, which you can monitor from the web UI. If so, find which task
is the bottleneck causing the back pressure; you can trace it with the network-related metrics.
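A rough sketch of the usual way to read the back-pressure view, assuming each task's status has been reduced by hand to a yes/no flag and listed in source-to-sink order (the task names are hypothetical). Because back pressure propagates upstream, the slow task itself usually looks healthy while everything in front of it is back-pressured, so the candidate bottleneck is the first non-back-pressured task directly after a back-pressured one.

```python
from typing import List, Optional, Tuple

# (task name, back-pressured?) in source -> sink order, read off the web UI by hand.
Pipeline = List[Tuple[str, bool]]


def find_bottleneck(pipeline: Pipeline) -> Optional[str]:
    """Return the first task that is NOT back-pressured but directly follows
    a back-pressured task, or None if no such transition exists."""
    for (_, prev_bp), (name, bp) in zip(pipeline, pipeline[1:]):
        if prev_bp and not bp:
            return name
    return None


# Hypothetical job: source and map are back-pressured, the keyed stage is not.
job = [
    ("Source: Kinesis", True),
    ("Map", True),
    ("KeyedProcess", False),
    ("Sink: S3", False),
]
print(find_bottleneck(job))
```

If several operators are chained into one task, this only narrows the search to that task; the input/output buffer pool usage metrics can then help confirm which side is saturated.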

Also check whether there is data skew in your case, i.e. some tasks process more records
than others. If so, maybe we can increase the parallelism to balance the load.
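A small sketch for checking skew, assuming you can sample keys from the stream. It maps keys to subtasks with `crc32(key) % parallelism`, a simplification of Flink's actual key-group based assignment, and reports the busiest subtask relative to the average. The sample data (1,000 keys plus one hypothetical hot key) is made up for illustration.

```python
import zlib
from typing import Iterable, List


def subtask_loads(keys: Iterable[str], parallelism: int) -> List[int]:
    """Count records per downstream subtask for a stream of keys.

    Simplification: key -> subtask via crc32(key) % parallelism. Flink's real
    assignment goes through key groups, but it shares the property that
    matters here: one key always lands on exactly one subtask.
    """
    loads = [0] * parallelism
    for key in keys:
        loads[zlib.crc32(key.encode()) % parallelism] += 1
    return loads


def skew_ratio(loads: List[int]) -> float:
    """Busiest subtask divided by the average; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical sample: 1,000 distinct keys plus one hot key with 5,000 records.
keys = [f"key-{i}" for i in range(1000)] + ["hot-key"] * 5000
for p in (4, 16):
    loads = subtask_loads(keys, p)
    print(p, max(loads), round(skew_ratio(loads), 2))
```

Because a single key always goes to a single subtask, a hot key keeps the busiest subtask at or above that key's volume whatever the parallelism; in this sample the ratio is worse at parallelism 16 than at 4. Looking at the ratio first helps tell whether more parallelism is likely to help.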

Best,
Zhijiang
------------------------------------------------------------------
From: Senthil Kumar <senthilku@vmware.com>
Send Time: Wednesday, May 13, 2020 00:49
To: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Flink Streaming Job Tuning help

I forgot to mention: we are consuming these records from AWS Kinesis and writing them out to S3.

From: Senthil Kumar <senthilku@vmware.com>
Date: Tuesday, May 12, 2020 at 10:47 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Flink Streaming Job Tuning help

Hello Flink Community!

We have a fairly intensive Flink streaming application, processing 8-9 million records a minute,
with each record being 10K.
One of our steps is a keyBy operation. We are finding that Flink lags seriously behind when
we introduce the keyBy (presumably because of shuffle across the network).
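For a sense of scale, the figures above can be turned into shuffle bandwidth. This assumes "10k" means 10,000 bytes per record, no compression, and, for the network share, keys spread evenly over a hypothetical number of TaskManagers. Every record crossing the keyBy is serialized; only the share routed to another TaskManager adds network traffic.

```python
def shuffle_bandwidth(records_per_minute: float, record_bytes: float) -> float:
    """Bytes per second that must be serialized across the keyBy (whole job)."""
    return records_per_minute / 60 * record_bytes


def remote_fraction(task_managers: int) -> float:
    """Expected share of keyed records leaving the local TaskManager,
    assuming keys and subtasks are spread evenly across TaskManagers."""
    return (task_managers - 1) / task_managers


for rpm in (8_000_000, 9_000_000):
    bps = shuffle_bandwidth(rpm, 10_000)  # assumption: "10k" = 10,000 bytes
    print(f"{rpm:,} rec/min -> {bps / 1e9:.2f} GB/s = {bps * 8 / 1e9:.1f} Gbit/s")

# With a hypothetical 10 TaskManagers, most of that volume crosses the network.
print(remote_fraction(10))
```

At 8-9 million records per minute this comes to roughly 1.33-1.5 GB/s (about 10.7-12 Gbit/s) through the keyBy for the whole job, before counting the Kinesis read and the S3 write.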

We are trying to tune it ourselves (size of nodes, memory, network buffers, etc.), but before
we spend way too much time on this, would it be better to hire some “flink tuning expert” to
get us through?

If so, what resources are recommended on this list?

Cheers
Kumar
