flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kien Truong <duckientru...@gmail.com>
Subject Re: High back-pressure after recovering from a save point
Date Fri, 14 Jul 2017 09:18:20 GMT
Hi, 

Sorry for the version typo, I'm running 1.3.1. I did not test with 1.2.x.

The jobs runs fine with almost 0 back-pressure if it's started from scratch or if I reuse
the kafka consumers group id without specifying the safe point. 

Best regards, 
Kien 

On Jul 14, 2017, 15:59, at 15:59, Stephan Ewen <sewen@apache.org> wrote:
>Hi!
>
>Flink 1.3.2 does not yet exist. Do you mean 1.3.1 or latest master?
>
>Can you tell us whether this occurs only in 1.3.x and worked well in
>1.2.x?
>If you just keep the job running without savepoint/restore, you do not
>get
>into backpressure situations?
>
>Thanks,
>Stephan
>
>
>On Fri, Jul 14, 2017 at 1:15 AM, Kien Truong <duckientruong@gmail.com>
>wrote:
>
>> Hi Fabian,
>> This happens to me even when the restore is immediate, so there's not
>much
>> data in Kafka to catch up (5 minutes max)
>>
>> Regards
>> Kien
>> On Jul 13, 2017, at 23:40, Fabian Hueske <fhueske@gmail.com> wrote:
>>>
>>> I would guess that this is quite usual because the job has to
>"catch-up"
>>> work.
>>> For example, if you took a save point two days ago and restore the
>job
>>> today, the input data of the last two days has been written to Kafka
>>> (assuming Kafka as source) and needs to be processed.
>>> The job will now read as fast as possible from Kafka to catch-up to
>the
>>> presence. This means the data is much fast ingested (as fast as
>Kafka can
>>> read and ship it) than during regular processing (as fast as your
>sources
>>> produce).
>>> The processing speed is bound by your Flink job which means there
>will be
>>> backpressure.
>>>
>>> Once the job caught-up, the backpressure should disappear.
>>>
>>> Best, Fabian
>>>
>>> 2017-07-13 15:48 GMT+02:00 Kien Truong <duckientruong@gmail.com>:
>>>
>>>> Hi all,
>>>>
>>>> I have one job where back-pressure  is significantly higher after
>>>> resuming from a save point.
>>>>
>>>> Because that job makes heavy use of stateful functions with
>>>> RocksDBStateBackend ,
>>>>
>>>> I'm suspecting that this is the cause of performance degradation.
>>>>
>>>> Does anyone encounter simillar issues or have any tips for
>debugging ?
>>>>
>>>>
>>>> I'm using Flink 1.3.2 with YARN in detached mode.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Kien
>>>>
>>>>
>>>

Mime
View raw message