flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: High back-pressure after recovering from a save point
Date Fri, 14 Jul 2017 16:33:35 GMT
Can you try starting from the savepoint, but telling Kafka to start from
the latest offset?

(@gordon: Is that possible in Flink 1.3.1 or only in 1.4-SNAPSHOT ?)

On Fri, Jul 14, 2017 at 11:18 AM, Kien Truong <duckientruong@gmail.com>
wrote:

> Hi,
>
> Sorry for the version typo, I'm running 1.3.1. I did not test with 1.2.x.
>
> The jobs runs fine with almost 0 back-pressure if it's started from
> scratch or if I reuse the kafka consumers group id without specifying the
> safe point.
>
> Best regards,
> Kien
> On Jul 14, 2017, at 15:59, Stephan Ewen <sewen@apache.org> wrote:
>>
>> Hi!
>>
>> Flink 1.3.2 does not yet exist. Do you mean 1.3.1 or latest master?
>>
>> Can you tell us whether this occurs only in 1.3.x and worked well in
>> 1.2.x?
>> If you just keep the job running without savepoint/restore, you do not
>> get into backpressure situations?
>>
>> Thanks,
>> Stephan
>>
>>
>> On Fri, Jul 14, 2017 at 1:15 AM, Kien Truong <duckientruong@gmail.com>
>> wrote:
>>
>>> Hi Fabian,
>>> This happens to me even when the restore is immediate, so there's not
>>> much data in Kafka to catch up (5 minutes max)
>>>
>>> Regards
>>> Kien
>>> On Jul 13, 2017, at 23:40, Fabian Hueske < fhueske@gmail.com> wrote:
>>>>
>>>> I would guess that this is quite usual because the job has to
>>>> "catch-up" work.
>>>> For example, if you took a save point two days ago and restore the job
>>>> today, the input data of the last two days has been written to Kafka
>>>> (assuming Kafka as source) and needs to be processed.
>>>> The job will now read as fast as possible from Kafka to catch-up to the
>>>> presence. This means the data is much fast ingested (as fast as Kafka can
>>>> read and ship it) than during regular processing (as fast as your sources
>>>> produce).
>>>> The processing speed is bound by your Flink job which means there will
>>>> be backpressure.
>>>>
>>>> Once the job caught-up, the backpressure should disappear.
>>>>
>>>> Best, Fabian
>>>>
>>>> 2017-07-13 15:48 GMT+02:00 Kien Truong <duckientruong@gmail.com>:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have one job where back-pressure  is significantly higher after
>>>>> resuming from a save point.
>>>>>
>>>>> Because that job makes heavy use of stateful functions with
>>>>> RocksDBStateBackend ,
>>>>>
>>>>> I'm suspecting that this is the cause of performance degradation.
>>>>>
>>>>> Does anyone encounter simillar issues or have any tips for debugging
?
>>>>>
>>>>>
>>>>> I'm using Flink 1.3.2 with YARN in detached mode.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kien
>>>>>
>>>>>
>>>>
>>

Mime
View raw message