flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Srikanth <srikanth...@gmail.com>
Subject Re: Barriers at work
Date Fri, 13 May 2016 17:51:49 GMT
Thanks. I didn't know we could set that.

On Fri, May 13, 2016 at 12:44 PM, Stephan Ewen <sewen@apache.org> wrote:

> You can use the checkpoint mode to "at least once".
> That way, barriers never block.
>
> On Fri, May 13, 2016 at 6:05 PM, Srikanth <srikanth.ht@gmail.com> wrote:
>
>> I have a follow up. Is there a recommendation of list of knobs that can
>> be tuned if at least once guarantee while handling failure is good enough?
>> For cases like alert generation, non idempotent sink, etc where the
>> system can live with duplicates or has other mechanism to handle them.
>>
>> Srikanth
>>
>> On Fri, May 13, 2016 at 8:09 AM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> Hi Srikanth!
>>>
>>> That is an interesting idea.
>>> I have it on my mind to create a design doc for checkpointing
>>> improvements. That could be added as a proposal there.
>>>
>>> I hope I'll be able to start with that design doc next week.
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Fri, May 13, 2016 at 1:35 PM, Matthias J. Sax <mjsax@apache.org>
>>> wrote:
>>>
>>>> I don't think barries can "expire" as of now. Might be a nice idea
>>>> thought -- I don't know if this might be a problem in production.
>>>>
>>>> Furthermore, I want to point out, that an "expiring checkpoint" would
>>>> not break exactly-once processing, as the latest successful checkpoint
>>>> can always be used to recover correctly. Only the recovery-time would be
>>>> increase. because if a "barrier expires" and no checkpoint can be
>>>> stored, more data has to be replayed using the "old" checkpoint".
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 05/12/2016 09:21 PM, Srikanth wrote:
>>>> > Hello,
>>>> >
>>>> > I was reading about Flink's checkpoint and wanted to check if I
>>>> > correctly understood the usage of barriers for exactly once
>>>> processing.
>>>> >  1) Operator does alignment by buffering records coming after a
>>>> barrier
>>>> > until it receives barrier from all upstream operators instances.
>>>> >  2) Barrier is always preceded by a watermark to trigger processing
>>>> all
>>>> > windows that are complete.
>>>> >  3) Records in windows that are not triggered are also saved as part
>>>> of
>>>> > checkpoint. These windows are repopulated when restoring from
>>>> checkpoints.
>>>> >
>>>> > In production setups, were there any cases where alignment during
>>>> > checkpointing caused unacceptable latency?
>>>> > If so, is there a way to indicate say wait for a MAX 100 ms? That way
>>>> we
>>>> > have exactly-once in most situations but prefer at least once over
>>>> > higher latency in corner cases.
>>>> >
>>>> > Srikanth
>>>>
>>>>
>>>
>>
>

Mime
View raw message