flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Barriers at work
Date Fri, 13 May 2016 16:44:40 GMT
You can use the checkpoint mode to "at least once".
That way, barriers never block.

On Fri, May 13, 2016 at 6:05 PM, Srikanth <srikanth.ht@gmail.com> wrote:

> I have a follow up. Is there a recommendation of list of knobs that can be
> tuned if at least once guarantee while handling failure is good enough?
> For cases like alert generation, non idempotent sink, etc where the system
> can live with duplicates or has other mechanism to handle them.
>
> Srikanth
>
> On Fri, May 13, 2016 at 8:09 AM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi Srikanth!
>>
>> That is an interesting idea.
>> I have it on my mind to create a design doc for checkpointing
>> improvements. That could be added as a proposal there.
>>
>> I hope I'll be able to start with that design doc next week.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Fri, May 13, 2016 at 1:35 PM, Matthias J. Sax <mjsax@apache.org>
>> wrote:
>>
>>> I don't think barries can "expire" as of now. Might be a nice idea
>>> thought -- I don't know if this might be a problem in production.
>>>
>>> Furthermore, I want to point out, that an "expiring checkpoint" would
>>> not break exactly-once processing, as the latest successful checkpoint
>>> can always be used to recover correctly. Only the recovery-time would be
>>> increase. because if a "barrier expires" and no checkpoint can be
>>> stored, more data has to be replayed using the "old" checkpoint".
>>>
>>>
>>> -Matthias
>>>
>>> On 05/12/2016 09:21 PM, Srikanth wrote:
>>> > Hello,
>>> >
>>> > I was reading about Flink's checkpoint and wanted to check if I
>>> > correctly understood the usage of barriers for exactly once processing.
>>> >  1) Operator does alignment by buffering records coming after a barrier
>>> > until it receives barrier from all upstream operators instances.
>>> >  2) Barrier is always preceded by a watermark to trigger processing all
>>> > windows that are complete.
>>> >  3) Records in windows that are not triggered are also saved as part of
>>> > checkpoint. These windows are repopulated when restoring from
>>> checkpoints.
>>> >
>>> > In production setups, were there any cases where alignment during
>>> > checkpointing caused unacceptable latency?
>>> > If so, is there a way to indicate say wait for a MAX 100 ms? That way
>>> we
>>> > have exactly-once in most situations but prefer at least once over
>>> > higher latency in corner cases.
>>> >
>>> > Srikanth
>>>
>>>
>>
>

Mime
View raw message