flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paris Carbone <par...@kth.se>
Subject Re: How to ensure exactly-once semantics in output to Kafka?
Date Fri, 05 Feb 2016 12:33:59 GMT
That would be good indeed. I just learned about it from Stephan mentioned. It sounds correct
to me along the lines but it would be nice to see the details.

> On 05 Feb 2016, at 13:32, Ufuk Celebi <uce@apache.org> wrote:
> 
> 
>> On 05 Feb 2016, at 13:28, Paris Carbone <parisc@kth.se> wrote:
>> 
>> Hi Gabor,
>> 
>> The sinks should aware that the global checkpoint is indeed persisted before emitting
so they will have to wait until they are notified for its completion before pushing to Kafka.
The current view of the local state is not the actual persisted view so checking against is
like relying on dirty state. Imagine the following scenario:
>> 
>> 1) sink pushes to kafka record k and updates local buffer for k
>> 2) sink snapshots k and the rest of its state on checkpoint barrier
>> 3) global checkpoint fails due to some reason (e.g. another sink subtask failed)
and the job gets restarted
>> 4) sink pushes again record k to kafka since the last global snapshots did not complete
before and k is not in the local buffer
>> 
>> Chesnay’s approach seems to be valid and best effort for the time being.
> 
> Chesnay’s approach is not part of this thread. Can you or Chesnay elaborate/provide
a link?
> 
> – Ufuk
> 

Mime
View raw message