flink-user mailing list archives

From Nico Kruber <n...@data-artisans.com>
Subject Re: Duplicated data when using Externalized Checkpoints in a Flink Highly Available cluster
Date Thu, 08 Jun 2017 09:39:18 GMT
Hi Amara,
Please refer to [1] for some details about our checkpointing mechanism. In
short, for your situation:

* checkpoints are taken at certain checkpoint barriers,
* in between those barriers, processing continues and so do the outputs,
* in case of a failure, the state at the latest checkpoint is restored,
* processing then restarts from there, and you will see the same outputs
again.
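
As a rough illustration of the steps above, here is a toy simulation in plain
Python. It is not Flink code: the function name, the parameters and the
"checkpoint every N records" rule are assumptions made up for this sketch. It
only shows why a sink that writes immediately repeats whatever was emitted
between the latest checkpoint and the failure.

```python
def run_with_failure(events, checkpoint_every, fail_after):
    """Toy replay model (hypothetical, not Flink's API).

    Returns every record written by a sink that writes immediately.
    """
    written = []          # what ends up in the output file
    checkpointed_pos = 0  # source position saved at the latest checkpoint
    pos = 0
    failed = False
    while pos < len(events):
        written.append(events[pos])  # output happens right away
        pos += 1
        if pos % checkpoint_every == 0:
            checkpointed_pos = pos   # checkpoint: snapshot the position
        if not failed and pos == fail_after:
            failed = True
            pos = checkpointed_pos   # recovery: rewind to the latest checkpoint
    return written


replayed = run_with_failure(list("abcdef"), checkpoint_every=2, fail_after=3)
print(replayed)
```

In this run the record "c" was written after the checkpoint at position 2 and
before the simulated failure, so it shows up twice in the output.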

You do not seem to deliver to Kafka, but only consume from it and write to a
CSV file. If this write operation were transactional, you would commit only at
each checkpoint barrier and never see the "duplicates", i.e. the uncommitted
events.
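
A minimal sketch of that idea, again as a toy model in plain Python and not
Flink's API (all names here are hypothetical, and the final commit at the end
of the input is an assumption of the sketch): output is buffered as pending,
made visible only at a checkpoint, and thrown away on failure.

```python
def run_transactional(events, checkpoint_every, fail_after):
    """Toy transactional sink (hypothetical, not Flink's API).

    Records become visible only when a checkpoint commits them.
    """
    committed = []        # output visible to downstream readers
    pending = []          # written since the latest checkpoint, not yet visible
    checkpointed_pos = 0  # source position saved at the latest checkpoint
    pos = 0
    failed = False
    while pos < len(events):
        pending.append(events[pos])
        pos += 1
        if pos % checkpoint_every == 0:
            committed.extend(pending)  # commit at the checkpoint
            pending.clear()
            checkpointed_pos = pos
        if not failed and pos == fail_after:
            failed = True
            pending.clear()            # uncommitted output is discarded
            pos = checkpointed_pos     # recovery: rewind to the latest checkpoint
    committed.extend(pending)          # assumption: final commit at end of input
    return committed


committed_out = run_transactional(list("abcdef"), checkpoint_every=2, fail_after=3)
print(committed_out)
```

The record written between the checkpoint and the failure is still processed
twice internally, but only one copy of it is ever committed.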

Regards,
Nico

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html

On Monday, 5 June 2017 08:55:05 CEST F.Amara wrote:
> Hi Robert,
> 
> I have a few more questions to clarify.
> 
> 1) Why do you say printing the values to the standard out would display
> duplicates even if exactly once works? What is the reason for this? Could
> you brief me on this?
> 
> 2) I observed duplicates (by writing to a file) starting from the
> FlinkKafkaConsumer onwards. Why does this component introduce duplicates? Is
> it because Kafka guarantees only at-least-once delivery at the moment?
> 
> Thanks,
> Amara

