flink-user mailing list archives

From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: Data duplication on a High Availability activated cluster after a Task Manager failure recovery
Date Mon, 17 Apr 2017 04:56:05 GMT

A few things to clarify first:

1. What is the sink you are using? Checkpointing in Flink allows for exactly-once state updates.
Whether or not end-to-end exactly-once delivery can be achieved depends on the sink. For data
store sinks such as Cassandra / Elasticsearch, this can be made effectively exactly-once using
idempotent writes (depending on the application logic). For a Kafka topic as a sink, currently
the delivery is only at-least-once. You can check out [1] for an overview.

2. Also note that if there are already duplicates in the consumed Kafka topic (which
may occur, since Kafka producers do not support any kind of transactions at the moment),
then they will all be consumed and processed by Flink.

However, neither of these explains the missing data, which should not happen.
So for that, yes, I would check whether there is an issue with the application logic, or
whether the events simply were not in the consumed Kafka topic in the first place.
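
To make the difference between duplicates and missing data concrete, here is a minimal
plain-Python sketch (no Flink or Kafka involved). It assumes each event carries a unique
`id`; the names `append_sink`, `upsert_sink` and `diagnose` are hypothetical helpers made up
for illustration, not part of any Flink API. It replays a few events, as would happen after
a restore from a checkpoint with an at-least-once sink, and compares what was sent with
what arrived.

```python
from collections import Counter

def append_sink(store, event):
    # Non-idempotent write: every delivery adds a row, so replays become duplicates.
    store.append(event["id"])

def upsert_sink(store, event):
    # Idempotent write: keyed by event id, so a replayed event overwrites itself.
    store[event["id"]] = event["value"]

def diagnose(sent_ids, received_ids):
    # Compare the ids that were produced against the ids seen at the receiving end.
    counts = Counter(received_ids)
    missing = sorted(set(sent_ids) - set(counts))
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated

events = [{"id": i, "value": i * 10} for i in range(5)]
# Simulated at-least-once delivery: events 2..4 are replayed after a recovery.
delivered = events + events[2:]

appended, upserted = [], {}
for e in delivered:
    append_sink(appended, e)
    upsert_sink(upserted, e)

print(diagnose(range(5), appended))        # ([], [2, 3, 4])
print(diagnose(range(5), list(upserted)))  # ([], [])
```

In this sketch, replays produce duplicates only in the append-style sink and never produce
missing ids. Running the same kind of comparison against the source topic and against the
output should show whether the lost events were ever in the topic at all.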


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html

On 17 April 2017 at 12:14:00 PM, F.Amara (fathima@wso2.com) wrote:

Hi all,  

I'm using Flink 1.2.0. I have a distributed system where the Flink High
Availability feature is activated. Data is produced using a Kafka broker, and
in a TM failure scenario the cluster restarts. Checkpointing is enabled
with exactly-once processing.
The problem is that at the end of data processing I receive duplicated
data and some data is also missing (e.g. if 2000 events are sent, around
800 events are lost and some events are duplicated at the receiving end).

Is this an issue with the Flink version or would it be an issue from my  
program logic?  
