Date: Mon, 17 Apr 2017 12:56:05 +0800
From: "Tzu-Li (Gordon) Tai"
To: user@flink.apache.org
Subject: Re: Data duplication on a High Availability activated cluster after a Task Manager failure recovery

Hi,

A few things to clarify first:

1. What is the sink you are using? Checkpointing in Flink allows for exactly-once state updates. Whether or not end-to-end exactly-once delivery can be achieved depends on the sink. For data store sinks such as Cassandra / Elasticsearch, this can be made effectively exactly-once using idempotent writes (depending on the application logic). For a Kafka topic as a sink, delivery is currently only at-least-once. You can check out [1] for an overview.

2. Also note that if there are already duplicates in the consumed Kafka topic (which may occur since Kafka producing does not support any kind of transactions at the moment), then they will all be consumed and processed by Flink.

However, this does not explain the missing data, as that should not happen. So for this, yes, I would check whether there’s an issue with the application logic or whether the events simply were not in the consumed Kafka topic in the first place.

Cheers,
Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html

On 17 April 2017 at 12:14:00 PM, F.Amara (fathima@wso2.com) wrote:

Hi all,

I'm using Flink 1.2.0. I have a distributed system where the Flink High Availability feature is activated. Data is produced using a Kafka broker, and in a TM failure scenario the cluster restarts. Checkpointing is enabled with exactly-once processing.

The problem encountered is that at the end of data processing I receive duplicated data and some data are also missing (e.g. if 2000 events are sent, it loses around 800 events and some events are duplicated at the receiving end).

Is this an issue with the Flink version or would it be an issue from my program logic?

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-duplication-on-a-High-Availability-activated-cluster-after-a-Task-Manager-failure-recovery-tp12627.html
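
To illustrate the point in the reply about idempotent writes, here is a minimal Python sketch of why replay after a failure produces duplicates in an append-style sink but not in a keyed, idempotent one. It is only a simulation and does not use any Flink, Kafka, Cassandra, or Elasticsearch API. Every name in it (`replay_after_failure`, `AppendSink`, `IdempotentSink`) is hypothetical, and it assumes each event carries a unique ID, which the original message does not state.

```python
from typing import Dict, List, Tuple

# (event_id, value). The unique event_id is an assumption for this sketch.
Event = Tuple[str, int]


def replay_after_failure(events: List[Event], checkpoint_at: int, fail_at: int) -> List[Event]:
    """Return what a sink sees under at-least-once delivery.

    Events before fail_at are emitted once, then processing restarts from
    the last checkpoint, so events in [checkpoint_at, fail_at) arrive twice.
    """
    return events[:fail_at] + events[checkpoint_at:]


class AppendSink:
    """Hypothetical sink that appends every write, so replays become duplicates."""

    def __init__(self) -> None:
        self.rows: List[Event] = []

    def write(self, event: Event) -> None:
        self.rows.append(event)


class IdempotentSink:
    """Hypothetical sink that upserts by event ID, so replays overwrite themselves."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def write(self, event: Event) -> None:
        event_id, value = event
        self.rows[event_id] = value


events = [(f"e{i}", i) for i in range(10)]
delivered = replay_after_failure(events, checkpoint_at=4, fail_at=7)

append_sink, idempotent_sink = AppendSink(), IdempotentSink()
for event in delivered:
    append_sink.write(event)
    idempotent_sink.write(event)

print(len(append_sink.rows))      # 13: events e4..e6 were written twice
print(len(idempotent_sink.rows))  # 10: one row per event ID
```

Note that in this simulation no event is ever lost: replay can only add duplicates. That matches the reply's observation that missing events are not explained by the delivery guarantees and point to the application logic or the input topic.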