Date: Sat, 1 Oct 2016 10:50:20 +0000 (UTC)
From: "Tzu-Li (Gordon) Tai (JIRA)"
To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-4723) Unify definition of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer

Tzu-Li (Gordon) Tai created FLINK-4723:
------------------------------------------

Summary: Unify definition of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer
Key: FLINK-4723
URL: https://issues.apache.org/jira/browse/FLINK-4723
Project: Flink
Issue Type: Improvement
Components: Kafka Connector
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
Fix For: 1.2.0, 1.1.3

The proper "definition" of the offsets committed back to Kafka / ZK should be "the next offset that consumers should read (in Kafka terms, the 'position')".

This is already fixed for the 0.9 consumer by FLINK-4618, which increments the offsets that the 0.9 consumer commits back to Kafka by 1, so that the internal {{KafkaConsumer}} picks up the correct start position when committed offsets are present. This fix was required because the 0.9 consumer's start position is determined implicitly by the Kafka 0.9 APIs, which treat a committed offset as the next record to read.

However, the 0.8 consumer handles offset committing and start positions with Flink's own {{ZookeeperOffsetHandler}} rather than Kafka's high-level APIs, so it did not require a fix.

I propose to still unify the behaviour of committed offsets across 0.8 and 0.9 under the definition above. Otherwise, if a user first reads data with the Flink 0.8 consumer, leaving Flink-committed offsets in ZK, and then reads the same topic with a high-level 0.8 Kafka consumer in a non-Flink application, the first record that application reads will be a duplicate (because, as described above, Kafka high-level consumers expect the committed offsets to be "the next record to process" and not "the last processed record").

This requires incrementing the offsets that the 0.8 consumer commits to ZK by 1 as well, and changing how Flink's internal offsets are set from the offsets acquired from ZK (the internal offset, which marks the last processed record, becomes the ZK offset minus 1).

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
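To make the off-by-one described in FLINK-4723 concrete, here is a minimal sketch of the offset arithmetic only. The class and method names ({{OffsetConventionSketch}}, {{toCommittedOffset}}, {{toInternalOffset}}, {{offsetsReadFrom}}) are hypothetical and are not part of Flink or Kafka; the example assumes Flink's internal offset marks the last processed record, while a committed offset marks the next record to read.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the committed-offset convention proposed in FLINK-4723.
 * None of these names exist in Flink or Kafka; they only illustrate the arithmetic.
 */
public class OffsetConventionSketch {

    /** Offset to commit to Kafka / ZK: the next record to read (the "position"). */
    public static long toCommittedOffset(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    /** Flink-internal offset (last processed record) derived from a committed offset. */
    public static long toInternalOffset(long committedOffset) {
        return committedOffset - 1;
    }

    /**
     * Offsets a high-level consumer would read when it starts at the committed
     * offset and stops before endOffset.
     */
    public static List<Long> offsetsReadFrom(long committedOffset, long endOffset) {
        List<Long> read = new ArrayList<>();
        for (long o = committedOffset; o < endOffset; o++) {
            read.add(o);
        }
        return read;
    }

    public static void main(String[] args) {
        long lastProcessed = 41L; // Flink has processed records 0..41
        long end = 45L;           // records 42..44 are not yet processed

        // Old 0.8 behaviour: the last processed offset is committed as-is,
        // so a high-level consumer re-reads record 41.
        System.out.println("old: " + offsetsReadFrom(lastProcessed, end));

        // Proposed behaviour: last processed + 1 is committed.
        long committed = toCommittedOffset(lastProcessed);
        System.out.println("new: " + offsetsReadFrom(committed, end));

        // On restore, Flink derives its internal offset back from the committed one.
        System.out.println("restored internal offset: " + toInternalOffset(committed));
    }
}
```

Running {{main}} prints "old: [41, 42, 43, 44]", "new: [42, 43, 44]" and "restored internal offset: 41": with the old convention record 41 is read a second time, while with the proposed one the committed offset and the restored internal offset stay exactly one apart.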