Date: Mon, 3 Aug 2015 02:37:04 +0000 (UTC)
From: "Jiangjie Qin (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Updated] (KAFKA-2334) Prevent HW from going back during leader failover

     [ https://issues.apache.org/jira/browse/KAFKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiangjie Qin updated KAFKA-2334:
--------------------------------
    Fix Version/s:     (was: 0.9.0)
                   0.8.3
           Status: Patch Available  (was: In Progress)

> Prevent HW from going back during leader failover
> -------------------------------------------------
>
>                 Key: KAFKA-2334
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2334
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Jiangjie Qin
>             Fix For: 0.8.3
>
>
> Consider the following scenario:
> 0. Kafka uses a replication factor of 2, with broker B1 as the leader and B2 as the follower.
> 1. A producer keeps sending to Kafka with ack=-1.
> 2. A consumer repeatedly issues ListOffset requests to Kafka.
> And the following sequence:
> 0. B1's current log-end-offset (LEO) is 0 and its HW-offset is 0; the same holds for B2.
> 1. B1 receives a ProduceRequest of 100 messages, appends them to its local log (LEO becomes 100), and holds the request in purgatory.
> 2. B1 receives a FetchRequest starting at offset 0 from follower B2, and returns the 100 messages.
> 3. B2 appends the received messages to its local log (LEO becomes 100).
> 4. B1 receives another FetchRequest starting at offset 100 from B2. Knowing that B2's LEO has caught up to 100, it updates its own HW to 100, satisfies the ProduceRequest in purgatory, and sends the FetchResponse with HW 100 back to B2 ASYNCHRONOUSLY.
> 5. B1 successfully sends the ProduceResponse to the producer and then fails, so the FetchResponse does not reach B2, whose HW remains 0.
> From the consumer's point of view, it could first see the latest offset of 100 (from B1), then see the latest offset of 0 (from B2, the new leader after failover), and then see the latest offset gradually catch up to 100.
> This is because we use HW to guard ListOffset requests and fetches from ordinary consumers.
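
The sequence above can be replayed as a toy model to make the regression visible. This is only an illustrative sketch, not Kafka code: the `Replica` class and the `latest_offset` and `run_scenario` functions are hypothetical names, and the final step (the new leader advancing its HW to its own LEO) is an assumption standing in for the "gradually catch up" part of the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """Hypothetical stand-in for one broker's view of a partition."""
    name: str
    leo: int = 0  # log-end-offset
    hw: int = 0   # high watermark


def latest_offset(leader: Replica) -> int:
    # ListOffset for an ordinary consumer is guarded by the leader's HW.
    return leader.hw


def run_scenario() -> List[int]:
    b1, b2 = Replica("B1"), Replica("B2")
    seen = []

    # Step 1: B1 appends 100 messages; the ProduceRequest waits in purgatory.
    b1.leo += 100
    # Steps 2-3: B2 fetches from offset 0 and appends the 100 messages.
    b2.leo = b1.leo
    # Step 4: B2's next fetch (offset 100) lets B1 advance its own HW.
    b1.hw = min(b1.leo, b2.leo)
    seen.append(latest_offset(b1))

    # Step 5: B1 fails before the FetchResponse carrying HW=100 reaches B2,
    # so B2 takes over as leader with its HW still at 0.
    leader = b2
    seen.append(latest_offset(leader))

    # Assumption: the new leader eventually advances HW to its own LEO.
    leader.hw = leader.leo
    seen.append(latest_offset(leader))
    return seen


if __name__ == "__main__":
    print(run_scenario())
```

The second value in the returned list is lower than the first, which is the backwards jump in the latest offset that a consumer polling ListOffset would observe across the failover.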