Date: Thu, 22 Jun 2017 05:22:02 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: jira@kafka.apache.org
Reply-To: jira@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-5490) Deletion of tombstones during cleaning should consider idempotent message retention


    [ https://issues.apache.org/jira/browse/KAFKA-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058778#comment-16058778 ]

ASF GitHub Bot commented on KAFKA-5490:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/3406

    KAFKA-5490: Retain empty batch for last sequence of each producer

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-5490

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3406

----
commit cf27cc1d69de90c513d96895ec2f557a49b2b3b6
Author: Jason Gustafson
Date:   2017-06-21T23:55:36Z

    KAFKA-5490: Retain empty batch for last sequence of each producer

----


> Deletion of tombstones during cleaning should consider idempotent message retention
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-5490
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5490
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: clients, core, producer
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.11.0.1
>
>
> The LogCleaner always preserves the message containing last sequence from a given ProducerId when doing a round of cleaning. This is necessary to ensure that the producer is not prematurely evicted which would cause an OutOfOrderSequenceException. The problem with this approach is that the preserved message won't be considered again for cleaning until a new message with the same key is written to the topic. Generally this could result in accumulation of stale entries in the log, but the bigger problem is that the newer entry with the same key could be a tombstone. If we end up deleting this tombstone before a new record with the same key is written, then the old entry will resurface. For example, suppose the following sequence of writes:
> 1. ProducerId=1, Key=A, Value=1
> 2. ProducerId=2, Key=A, Value=null (tombstone)
> We will preserve the first entry indefinitely until a new record with Key=A is written AND either ProducerId 1 has written a newer record with a larger sequence number or ProducerId 1 becomes expired. As long as the tombstone is preserved, there is no correctness violation: a consumer reading from the beginning will ignore the first entry after reading the tombstone. But it is possible that the tombstone entry will be removed from the log before a new record with Key=A is written. If that happens, then a consumer reading from the beginning would incorrectly observe the overwritten value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
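
The write sequence in the issue description can be replayed with a small toy model in Python. This is only a sketch of the behaviour the description attributes to the cleaner, not the LogCleaner code itself. The names Record, clean, replay and drop_tombstones are hypothetical. The third write (Key=B from ProducerId 2) is an added assumption, included so that the tombstone is no longer the last sequence for its producer and can therefore be dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    producer_id: int
    sequence: int
    key: str
    value: Optional[int]  # None models a tombstone


def clean(log: List[Record], drop_tombstones: bool) -> List[Record]:
    """One toy cleaning pass (hypothetical model, not Kafka's LogCleaner).

    A record survives if it carries the last sequence of its producer, or
    if it is the latest record for its key and is not a tombstone that is
    due for removal (drop_tombstones=True).
    """
    latest_offset_for_key: Dict[str, int] = {}
    last_sequence_for_producer: Dict[int, int] = {}
    for r in log:
        latest_offset_for_key[r.key] = r.offset
        previous = last_sequence_for_producer.get(r.producer_id, -1)
        last_sequence_for_producer[r.producer_id] = max(previous, r.sequence)

    kept = []
    for r in log:
        is_last_sequence = last_sequence_for_producer[r.producer_id] == r.sequence
        is_latest_for_key = latest_offset_for_key[r.key] == r.offset
        is_removable_tombstone = r.value is None and drop_tombstones
        if is_last_sequence or (is_latest_for_key and not is_removable_tombstone):
            kept.append(r)
    return kept


def replay(log: List[Record]) -> Dict[str, int]:
    """What a consumer reading from the beginning would end up with."""
    view: Dict[str, int] = {}
    for r in log:
        if r.value is None:
            view.pop(r.key, None)  # tombstone deletes the key
        else:
            view[r.key] = r.value
    return view


log = [
    Record(offset=0, producer_id=1, sequence=0, key="A", value=1),
    Record(offset=1, producer_id=2, sequence=0, key="A", value=None),
    # Assumed extra write so the tombstone is not producer 2's last sequence.
    Record(offset=2, producer_id=2, sequence=1, key="B", value=2),
]

# Tombstone still retained: Key=A stays deleted for a consumer.
first_pass = clean(log, drop_tombstones=False)
# Tombstone removed, but the stale Key=A record is kept for its sequence.
second_pass = clean(first_pass, drop_tombstones=True)

print(replay(first_pass))
print(replay(second_pass))
```

In this model the first pass leaves Key=A deleted, while the second pass removes the tombstone but keeps the Key=A record from ProducerId 1, so the overwritten value becomes visible again to a consumer reading from the beginning.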