kafka-jira mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5490) Deletion of tombstones during cleaning should consider idempotent message retention
Date Thu, 22 Jun 2017 05:22:02 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058778#comment-16058778 ]

ASF GitHub Bot commented on KAFKA-5490:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/3406

    KAFKA-5490: Retain empty batch for last sequence of each producer

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-5490

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3406
    
----
commit cf27cc1d69de90c513d96895ec2f557a49b2b3b6
Author: Jason Gustafson <jason@confluent.io>
Date:   2017-06-21T23:55:36Z

    KAFKA-5490: Retain empty batch for last sequence of each producer

----
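The commit title names the fix. The cleaner no longer keeps the stale record that carries a producer's last sequence. Instead it can drop that record and keep an empty batch, which still holds the producer id and last sequence. The Python sketch below is a hypothetical toy model of that idea and is not Kafka's LogCleaner code. `Batch`, `clean`, `producer_state` and `consume_from_beginning` are illustrative names. Tombstone deletion is modelled as a flag on a second cleaning round rather than a retention time. A third write (ProducerId=2, Key=B, Value=2) is added to the issue's example below as an assumption, so that the tombstone is no longer producer 2's last sequence and can be removed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model, not Kafka code.
# (offset, key, value); a value of None marks a tombstone
Entry = Tuple[int, str, Optional[str]]


@dataclass
class Batch:
    producer_id: int
    last_sequence: int
    records: List[Entry] = field(default_factory=list)


def clean(log: List[Batch], delete_tombstones: bool) -> List[Batch]:
    """Keep the latest record per key; if a producer's last batch ends up
    with no records, retain it as an empty batch instead of dropping it."""
    latest: Dict[str, int] = {}
    for batch in log:
        for offset, key, _ in batch.records:
            latest[key] = offset
    # Index of the batch holding each producer's highest sequence.
    last_batch: Dict[int, int] = {}
    for i, batch in enumerate(log):
        cur = last_batch.get(batch.producer_id)
        if cur is None or batch.last_sequence > log[cur].last_sequence:
            last_batch[batch.producer_id] = i
    cleaned = []
    for i, batch in enumerate(log):
        kept = [(o, k, v) for (o, k, v) in batch.records
                if latest[k] == o and not (v is None and delete_tombstones)]
        if kept:
            cleaned.append(Batch(batch.producer_id, batch.last_sequence, kept))
        elif last_batch[batch.producer_id] == i:
            # The stale record is gone, but the producer id and last sequence
            # survive, so the producer is not evicted.
            cleaned.append(Batch(batch.producer_id, batch.last_sequence, []))
    return cleaned


def producer_state(log: List[Batch]) -> Dict[int, int]:
    """Last sequence per producer id, rebuilt from the batches in the log."""
    state: Dict[int, int] = {}
    for batch in log:
        state[batch.producer_id] = max(state.get(batch.producer_id, -1),
                                       batch.last_sequence)
    return state


def consume_from_beginning(log: List[Batch]) -> Dict[str, str]:
    """Key/value view a consumer builds by reading the whole log in order."""
    view: Dict[str, str] = {}
    for batch in log:
        for _, key, value in batch.records:
            if value is None:
                view.pop(key, None)
            else:
                view[key] = value
    return view


log = [
    Batch(producer_id=1, last_sequence=0, records=[(0, "A", "1")]),
    Batch(producer_id=2, last_sequence=0, records=[(1, "A", None)]),
    Batch(producer_id=2, last_sequence=1, records=[(2, "B", "2")]),  # assumed write
]
after_first = clean(log, delete_tombstones=False)
after_second = clean(after_first, delete_tombstones=True)
print(consume_from_beginning(after_second))  # {'B': '2'}
print(producer_state(after_second))          # {1: 0, 2: 1}
```

Because the stale record for Key=A is not retained, deleting the tombstone does not bring back Value=1. Producer 1's last sequence is still recoverable from the empty batch.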


> Deletion of tombstones during cleaning should consider idempotent message retention
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-5490
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5490
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: clients, core, producer 
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.11.0.1
>
>
> The LogCleaner always preserves the message containing the last sequence from a given
> ProducerId when doing a round of cleaning. This is necessary to ensure that the producer is
> not prematurely evicted, which would cause an OutOfOrderSequenceException. The problem with
> this approach is that the preserved message won't be considered again for cleaning until a
> new message with the same key is written to the topic. In general, this can lead to an
> accumulation of stale entries in the log, but the bigger problem is that the newer entry with
> the same key could be a tombstone. If we end up deleting this tombstone before a new record
> with the same key is written, then the old entry will resurface. For example, suppose the
> following sequence of writes:
> 1. ProducerId=1, Key=A, Value=1
> 2. ProducerId=2, Key=A, Value=null (tombstone)
>
> We will preserve the first entry indefinitely, until a new record with Key=A is written AND
> either ProducerId 1 has written a newer record with a larger sequence number or ProducerId 1
> has expired. As long as the tombstone is preserved, there is no correctness violation: a
> consumer reading from the beginning will ignore the first entry after reading the tombstone.
> But the tombstone may be removed from the log before a new record with Key=A is written. If
> that happens, a consumer reading from the beginning would incorrectly observe the overwritten
> value (Key=A, Value=1).
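The failure can be reproduced with a minimal Python sketch of the pre-fix behaviour as the issue describes it. This is a toy model under stated assumptions and is not Kafka's LogCleaner. The names (`Record`, `naive_clean`, `last_sequence_offsets`, `consume_from_beginning`) are hypothetical. Records are flat rather than grouped into batches. Tombstone deletion is a flag on a second cleaning round instead of a retention time. A third write (ProducerId=2, Key=B, Value=2) is assumed so that the tombstone no longer carries producer 2's last sequence and is therefore eligible for removal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


# Hypothetical toy model, not Kafka code.
@dataclass(frozen=True)
class Record:
    offset: int
    producer_id: int
    sequence: int
    key: str
    value: Optional[str]  # None marks a tombstone


def last_sequence_offsets(log: List[Record]) -> Set[int]:
    """Offsets of the records carrying each producer's highest sequence."""
    best: Dict[int, Record] = {}
    for r in log:
        cur = best.get(r.producer_id)
        if cur is None or r.sequence > cur.sequence:
            best[r.producer_id] = r
    return {r.offset for r in best.values()}


def naive_clean(log: List[Record], delete_tombstones: bool) -> List[Record]:
    """Pre-fix model: keep the latest record per key, plus every record that
    carries a producer's last sequence, even if a newer record shadows it."""
    latest = {r.key: r.offset for r in log}  # later offsets overwrite earlier ones
    pinned = last_sequence_offsets(log)
    cleaned = []
    for r in log:
        if r.offset in pinned:
            cleaned.append(r)  # kept only to protect producer state
        elif latest[r.key] == r.offset:
            if r.value is None and delete_tombstones:
                continue  # tombstone has outlived its retention: drop it
            cleaned.append(r)
    return cleaned


def consume_from_beginning(log: List[Record]) -> Dict[str, str]:
    """Key/value view a consumer builds by reading the whole log in order."""
    view: Dict[str, str] = {}
    for r in log:
        if r.value is None:
            view.pop(r.key, None)
        else:
            view[r.key] = r.value
    return view


# Writes 1 and 2 come from the issue; write 3 is an assumed extra write.
log = [
    Record(offset=0, producer_id=1, sequence=0, key="A", value="1"),
    Record(offset=1, producer_id=2, sequence=0, key="A", value=None),
    Record(offset=2, producer_id=2, sequence=1, key="B", value="2"),
]
after_first = naive_clean(log, delete_tombstones=False)
after_second = naive_clean(after_first, delete_tombstones=True)
print(consume_from_beginning(after_first))   # {'B': '2'}
print(consume_from_beginning(after_second))  # {'A': '1', 'B': '2'}
```

The first round leaves the tombstone in place, so a full read shows no value for Key=A. Once the second round drops the tombstone, the pinned record from ProducerId 1 is the only entry left for Key=A, and the overwritten value comes back.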



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
