kafka-dev mailing list archives

From "Jun Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4545) tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned
Date Thu, 15 Dec 2016 02:28:58 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750154#comment-15750154 ]

Jun Rao commented on KAFKA-4545:
--------------------------------

One potential way to fix this: when cleaning a segment after the dirty marker, we don't
inherit the last modified time of the original segment. When cleaning a segment before the
dirty marker, we do inherit the last modified time. This way, the last modified time of a
cleaned segment is the time when it first gets cleaned. Not sure if this completely addresses
the issue, though.
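
A minimal sketch of the timestamp choice described above, not the actual log cleaner code.
The class, method, and parameter names are hypothetical, and it assumes "after the dirty
marker" means the segment's base offset is at or beyond the first dirty offset.

```java
final class CleanedSegmentTimestamp {
    private CleanedSegmentTimestamp() {}

    /**
     * Hypothetical helper: picks the last modified time to stamp on a segment
     * produced by the cleaner.
     *
     * @param segmentBaseOffset      base offset of the segment being cleaned
     * @param firstDirtyOffset       the dirty marker (assumed: first uncleaned offset)
     * @param originalLastModifiedMs last modified time of the original segment
     * @param cleanTimeMs            wall-clock time of the current cleaning pass
     */
    static long lastModifiedFor(long segmentBaseOffset,
                                long firstDirtyOffset,
                                long originalLastModifiedMs,
                                long cleanTimeMs) {
        // Segment was already in the cleaned portion: keep the inherited time,
        // which (under this scheme) is when it was first cleaned.
        if (segmentBaseOffset < firstDirtyOffset) {
            return originalLastModifiedMs;
        }
        // Segment is being cleaned for the first time: start the clock now.
        return cleanTimeMs;
    }
}
```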

> tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4545
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4545
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.0
>            Reporter: Jun Rao
>
> The algorithm for removing a tombstone in a compacted log is supposed to be the following.
> 1. A tombstone is never removed while it's still in the dirty portion of the log.
> 2. After the tombstone is in the cleaned portion of the log, we further delay its removal
> by delete.retention.ms, measured from the time the tombstone entered the cleaned portion.
> Once the tombstone is in the cleaned portion, we know there can't be any message with
> the same key before the tombstone. Therefore, if a consumer reads a non-tombstone
> message before the tombstone but can read to the end of the log within delete.retention.ms,
> it's guaranteed to see the tombstone.
> However, the current implementation doesn't seem correct. We delay the removal of the
> tombstone by delete.retention.ms since the last modified time of the last cleaned segment.
> However, that last modified time is inherited from the original segment, which could be
> arbitrarily old. So, the tombstone may not be preserved as long as it needs to be.
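
The intended rule in the description can be sketched as a single predicate. This is only an
illustration under stated assumptions: the class, method, and parameter names are
hypothetical and do not come from the Kafka code base.

```java
final class TombstoneRetention {
    private TombstoneRetention() {}

    /**
     * Hypothetical check for whether a tombstone may be dropped.
     *
     * @param inCleanedPortion  false while the tombstone is still in the dirty portion
     * @param referenceTimeMs   time the retention delay is measured from
     * @param nowMs             current time
     * @param deleteRetentionMs value of delete.retention.ms
     */
    static boolean canRemove(boolean inCleanedPortion,
                             long referenceTimeMs,
                             long nowMs,
                             long deleteRetentionMs) {
        // Rule 1: never remove a tombstone from the dirty portion.
        if (!inCleanedPortion) {
            return false;
        }
        // Rule 2: wait delete.retention.ms past the reference time.
        return nowMs - referenceTimeMs >= deleteRetentionMs;
    }
}
```

The reported problem is the choice of `referenceTimeMs`. For example, with
delete.retention.ms set to 86400000 (24 hours), suppose a tombstone is cleaned for the first
time now and its original segment was last modified ten days ago. Measuring from the
inherited time makes the predicate true immediately; measuring from the time the tombstone
entered the cleaned portion keeps it for the full 24 hours.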



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
