kafka-dev mailing list archives

From "Neha Narkhede (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-631) Implement log compaction
Date Tue, 29 Jan 2013 01:18:11 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564944#comment-13564944 ]

Neha Narkhede commented on KAFKA-631:
-------------------------------------

+1 on v9. Some minor changes before you check it in -

13. KafkaConfig
Typo - accross -> across
14. LogCleaner
Typo: ellapsed -> elapsed
15. We talked about this offline, but regarding review comment 6.3, I personally like renaming
the .swap file to contain the names of the files it has cleaned, but there might be nuances,
e.g. there is an OS limit on the length of a file name. Would you mind filing another bug to
track that change?
                
> Implement log compaction
> ------------------------
>
>                 Key: KAFKA-631
>                 URL: https://issues.apache.org/jira/browse/KAFKA-631
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>         Attachments: KAFKA-631-v1.patch, KAFKA-631-v2.patch, KAFKA-631-v3.patch, KAFKA-631-v4.patch,
KAFKA-631-v5.patch, KAFKA-631-v6.patch, KAFKA-631-v7.patch, KAFKA-631-v8.patch, KAFKA-631-v9.patch
>
>
> Currently Kafka has only one way to bound the space of the log, namely by deleting old
segments. The policy that controls which segments are deleted can be configured based either
on the number of bytes to retain or the age of the messages. This makes sense for event or
log data, which has no notion of a primary key. However, lots of data has a primary key and
consists of updates by primary key. For this data it would be nice to be able to ensure that
the log contained at least the last version of every key.
> As an example, say that the Kafka topic contains a sequence of User Account messages,
each capturing the current state of a given user account. Rather than simply discarding old
segments, since the set of user accounts is finite, it might make more sense to delete individual
records that have been made obsolete by a more recent update for the same key. This would
ensure that the topic contained at least the current state of each record.
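
To make the "retain the latest record per key" semantics concrete, here is a minimal sketch of
the last-write-wins compaction the description asks for. It is illustrative only and not taken
from the attached patches: Record and compact are hypothetical names, and the actual LogCleaner
works segment by segment rather than over an in-memory Seq.

    object CompactionSketch {
      case class Record(key: String, value: String, offset: Long)

      // Last write wins: keep only the newest record for each key, then
      // restore offset order so the compacted log is still append-ordered.
      def compact(log: Seq[Record]): Seq[Record] =
        log.groupBy(_.key)
           .values
           .map(_.maxBy(_.offset))
           .toSeq
           .sortBy(_.offset)

      def main(args: Array[String]): Unit = {
        val log = Seq(
          Record("user-17", "name=A", 0L),
          Record("user-42", "name=B", 1L),
          Record("user-17", "name=C", 2L)  // obsoletes the record at offset 0
        )
        // Prints only offsets 1 and 2: the latest state of each key.
        compact(log).foreach(println)
      }
    }

Applied to the User Account example above, only the most recent message for each account
survives, which is exactly the "at least the current state of each record" guarantee the
description asks for.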

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
