kafka-jira mailing list archives

From "John Crowley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3806) Adjust default values of log.retention.hours and offsets.retention.minutes
Date Wed, 15 Nov 2017 16:30:02 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253743#comment-16253743 ]

John Crowley commented on KAFKA-3806:
-------------------------------------

Thanks for the quick response. Yes, I did find the 4682 entry and KIP, and also posted there.
It does look like they are trying to clean up a lot of the consumer-side issues.

There are workarounds proposed, but I thought it would be cleaner overall if you could set
offsets.retention.minutes on the groupId once, rather than having to check that every piece
of code touching this groupId passes the correct expiration. It would also make it clear to
any Kafka DevOps that this groupId is being handled specially.

In the general case, it would be nice if every broker property could be overridden on a
per-topic or per-groupId basis, wherever it can reasonably apply to a specific topic or
groupId. E.g. offsets.load.buffer.size deals only with how the broker operates, but
retention.ms can logically differ between topics, and offsets.retention.minutes might well
differ between groupIds, depending on the use case for a particular topic or group.
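
As a rough illustration of the lookup order suggested above, here is a minimal Python sketch.
The per-groupId override is the hypothetical feature being asked for, not an existing Kafka
API; the function name, the dictionaries, and the group name "billing-archiver" are all made
up for the example.

```python
# Hypothetical sketch: resolve a setting for one consumer group, preferring a
# per-group override and falling back to the broker-wide default.
# None of these names are real Kafka APIs.

BROKER_DEFAULTS = {
    "offsets.retention.minutes": 1440,  # broker-wide default: 1 day
}

# Assumed per-groupId overrides (the proposed feature).
GROUP_OVERRIDES = {
    "billing-archiver": {"offsets.retention.minutes": 20160},  # 14 days
}


def effective_group_config(key, group_id,
                           broker_defaults=BROKER_DEFAULTS,
                           group_overrides=GROUP_OVERRIDES):
    """Return the group's override for `key` if one is set, else the broker default."""
    return group_overrides.get(group_id, {}).get(key, broker_defaults[key])
```

With this in place, code touching the group would not need to pass an expiration at all; the
value would be looked up in one place.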

> Adjust default values of log.retention.hours and offsets.retention.minutes
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-3806
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3806
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config
>    Affects Versions: 0.9.0.1, 0.10.0.0
>            Reporter: Michal Turek
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> The combination of the default values of log.retention.hours (168 hours = 7 days) and
> offsets.retention.minutes (1440 minutes = 1 day) may be dangerous in special cases. Offset
> retention should always be greater than log retention.
> We have observed the following scenario and issue:
> - Producing data to a topic was disabled two days ago by a producer update; the topic
>   wasn't deleted.
> - The consumer consumed all data and properly committed offsets to Kafka.
> - The consumer made no more offset commits for that topic because there was no more
>   incoming data and nothing to confirm. (We have auto-commit disabled; I'm not sure how
>   enabled auto-commit behaves.)
> - After one day: Kafka cleared the too-old offsets according to offsets.retention.minutes.
> - After two days: The long-running consumer was restarted after an update. It didn't find
>   any committed offsets for that topic, since they had been deleted according to
>   offsets.retention.minutes, so it started consuming from the beginning.
> - The messages were still in Kafka due to the larger log.retention.hours, so about 5 days
>   of messages were read again.
> Known workaround to solve this issue:
> - Explicitly configure log.retention.hours and offsets.retention.minutes; don't use the defaults.
> Proposals:
> - Increase the default value of offsets.retention.minutes to at least twice
>   log.retention.hours.
> - Check these values during Kafka startup and log a warning if offsets.retention.minutes
>   is smaller than log.retention.hours.
> - Add a note to the migration guide about the differences between storing offsets in
>   ZooKeeper and in Kafka (http://kafka.apache.org/documentation.html#upgrade).
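
The startup check proposed in the quoted issue, and the arithmetic behind the "about 5 days"
figure, can be sketched in a few lines of Python. This is only an illustration, not Kafka's
implementation: the function names are invented, and the replay estimate assumes that the
topic was written to continuously until production stopped and that a consumer with no
committed offsets starts from the earliest retained message.

```python
# Illustrative sketch only; not part of Kafka. Function names are hypothetical.

def retention_warning(log_retention_hours, offsets_retention_minutes):
    """Return a warning message if offsets would expire before log data, else None."""
    log_retention_minutes = log_retention_hours * 60
    if offsets_retention_minutes < log_retention_minutes:
        return (
            "offsets.retention.minutes (%d) is smaller than log.retention.hours "
            "(%d h = %d min); idle consumers may lose their committed offsets "
            "while the data is still retained"
            % (offsets_retention_minutes, log_retention_hours, log_retention_minutes)
        )
    return None


def replayed_hours(log_retention_hours, offsets_retention_minutes, idle_hours):
    """Estimate how many hours of data an idle consumer re-reads after a restart.

    Assumes production stopped `idle_hours` ago, the consumer committed nothing
    since then, and it restarts from the earliest retained message if its
    offsets have expired.
    """
    if idle_hours * 60 < offsets_retention_minutes:
        return 0  # offsets still present, nothing is re-read
    # Data older than log_retention_hours is gone; the rest was produced
    # before the idle period began.
    return max(0, log_retention_hours - idle_hours)
```

With the defaults from the issue (168 hours and 1440 minutes) and a consumer that was idle
for two days, `replayed_hours(168, 1440, 48)` gives 120 hours, i.e. the 5 days described in
the scenario.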



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
