kafka-jira mailing list archives

From "Vahid Hashemian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4682) Committed offsets should not be deleted if a consumer is still active
Date Thu, 12 Oct 2017 23:00:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202763#comment-16202763 ]

Vahid Hashemian commented on KAFKA-4682:
----------------------------------------

[~hachikuji] I have started drafting a KIP for the changes discussed here. Could you please
clarify what you mean by
{quote}... we could probably also remove the commit timestamp and use the timestamp from the
message itself. ...{quote}
I see that the commit timestamp is set to the time the request is processed (which is presumably
when the offset is committed), so it is not clear to me what "timestamp from the message
itself" refers to.
Thanks.

> Committed offsets should not be deleted if a consumer is still active
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4682
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4682
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: James Cheng
>
> Kafka will delete committed offsets that are older than offsets.retention.minutes.
> If there is an active consumer on a low traffic partition, it is possible that Kafka
will delete the committed offset for that consumer. Once the offset is deleted, a restart
or a rebalance of that consumer will cause the consumer to not find any committed offset and
start consuming from earliest/latest (depending on auto.offset.reset). I'm not sure, but a
broker failover might also cause you to start reading from auto.offset.reset (due to broker
restart, or coordinator failover).
> I think that Kafka should only delete offsets for inactive consumers. The timer should
only start after a consumer group goes inactive. For example, if a consumer group goes inactive,
then after 1 week, delete the offsets for that consumer group. This is a solution that [~junrao]
mentioned in https://issues.apache.org/jira/browse/KAFKA-3806?focusedCommentId=15323521&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15323521
> The current workarounds are to:
> # Commit an offset on every partition you own on a regular basis, making sure that it
is more frequent than offsets.retention.minutes (a broker-side setting that a consumer might
not be aware of)
> or
> # Turn the value of offsets.retention.minutes up very high. You have to make
sure it is longer than the longest gap between messages that you want to support. For example, if
you want to support a topic where someone produces once a month, you would have to set offsets.retention.minutes
to at least 1 month.
> or
> # Turn on enable.auto.commit (this is essentially #1, but easier to implement).
> None of these are ideal. 
> #1 can be spammy. It requires that your consumers know something about how the brokers are
configured. Sometimes it is out of your control. Mirrormaker, for example, only commits offsets
on partitions where it receives data. It is also logic that you need to duplicate in all
of your consumers.
> #2 increases disk usage on the broker (in __consumer_offsets) as well as memory usage
on the broker (to answer OffsetFetch).
> #3 I think has the potential for message loss (the consumer might commit offsets for messages
that are not yet fully processed).
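
The difference between the behaviour described in the issue and the proposed change can be
sketched as a small, simplified model. This is not Kafka's implementation: the names
CommittedOffset, expired_by_commit_age, expired_after_group_inactive and the 1-day
retention value are hypothetical and chosen only for illustration, and the exact
comparison the broker uses may differ.

```python
from dataclasses import dataclass
from typing import Optional

HOUR_MS = 60 * 60 * 1000
# Hypothetical retention for the example: 1 day, i.e. offsets.retention.minutes=1440.
RETENTION_MS = 24 * HOUR_MS


@dataclass
class CommittedOffset:
    """Illustrative stand-in for a committed offset entry (not a Kafka class)."""
    offset: int
    commit_timestamp_ms: int


def expired_by_commit_age(entry: CommittedOffset, now_ms: int,
                          retention_ms: int = RETENTION_MS) -> bool:
    """Behaviour described in the issue: expiry depends only on the age of the commit."""
    return now_ms - entry.commit_timestamp_ms >= retention_ms


def expired_after_group_inactive(now_ms: int,
                                 group_inactive_since_ms: Optional[int],
                                 retention_ms: int = RETENTION_MS) -> bool:
    """Proposed behaviour: the timer starts only once the group goes inactive.

    group_inactive_since_ms is None while the group still has active consumers.
    """
    if group_inactive_since_ms is None:
        return False
    return now_ms - group_inactive_since_ms >= retention_ms


# A consumer on a low-traffic partition committed once, 48 hours ago.
entry = CommittedOffset(offset=42, commit_timestamp_ms=0)
now_ms = 48 * HOUR_MS

still_active = expired_after_group_inactive(now_ms, None)
inactive_one_hour = expired_after_group_inactive(now_ms, 47 * HOUR_MS)
inactive_38_hours = expired_after_group_inactive(now_ms, 10 * HOUR_MS)
old_commit = expired_by_commit_age(entry, now_ms)
```

In this model the old commit is expired under the age-based rule even though the consumer
is still running, while under the proposed rule it is kept until the group has been
inactive for the full retention period.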



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
