kafka-jira mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-2758) Improve Offset Commit Behavior
Date Mon, 06 Nov 2017 17:47:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240608#comment-16240608 ]

Guozhang Wang commented on KAFKA-2758:

[~jjkoshy] That's a good point. The main motivation for 1) is for services like MM, where
a commit request may contain a large number of partitions, many of which have the same
offsets; the hope is to reduce the request size in such scenarios. I'm wondering whether this
is still a good trade-off given the complexity of modifying the server-side logic that handles
offset commits to update the timestamps for this group id (I think that primarily depends on
how much network bandwidth we can save in practice).
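The filtering idea in 1) can be sketched as a small helper that drops any partition whose offset to commit equals the offset already committed, so only advancing partitions appear in the request. This is a minimal illustration, not Kafka's actual client code; `filterUnchanged` and the use of plain partition-name strings are hypothetical simplifications of the consumer's offset map.

```java
import java.util.HashMap;
import java.util.Map;

public class CommitFilter {
    // Hypothetical helper: keep only partitions whose offset to commit
    // differs from the last committed offset, shrinking the request.
    static Map<String, Long> filterUnchanged(Map<String, Long> toCommit,
                                             Map<String, Long> committed) {
        Map<String, Long> filtered = new HashMap<>();
        for (Map.Entry<String, Long> e : toCommit.entrySet()) {
            Long prev = committed.get(e.getKey());
            if (prev == null || !prev.equals(e.getValue())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Long> committed = new HashMap<>();
        committed.put("topic-0", 100L);
        committed.put("topic-1", 42L);

        Map<String, Long> toCommit = new HashMap<>();
        toCommit.put("topic-0", 100L); // unchanged -> dropped
        toCommit.put("topic-1", 50L);  // advanced  -> kept

        System.out.println(filterUnchanged(toCommit, committed)); // {topic-1=50}
    }
}
```

In an MM-like setup mirroring thousands of partitions where only a fraction advance between commits, this kind of filtering is where the bandwidth saving discussed above would come from; the open question in the comment is whether it justifies the extra broker-side bookkeeping for timestamps.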

> Improve Offset Commit Behavior
> ------------------------------
>                 Key: KAFKA-2758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2758
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Guozhang Wang
>              Labels: newbiee, reliability
> There are two scenarios of offset committing that we can improve:
> 1) we can filter out the partitions whose committed offset is equal to the consumed offset,
meaning there are no newly consumed messages from that partition, and hence we do not need
to include that partition in the commit request.
> 2) we can make a commit request right after resetting to a fetch / consume position, either
according to the reset policy (e.g. on consumer startup, or when handling an out-of-range
offset, etc.) or through {code}seek{code}, so that if the consumer fails right after these
events, upon recovery it restarts from the reset position instead of resetting again: otherwise
this can lead to, for example, data loss if we use "largest" as the reset policy while new
messages are arriving on the fetched partitions.
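The data-loss scenario in 2) can be modeled with a toy simulation. This is a hypothetical sketch, not the consumer's real recovery path: `resumePosition` stands in for the restarted consumer's decision to resume from a committed offset if one exists, or else apply the "largest"/latest reset policy and jump to the log end.

```java
import java.util.OptionalLong;

public class ResetCommit {
    // Hypothetical model of where a restarted consumer begins fetching:
    // resume from the committed offset if there is one; otherwise apply
    // the "largest" reset policy and jump to the current log end,
    // skipping anything appended since the crash.
    static long resumePosition(OptionalLong committed, long logEndOffset) {
        return committed.isPresent() ? committed.getAsLong() : logEndOffset;
    }

    public static void main(String[] args) {
        long resetPos = 100L; // position chosen at the first reset
        long logEnd   = 150L; // 50 new messages arrived before recovery

        // Without committing after the reset: the restarted consumer
        // resets again to the new log end, losing offsets 100..149.
        long withoutCommit = resumePosition(OptionalLong.empty(), logEnd);

        // With an immediate commit of the reset position: recovery
        // resumes at 100 and the 50 messages are still consumed.
        long withCommit = resumePosition(OptionalLong.of(resetPos), logEnd);

        System.out.println(withoutCommit + " vs " + withCommit); // 150 vs 100
    }
}
```

Committing immediately after the reset (or after a {code}seek{code}) pins the chosen position in the offsets topic, so a crash between the reset and the first regular commit no longer re-triggers the reset policy.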

This message was sent by Atlassian JIRA
