kafka-jira mailing list archives

From "Ismael Juma (Jira)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-12890) Consumer group stuck in `CompletingRebalance`
Date Tue, 08 Jun 2021 13:57:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-12890:
    Fix Version/s: 2.7.2

> Consumer group stuck in `CompletingRebalance`
> ---------------------------------------------
>                 Key: KAFKA-12890
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12890
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.7.0, 2.6.1, 2.8.0, 2.7.1, 2.6.2
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Blocker
>             Fix For: 3.0.0, 2.6.3, 2.7.2, 2.8.1
> We have recently seen multiple consumer groups stuck in `CompletingRebalance`. It appears
that these groups never receive the assignment from the group leader and remain stuck
in this state forever.
> When a group transitions to the `CompletingRebalance` state, the group coordinator sets
up a `DelayedHeartbeat` for each member of the group. It does so to ensure that each member sends
a sync request within the session timeout. If a member does not, the group coordinator rebalances
the group. Note that `DelayedHeartbeat` is reused here for this purpose: a `DelayedHeartbeat`
is also completed whenever a member heartbeats.
> The issue is that https://github.com/apache/kafka/pull/8834 changed the heartbeat
logic to allow members to heartbeat while the group is in the `CompletingRebalance` state.
This was not allowed before. Now, if a member starts to heartbeat while the group is in the
`CompletingRebalance` state, the heartbeat request completes the pending `DelayedHeartbeat`
that was set up to catch a missing sync request. Therefore, if the sync
request never comes, the group coordinator no longer notices.
> We need to bring back the detection of missing sync requests somehow.
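
The interaction described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's coordinator code: `ToyCoordinator`, `run`, the 10 s session timeout, the 3 s heartbeat interval, and the `separate_sync_deadline` option are all hypothetical names and values chosen for illustration. The separate deadline is one conceivable way to restore the old behavior and is not meant to describe the fix that was actually merged.

```python
SESSION_TIMEOUT_MS = 10_000  # assumed value, for illustration only


class ToyCoordinator:
    """Toy model of a single group in CompletingRebalance (hypothetical)."""

    def __init__(self, heartbeat_allowed, separate_sync_deadline=False):
        # heartbeat_allowed=False models the behavior before PR 8834,
        # heartbeat_allowed=True models the behavior after it.
        self.heartbeat_allowed = heartbeat_allowed
        self.separate_sync_deadline = separate_sync_deadline
        self.now_ms = 0
        self.state = "CompletingRebalance"
        self.heartbeat_deadline = {}  # stands in for the DelayedHeartbeat
        self.sync_deadline = {}       # hypothetical dedicated sync timer
        self.synced = set()

    def enter_completing_rebalance(self, members):
        for member in members:
            self.heartbeat_deadline[member] = self.now_ms + SESSION_TIMEOUT_MS
            if self.separate_sync_deadline:
                self.sync_deadline[member] = self.now_ms + SESSION_TIMEOUT_MS

    def heartbeat(self, member):
        if self.state == "CompletingRebalance" and not self.heartbeat_allowed:
            return "rejected"  # the pending timer is left untouched
        # Completing the pending delayed heartbeat and scheduling the next
        # one is modeled as pushing the deadline forward.
        self.heartbeat_deadline[member] = self.now_ms + SESSION_TIMEOUT_MS
        return "ok"

    def sync(self, member):
        self.synced.add(member)

    def advance(self, ms):
        self.now_ms += ms
        if self.state != "CompletingRebalance":
            return
        heartbeat_expired = any(
            self.now_ms >= deadline
            for deadline in self.heartbeat_deadline.values()
        )
        sync_expired = any(
            member not in self.synced and self.now_ms >= deadline
            for member, deadline in self.sync_deadline.items()
        )
        if heartbeat_expired or sync_expired:
            self.state = "PreparingRebalance"  # coordinator rebalances


def run(heartbeat_allowed, separate_sync_deadline=False):
    """Members heartbeat every 3 s but never send a sync request."""
    group = ToyCoordinator(heartbeat_allowed, separate_sync_deadline)
    group.enter_completing_rebalance(["leader", "follower"])
    for _ in range(10):
        group.advance(3_000)
        if group.state != "CompletingRebalance":
            break
        group.heartbeat("leader")
        group.heartbeat("follower")
    return group.state


print(run(heartbeat_allowed=False))  # PreparingRebalance
print(run(heartbeat_allowed=True))   # CompletingRebalance (stuck)
print(run(heartbeat_allowed=True, separate_sync_deadline=True))  # PreparingRebalance
```

In the second run the heartbeats keep pushing the only deadline forward, so the missing sync request is never noticed and the group stays in `CompletingRebalance` for as long as the members keep heartbeating.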

This message was sent by Atlassian Jira
