kafka-dev mailing list archives

From "Vahid Hashemian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-2985) Consumer group stuck in rebalancing state
Date Fri, 16 Sep 2016 15:16:21 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496554#comment-15496554 ]

Vahid Hashemian commented on KAFKA-2985:
----------------------------------------

Perhaps you are running into [this issue|https://issues.apache.org/jira/browse/KAFKA-3859]?

> Consumer group stuck in rebalancing state
> -----------------------------------------
>
>                 Key: KAFKA-2985
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2985
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.0
>         Environment: Kafka 0.9.0.0.
> Kafka Java consumer 0.9.0.0
> 2 Java producers.
> 3 Java consumers using the new consumer API.
> 2 kafka brokers.
>            Reporter: Jens Rantil
>            Assignee: Jason Gustafson
>
> We've been doing some load testing on Kafka. _After_ the load test, we have twice now seen
Kafka become stuck in consumer group rebalancing. This is after all our consumers are done
consuming and are essentially polling periodically without getting any records.
> The brokers list the consumer group (named "default"), but I can't query the offsets:
> {noformat}
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --list
> default
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --describe --group default|sort
> Consumer group `default` does not exist or is rebalancing.
> {noformat}
> Retrying the offset query for 15 minutes or so still reported that the group was rebalancing.
After restarting our first broker, the group immediately started rebalancing. That broker was
logging this before the restart:
> {noformat}
> [2015-12-12 13:09:48,517] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired
offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> [2015-12-12 13:10:16,139] INFO [GroupCoordinator 0]: Stabilized group default generation
16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,141] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,575] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,141] INFO [GroupCoordinator 0]: Stabilized group default generation
17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,143] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,314] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,144] INFO [GroupCoordinator 0]: Stabilized group default generation
18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,145] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,340] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,146] INFO [GroupCoordinator 0]: Stabilized group default generation
19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,148] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,238] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,148] INFO [GroupCoordinator 0]: Stabilized group default generation
20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,149] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,360] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,150] INFO [GroupCoordinator 0]: Stabilized group default generation
21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,152] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,217] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,152] INFO [GroupCoordinator 0]: Stabilized group default generation
22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,154] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,339] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,155] INFO [GroupCoordinator 0]: Stabilized group default generation
23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,157] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:17:09,262] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 23 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,157] INFO [GroupCoordinator 0]: Stabilized group default generation
24 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,159] INFO [GroupCoordinator 0]: Assignment received from leader
for group default for generation 24 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:18:08,333] INFO [GroupCoordinator 0]: Preparing to restabilize group default
with old generation 24 (kafka.coordinator.GroupCoordinator)
> {noformat}
> Our consumers were logging:
> {noformat}
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator
Marking the coordinator 2147483647 dead.
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Auto offset commit failed: Commit cannot be completed due to group rebalance
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator
Marking the coordinator 2147483647 dead.
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Auto offset commit failed: Commit cannot be completed due to group rebalance
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Auto offset commit failed:
> Dec 12 13:09:17 X.X.X.110 system[27782]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator
Attempt to join group default failed due to unknown member id, resetting and retrying.
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Error UNKNOWN_MEMBER_ID occurred while committing offsets for group default
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
Auto offset commit failed:
> Dec 12 13:09:17 X.X.X.144 system[9915]: [KafkaTaskExecutorConsumer] org.apache.kafka.clients.consumer.internals.AbstractCoordinator
Attempt to join group default failed due to unknown member id, resetting and retrying.
> {noformat}
> I understand that the broker might start rebalancing if my consumers haven't sent a
heartbeat within the session timeout. This might well have happened during my load test. However,
the issue here is that the rebalancing doesn't stabilize/finish after the load test is done.
> Let me know if I can be of any assistance to track this down.
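
To make the cycle in the broker log above easier to see, here is a minimal sketch that measures how long each generation stays stable and how long the following rebalance takes. It is not part of the original report. The sample lines are copied from the quoted broker log with the wrapped lines rejoined; the names BROKER_LOG, EVENT, parse_events and summarize are hypothetical, and the regular expression only assumes the two GroupCoordinator message shapes shown in that log.

```python
import re
from datetime import datetime

# Sample lines copied from the quoted broker log (wrapped lines rejoined).
BROKER_LOG = """\
[2015-12-12 13:10:16,139] INFO [GroupCoordinator 0]: Stabilized group default generation 16 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:10:16,575] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 16 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:11:15,141] INFO [GroupCoordinator 0]: Stabilized group default generation 17 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:11:15,314] INFO [GroupCoordinator 0]: Preparing to restabilize group default with old generation 17 (kafka.coordinator.GroupCoordinator)
[2015-12-12 13:12:14,144] INFO [GroupCoordinator 0]: Stabilized group default generation 18 (kafka.coordinator.GroupCoordinator)
"""

# Matches only the "Stabilized" and "Preparing to restabilize" messages.
EVENT = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:,]+)\] INFO \[GroupCoordinator \d+\]: "
    r"(?P<kind>Stabilized|Preparing to restabilize) group (?P<group>\S+) "
    r"(?:with old )?generation (?P<gen>\d+)"
)


def parse_events(log_text):
    """Return (timestamp, kind, generation) for every matching log line."""
    events = []
    for line in log_text.splitlines():
        m = EVENT.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group("kind"), int(m.group("gen"))))
    return events


def summarize(events):
    """Return (generation, seconds_stable, seconds_rebalancing) per generation.

    seconds_stable is the time from "Stabilized" to the next
    "Preparing to restabilize" for the same generation; seconds_rebalancing
    is the time from there until the next generation is stabilized.
    Either value is None when the log does not contain the needed line.
    """
    stabilized = {gen: ts for ts, kind, gen in events if kind == "Stabilized"}
    preparing = {gen: ts for ts, kind, gen in events if kind != "Stabilized"}
    rows = []
    for gen in sorted(stabilized):
        left = preparing.get(gen)
        nxt = stabilized.get(gen + 1)
        stable = (left - stabilized[gen]).total_seconds() if left is not None else None
        rebalancing = (
            (nxt - left).total_seconds() if left is not None and nxt is not None else None
        )
        rows.append((gen, stable, rebalancing))
    return rows


if __name__ == "__main__":
    for gen, stable, rebalancing in summarize(parse_events(BROKER_LOG)):
        print(gen, stable, rebalancing)
```

For the sample lines, each generation is stable for well under a second and the following rebalance takes a little under 59 seconds, which matches the roughly one-minute spacing of the "Stabilized group" messages in the full log. The sketch only measures the timing; it says nothing about the cause.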



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
