kafka-dev mailing list archives

From "Joel Koshy (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-256) Bug in the consumer rebalancing logic leads to the consumer not pulling data from some partitions
Date Sat, 28 Jan 2012 02:43:14 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195371#comment-13195371 ]

Joel Koshy commented on KAFKA-256:
----------------------------------

+1 for v3. 

I like the idea of having a tool to check whether a consumer is correctly balanced.
A more general comment/question on the kafka.tools package: I thought the tools
package is meant for stand-alone tools that people can run on the command line,
whose output can be piped for further processing if desired. If so, it would
be better not to use logging for the tool's output and to simply use println instead.
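
To illustrate the point about pipeable output, here is a minimal sketch in Python (chosen for brevity; it is not Kafka code). The function name report_unowned, the tab-separated line format, and the partition and consumer names are all hypothetical. Results go to stdout so they can be piped, while diagnostics go through logging to stderr.

```python
import logging
import sys


def report_unowned(partitions, owners, out=sys.stdout):
    """Write one 'partition<TAB>owner' line per partition to `out`.

    Hypothetical helper, not part of kafka.tools. Returns the number of
    partitions that have no owner.
    """
    unowned = 0
    for partition in sorted(partitions):
        owner = owners.get(partition)
        if owner is None:
            unowned += 1
        # The tool's result is plain text on `out`, so it can be piped.
        print(f"{partition}\t{owner if owner is not None else 'NONE'}", file=out)
    return unowned


if __name__ == "__main__":
    # Diagnostics are kept on stderr, away from the pipeable result.
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    count = report_unowned(
        {"topic_1-0", "topic_2-0"},      # made-up partition names
        {"topic_1-0": "consumer_a"},     # made-up ownership data
    )
    logging.info("unowned partitions: %d", count)
```

With this split, something like `tool | grep NONE` sees only the result lines, regardless of the logging configuration.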

                
> Bug in the consumer rebalancing logic leads to the consumer not pulling data from some partitions
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-256
>                 URL: https://issues.apache.org/jira/browse/KAFKA-256
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>             Fix For: 0.7.1
>
>         Attachments: kafka-256-v2.patch, kafka-256-v3.patch
>
>
> There is a bug in the consumer rebalancing logic that makes a consumer not pull data from some partitions for a topic. It recovers only after the consumer group is restarted and doesn't hit this bug again.
> Here is the observed behavior of the consumer when it hits the bug:
> 1. Consumer is consuming 2 topics with 1 partition each on 2 brokers
> 2. Broker 2 is bounced
> 3. Rebalancing operation triggers for topic_2, where the consumer decides to now consume data only from Broker 1 for topic_2
> 4. During the rebalancing operation, ZK has not yet deleted /brokers/topics/topic_1/broker_2, so the consumer still decides to consume from both brokers for topic_1
> 5. While restarting the fetchers, it tries to restart the fetcher for broker 2 and throws a RuntimeException. Before this, it has successfully started the fetcher for broker 1 and is consuming data from broker_1
> 6. This exception trickles all the way up to the syncedRebalance API, and the oldPartitionsPerTopicMap does not get updated to reflect that for topic_2, the consumer has now seen only broker_1. It still points to topic_2 -> broker_1, broker_2
> 7. Next rebalancing attempt gets triggered
> 8. By now, broker 2 is restarted and registered in zookeeper
> 9. For topic_2, the consumer tries to see if rebalancing needs to be done. Since it doesn't see a change in the cached topic partition map, it decides there is no need to rebalance.
> 10. It continues fetching only from broker_1
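
The sequence above can be reproduced with a small simulation. This is only a sketch in Python under simplifying assumptions, not the Kafka consumer code: the function names, the per-broker fetcher map, and the single whole-map comparison against the cache are all hypothetical stand-ins for the real rebalancing logic and oldPartitionsPerTopicMap.

```python
def rebalance(cache, zk_view, fetchers, live_brokers):
    """One simplified rebalance pass. Returns True if a rebalance ran.

    cache        : topic -> set of brokers seen at the last successful rebalance
    zk_view      : topic -> set of brokers currently registered in ZK
    fetchers     : broker -> set of topics currently being fetched (mutated)
    live_brokers : brokers that can actually be connected to
    """
    if zk_view == cache:
        return False  # no change versus the cached map, so nothing to do

    # Restart fetchers according to the new view.
    fetchers.clear()
    by_broker = {}
    for topic, brokers in zk_view.items():
        for broker in brokers:
            by_broker.setdefault(broker, set()).add(topic)
    for broker in sorted(by_broker):
        if broker not in live_brokers:
            raise RuntimeError(f"cannot start fetcher for {broker}")
        fetchers[broker] = by_broker[broker]

    # Reached only if every fetcher started; on failure the cache stays stale.
    cache.clear()
    cache.update({t: set(b) for t, b in zk_view.items()})
    return True


def run_scenario():
    both = {"broker_1", "broker_2"}
    cache = {"topic_1": set(both), "topic_2": set(both)}
    fetchers = {b: {"topic_1", "topic_2"} for b in both}

    # Steps 2-5: broker 2 is down, but ZK still lists it for topic_1.
    stale_view = {"topic_1": set(both), "topic_2": {"broker_1"}}
    try:
        rebalance(cache, stale_view, fetchers, live_brokers={"broker_1"})
    except RuntimeError:
        pass  # step 6: the exception propagates and the cache is not updated

    # Steps 7-10: broker 2 is back, and the ZK view matches the stale cache.
    recovered_view = {"topic_1": set(both), "topic_2": set(both)}
    rebalanced = rebalance(cache, recovered_view, fetchers, live_brokers=both)
    return rebalanced, fetchers


if __name__ == "__main__":
    rebalanced, fetchers = run_scenario()
    print("second rebalance ran:", rebalanced)
    print("brokers with fetchers:", sorted(fetchers))
```

In this model the second call returns without rebalancing because the recovered ZK view equals the never-updated cache, so only broker_1 keeps a fetcher, which matches step 10.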


        
