kafka-dev mailing list archives

From "Evan Nelson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-3798) Kafka Consumer 0.10.0.0 killed after rebalancing exception
Date Mon, 28 Nov 2016 21:34:58 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703193#comment-15703193 ]

Evan Nelson edited comment on KAFKA-3798 at 11/28/16 9:34 PM:
--------------------------------------------------------------

We are experiencing the same issue with 0.8.2.2:

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /consumers/***/ids/***
	at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) ~[zkclient-0.3.jar:0.3]
	at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) ~[zkclient-0.3.jar:0.3]
        etc...

(identifiers replaced with ***)

This happens on two different topics, one with 20 partitions and one with 40. We have 22 consumers
for each. The event always seems to be preceded by a ZooKeeper connection timeout, which
may have been triggered by a long GC pause (~5.5 seconds). Once the rebalance loop starts,
it _never_ recovers, no matter how many retries we allow.
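For context, the old ZooKeeper-based consumer exposes the relevant timing knobs as plain properties. The sketch below is a minimal illustration, assuming the 0.8.x consumer key names `zookeeper.session.timeout.ms` (documented default 6000 ms), `rebalance.max.retries` and `rebalance.backoff.ms`. The class `OldConsumerZkSettings`, its `survivesPause` check, and the host/group values are hypothetical and not part of Kafka. A ~5.5 s pause against a 6000 ms session timeout leaves almost no headroom, which is consistent with the session expiring and the consumer's ephemeral node disappearing.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka. It only fills a Properties object
// with configuration keys used by the old (0.8.x) ZooKeeper-based consumer.
final class OldConsumerZkSettings {

    static Properties build(String zkConnect, String groupId,
                            int sessionTimeoutMs, int maxRetries, int backoffMs) {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", zkConnect);
        props.setProperty("group.id", groupId);
        // If the client is silent for longer than this, ZooKeeper expires the
        // session and drops its ephemeral nodes (e.g. /consumers/<group>/ids/<id>).
        props.setProperty("zookeeper.session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Number of rebalance attempts before the consumer gives up.
        props.setProperty("rebalance.max.retries", Integer.toString(maxRetries));
        // Sleep between rebalance attempts.
        props.setProperty("rebalance.backoff.ms", Integer.toString(backoffMs));
        return props;
    }

    // Rough, hypothetical headroom check: does a stop-the-world pause of pauseMs
    // still leave marginMs before the session timeout? It ignores the ZooKeeper
    // client's ping schedule, so "true" is not a guarantee.
    static boolean survivesPause(Properties props, long pauseMs, long marginMs) {
        // 6000 is the documented default for zookeeper.session.timeout.ms.
        long timeoutMs = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms", "6000"));
        return pauseMs + marginMs < timeoutMs;
    }
}
```

With a 6000 ms timeout, a 5500 ms pause and a 2000 ms margin, `survivesPause` returns false. Raising the timeout may make expiry less frequent, but it does not explain why the rebalance loop never recovers once it starts. Note also that the ZooKeeper server clamps the requested session timeout to its own `minSessionTimeout`/`maxSessionTimeout` bounds.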


was (Author: ean5533):
We are experiencing the same issue with 0.8.2.2:

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /consumers/***/ids/***
	at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) ~[zkclient-0.3.jar:0.3]
	at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) ~[zkclient-0.3.jar:0.3]
        etc...

(identifiers replaced with ***)

This happens on two different topics, one with 20 partitions and one with 40. We have 22 consumers
for each. The event always seems to be preceded by a ZooKeeper connection timeout, which
may have been triggered by a long GC pause (~5.5 seconds). Once the rebalance loop starts,
it _never_ recovers, no matter how many retries we allow.

> Kafka Consumer 0.10.0.0 killed after rebalancing exception
> ----------------------------------------------------------
>
>                 Key: KAFKA-3798
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3798
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.10.0.0
>         Environment: Production
>            Reporter: Sahitya Agrawal
>            Assignee: Neha Narkhede
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Hi,
> I have a topic with 100 partitions and 25 consumers. The consumers were working fine for some time.
> After a while, I see a Kafka rebalancing exception in the logs. CPU usage is also at 100% at that time. The consumer process got killed after that.
> Kafka version: 0.10.0.0
> Some of the errors printed in the logs are as follows:
> kafka.common.ConsumerRebalanceFailedException: prod_ip-**** can't rebalance after 10 retries
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$2.run(ZookeeperConsumerConnector.scala:589)
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-*******
>         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1099)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1094)
>         at kafka.utils.ZkUtils.readData(ZkUtils.scala:542)
>         at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:61)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:674)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:646)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:637)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply(ZookeeperConsumerConnector.scala:637)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:636)
>         at kafka.consumer.ZookeeperConsumerConnector$ZKSessionExpireListener.handleNewSession(ZookeeperConsumerConnector.scala:522)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/prod/ids/prod_ip-******
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
>         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:124)
>         at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1103)
>         at org.I0Itec.zkclient.ZkClient$12.call(ZkClient.java:1099)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
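The trace shows `syncedRebalance` iterating over a `Range` and the log reports giving up after 10 retries, which points to a bounded retry loop. The hypothetical Java sketch below (not Kafka's code; `BoundedRebalanceRetry`, `syncedRebalance` and `RebalanceFailedException` here are illustrative names) shows why such a loop cannot recover when every attempt fails on the same missing node: if nothing re-creates `/consumers/<group>/ids/<consumerId>` between attempts, additional retries only repeat the same failure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Kafka's implementation: a bounded retry loop with
// the same overall shape as the syncedRebalance frames in the stack trace.
final class BoundedRebalanceRetry {

    // Illustrative stand-in for kafka.common.ConsumerRebalanceFailedException.
    static final class RebalanceFailedException extends RuntimeException {
        RebalanceFailedException(String consumerId, int retries) {
            super(consumerId + " can't rebalance after " + retries + " retries");
        }
    }

    // Calls attempt up to maxRetries times, sleeping backoffMs after each
    // failed attempt. Returns the 1-based number of the successful attempt.
    static int syncedRebalance(String consumerId, int maxRetries, long backoffMs,
                               BooleanSupplier attempt) {
        for (int i = 0; i < maxRetries; i++) {
            boolean done;
            try {
                done = attempt.getAsBoolean();
            } catch (RuntimeException e) {
                // Treat an exception (e.g. a NoNode read of our own registration
                // node) as a failed attempt, like the logged "exception during rebalance".
                done = false;
            }
            if (done) {
                return i + 1;
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("interrupted during rebalance backoff", ie);
            }
        }
        throw new RebalanceFailedException(consumerId, maxRetries);
    }
}
```

In this sketch, raising `maxRetries` only changes when the exception is thrown. The loop can succeed only if the failing precondition (here, the consumer's registration node) is restored between attempts.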



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)