Date: Tue, 20 Jun 2017 17:53:00 +0000 (UTC)
From: "Jun Rao (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-5480) Partition Leader may not be elected although there is one live replica in ISR

    [ https://issues.apache.org/jira/browse/KAFKA-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056172#comment-16056172 ]

Jun Rao commented on KAFKA-5480:
--------------------------------

[~pengwei], thanks for reporting this. The issue seems to be the same as https://issues.apache.org/jira/browse/KAFKA-3083. The fundamental problem is that the old controller should stop its activity sooner, when its connection to ZK is lost. We plan to fix this issue as part of the controller improvement work in https://issues.apache.org/jira/browse/KAFKA-5027.
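
For reference, the fencing that eventually stops the stale controller is what surfaces as the ControllerMovedException quoted below: each broker tracks the highest controller epoch it has seen and rejects requests that carry an older epoch. Here is a minimal, hypothetical Scala sketch of that idea (not the actual ReplicaManager code; the class and method names are made up for illustration):

// Simplified sketch of controller-epoch fencing (hypothetical, not the real Kafka code).
// A broker remembers the highest controller epoch it has seen and rejects requests
// from any controller carrying an older epoch; that check is what produces the
// ControllerMovedException in the logs quoted below.
class EpochFencedBroker(brokerId: Int) {
  @volatile private var latestControllerEpoch: Int = -1

  def handleControllerRequest(controllerId: Int, controllerEpoch: Int): Unit = {
    if (controllerEpoch < latestControllerEpoch) {
      // The sender is a stale ("zombie") controller that has not resigned yet.
      throw new IllegalStateException(
        s"Broker $brokerId received a request from old controller $controllerId with epoch " +
        s"$controllerEpoch. Latest known controller epoch is $latestControllerEpoch")
    }
    // Accept the request and remember the (possibly newer) epoch.
    latestControllerEpoch = controllerEpoch
  }
}

The race reported here exists because broker 1 keeps acting as the controller, and even moves the partition leader in ZooKeeper, until it learns that the controllership has changed, rather than stopping its activity as soon as its connection to ZooKeeper is in doubt; that earlier resignation is what the KAFKA-5027 work aims to address.
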
> Partition Leader may not be elected although there is one live replica in ISR
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5480
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5480
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.1, 0.10.2.0
>            Reporter: Pengwei
>              Labels: reliability
>             Fix For: 0.11.1.0
>
>
> We found a consumer blocking in poll() because the coordinator of its consumer group was not available.
> Digging into the logs, we found that the leader of some of the __consumer_offsets partitions is -1, so the coordinator is unavailable
> because the partition leader is unavailable. The scenario is as follows:
> There are 3 brokers in the cluster, and the cluster's network is not stable. At the beginning, partition [__consumer_offsets,3]
> has Leader 3 and ISR [3, 1, 2].
> 1. Broker 1 becomes the controller:
> [2017-06-10 15:48:30,006] INFO [Controller 1]: Broker 1 starting become controller state transition (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,085] INFO [Controller 1]: Initialized controller epoch to 8 and zk version 7 (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,088] INFO [Controller 1]: Controller 1 incremented epoch to 9 (kafka.controller.KafkaController)
> 2. Broker 2 soon becomes the controller; it is aware of all the brokers:
> [2017-06-10 15:48:30,936] INFO [Controller 2]: Broker 2 starting become controller state transition (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,936] INFO [Controller 2]: Initialized controller epoch to 9 and zk version 8 (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,943] INFO [Controller 2]: Controller 2 incremented epoch to 10 (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,574] INFO [Controller 2]: Currently active brokers in the cluster: Set(1, 2, 3) (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,574] INFO [Controller 2]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
> So broker 2 thinks leader 3 is alive and does not need to elect a new leader.
> 3. Broker 1 does not resign until 15:48:32, and it is not aware of broker 3:
> [2017-06-10 15:48:31,470] INFO [Controller 1]: List of partitions to be deleted: Map() (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,470] INFO [Controller 1]: Currently active brokers in the cluster: Set(1, 2) (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,470] INFO [Controller 1]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
> and changes the leader to broker 1:
> [2017-06-10 15:48:31,847] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [__consumer_offsets,3]. Select 1 from ISR 1,2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
> Broker 1 only resigns at 15:48:32, when its zk client becomes aware that broker 2 has changed the controller's data:
> kafka.common.ControllerMovedException: Broker 1 received update metadata request with correlation id 4 from an old controller 1 with epoch 9. Latest known controller epoch is 10
> at kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:621)
> at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:163)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:748)
> [2017-06-10 15:48:32,307] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> 4. Broker 2's controllerContext.partitionLeadershipInfo now caches Leader 3 and ISR [3,1,2], but in zk the
> Leader is 1 and the ISR is [1, 2]. This stale cache persists for a long time, until another zk event happens.
> 5. After 1 day, broker 2 receives broker 1's broker change event:
> [2017-06-12 21:43:18,287] INFO [BrokerChangeListener on Controller 2]: Broker change listener fired for path /brokers/ids with children 2,3 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2017-06-12 21:43:18,293] INFO [BrokerChangeListener on Controller 2]: Newly added brokers: , deleted brokers: 1, all live brokers: 2,3 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> Broker 2 then invokes onBrokerFailure for the deleted broker 1, but because the cached leader is 3, it does not move the partition to OfflinePartition and does not change the leader in partitionStateMachine.triggerOnlinePartitionStateChange().
> However, in replicaStateMachine.handleStateChanges(activeReplicasOnDeadBrokers, OfflineReplica), it removes replica 1 from the ISR.
> In removeReplicaFromIsr, the controller reads the ISR from zk again, finds that the leader has changed to 1, and therefore changes
> the partition's leader to -1 and the ISR to [2]:
> [2017-06-12 21:43:19,158] DEBUG [Controller 2]: Removing replica 1 from ISR 1,3,2 for partition [__consumer_offsets,3]. (kafka.controller.KafkaController)
> [2017-06-12 21:43:19,160] INFO [Controller 2]: New leader and ISR for partition [__consumer_offsets,3] is {"leader":-1,"leader_epoch":15,"isr":[2]} (kafka.controller.KafkaController)
> So partition [__consumer_offsets,3] is left with leader -1 even though replica 2 is alive and in the ISR, and it stays that way for a long time until another relevant zk event happens, for example rebooting one of the brokers that hosts the replicas.
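
To make step 5 concrete, here is a simplified, hypothetical Scala sketch (not the actual KafkaController code; LeaderAndIsr is a stand-in case class and the method names are illustrative) of why shrinking the ISR against the ZooKeeper state, while the re-election decision is made against the stale cache, leaves the partition with leader -1:

// Simplified, hypothetical sketch of step 5 (not the actual KafkaController code).
// The controller's cache still says the leader is 3 (alive), so the partition is
// never moved to OfflinePartition and no re-election happens; but the ISR shrink
// path re-reads ZooKeeper, finds the leader is really the dead broker 1, and
// marks the partition leaderless (-1).
case class LeaderAndIsr(leader: Int, isr: List[Int])

object IsrShrinkSketch {
  val NoLeader = -1

  // Mimics removeReplicaFromIsr: drop the dead replica from the ISR read from ZK;
  // if that replica is the leader according to ZK, the partition loses its leader.
  def removeReplicaFromIsr(zkState: LeaderAndIsr, deadReplica: Int): LeaderAndIsr = {
    val newLeader = if (zkState.leader == deadReplica) NoLeader else zkState.leader
    LeaderAndIsr(newLeader, zkState.isr.filterNot(_ == deadReplica))
  }

  def main(args: Array[String]): Unit = {
    val deadBroker  = 1
    val cachedState = LeaderAndIsr(leader = 3, isr = List(3, 1, 2)) // stale controller cache
    val zkState     = LeaderAndIsr(leader = 1, isr = List(1, 2))    // what ZooKeeper actually holds

    // The re-election decision is taken against the cache: leader 3 looks alive, so skip election.
    val needsNewLeader = cachedState.leader == deadBroker            // false

    // The ISR shrink is taken against ZK: leader 1 is the dead broker, so the leader becomes -1.
    val result = removeReplicaFromIsr(zkState, deadBroker)
    println(s"needsNewLeader=$needsNewLeader, newState=$result")
    // Prints: needsNewLeader=false, newState=LeaderAndIsr(-1,List(2))
  }
}

Running this prints needsNewLeader=false and newState=LeaderAndIsr(-1,List(2)), mirroring the log line above: no re-election is attempted because the cached leader 3 looks alive, yet the ZK-backed ISR shrink removes the real leader 1 and leaves the partition leaderless.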