kafka-jira mailing list archives

From "Pierre Mage (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5074) Transition to OnlinePartition without preferred leader in ISR fails
Date Sat, 12 Aug 2017 00:16:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124325#comment-16124325 ]

Pierre Mage commented on KAFKA-5074:

Running 0.11.0 and observing similar behaviour.

Sequence of events recorded in logs:
1. ZooKeeper session expires
2. Kafka controller stops broker 0
3. Kafka re-registers broker 0 in ZooKeeper
4. Leader cache [mytopic,29] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1)
5. Invoking state change to OfflineReplica for replicas [Topic=mytopic,Partition=29,Replica=0]
6. Retaining last ISR 0 of partition [mytopic,29] since unclean leader election is disabled
7. New leader and ISR for partition [mytopic,29] is {"leader":-1,"leader_epoch":4,"isr":[0]}
8. Not sending request (type=StopReplicaRequest...) to broker 0, since it is offline
9. Invoking state change to OnlineReplica for replicas [Topic=mytopic,Partition=29,Replica=0]
10. A cycle of failing preferred leader elections starts
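
Steps 5 to 7 can be illustrated with a minimal Python sketch of the ISR shrink rule implied by the "Retaining last ISR" message. This is not Kafka's controller code (which is Scala); `LeaderAndIsr` and `remove_replica_from_isr` are hypothetical names, and the starting state (leader 0, ISR [0], leader epoch 3) is an assumption chosen to reproduce the state logged in step 7, since the log does not show what ZooKeeper held beforehand.

```python
from dataclasses import dataclass, field
from typing import List

NO_LEADER = -1  # the "leader": -1 seen in the logged leader/ISR JSON


@dataclass
class LeaderAndIsr:
    leader: int
    leader_epoch: int
    isr: List[int] = field(default_factory=list)


def remove_replica_from_isr(state: LeaderAndIsr, replica_id: int,
                            unclean_election_enabled: bool) -> LeaderAndIsr:
    """Hypothetical, simplified model of taking one replica offline."""
    # A replica that goes offline cannot stay leader.
    new_leader = NO_LEADER if state.leader == replica_id else state.leader
    new_isr = [r for r in state.isr if r != replica_id]
    if not new_isr and not unclean_election_enabled:
        # Mirrors "Retaining last ISR ... since unclean leader election is
        # disabled": keep the old ISR so that only this replica may lead later.
        new_isr = list(state.isr)
    # In this model the controller-side update bumps the leader epoch.
    return LeaderAndIsr(new_leader, state.leader_epoch + 1, new_isr)


# Hypothetical starting state (not in the logs), chosen to reproduce step 7.
before = LeaderAndIsr(leader=0, leader_epoch=3, isr=[0])
after = remove_replica_from_isr(before, replica_id=0, unclean_election_enabled=False)
print(after)  # → LeaderAndIsr(leader=-1, leader_epoch=4, isr=[0])
```

The point of the model: the partition ends up with no leader but a non-empty ISR, so nothing in it looks "offline" apart from the -1.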

OfflinePartitionLeaderSelector is never invoked because the partition's state is still OnlinePartition; instead the controller keeps logging failed preferred replica elections:
ERROR Controller 2 epoch 4 encountered error while electing leader for partition [mytopic,29] due to: Preferred replica 2 for partition [mytopic,29] is either not alive or not in the isr. Current leader and ISR [{"leader":-1,"leader_epoch":4,"isr":[0]}].
ERROR Controller 2 epoch 4 initiated state change for partition [mytopic,29] from OnlinePartition to OnlinePartition failed
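
Why the partition stays stuck can be modeled by comparing the two leader selectors side by side. This is a simplification, not Kafka's implementation: `preferred_replica_select`, `offline_partition_select` and `LeaderElectionError` are hypothetical names, the replica assignment [2, 0] is an assumption (the log only says the preferred replica is 2), and brokers 0 and 2 are assumed alive after broker 0 re-registers.

```python
from typing import List, Set, Tuple


class LeaderElectionError(Exception):
    pass


def preferred_replica_select(assigned: List[int], isr: List[int],
                             live: Set[int]) -> Tuple[int, List[int]]:
    """Hypothetical model: only the preferred (first assigned) replica may lead."""
    preferred = assigned[0]
    if preferred in live and preferred in isr:
        return preferred, isr
    raise LeaderElectionError(
        f"Preferred replica {preferred} is either not alive or not in the isr. "
        f"Current isr {isr}")


def offline_partition_select(assigned: List[int], isr: List[int],
                             live: Set[int]) -> Tuple[int, List[int]]:
    """Hypothetical model of a clean election: first live assigned replica in the ISR."""
    live_isr = [r for r in isr if r in live]
    for replica in assigned:
        if replica in live_isr:
            return replica, live_isr
    raise LeaderElectionError("No live replica in the ISR and unclean election is disabled")


# Assumed inputs: assignment [2, 0], ISR [0] as logged, brokers 0 and 2 alive.
assigned, isr, live = [2, 0], [0], {0, 2}
try:
    preferred_replica_select(assigned, isr, live)
except LeaderElectionError as err:
    print("preferred election failed:", err)
print(offline_partition_select(assigned, isr, live))  # → (0, [0])
```

Under these assumptions the offline selector would elect broker 0, the only ISR member, while the preferred replica path keeps failing because broker 2 is not in the ISR and, as far as I can tell, cannot rejoin it while the partition has no leader to fetch from.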

> Transition to OnlinePartition without preferred leader in ISR fails
> -------------------------------------------------------------------
>                 Key: KAFKA-5074
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5074
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions:
>            Reporter: Dustin Cote
> Running, the controller can get into a state where it is no longer able to elect a leader for an Offline partition. It's unclear how this state is first reached, but in the steady state, this happens:
> - There are partitions with a leader of -1
> - The Controller repeatedly attempts a preferred leader election for these partitions
> - The preferred leader election fails because the only replica in the ISR is not the preferred replica
> The log cycle looks like this:
> {code}
> [2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica leader election for partitions topic,1
> [2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: Invoking state change to OnlinePartition for partitions topic,1
> [2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: Current leader -1 for partition [topic,1] is not the preferred replica. Trigerring preferred replica leader election (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to complete preferred replica leader election. Leader is -1 (kafka.controller.KafkaController)
> {code}
> It's not clear if this would happen on versions later than

This message was sent by Atlassian JIRA
