kafka-dev mailing list archives

From "Jun Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-1120) Controller could miss a broker state change
Date Tue, 05 Nov 2013 18:25:18 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814096#comment-13814096 ]

Jun Rao commented on KAFKA-1120:
--------------------------------

The root cause of this issue is that the controller missed a broker state change that it should
have seen and therefore didn't make the correct decision. One way to fix that is for the controller
to store the creation time of each broker's registration. That way, on a broker change event, the
controller can check whether any broker's registration time has changed since it was last recorded.
We can then force a leader election on the affected partitions accordingly.
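
As a rough illustration of the idea (not the actual controller code), the sketch below contrasts an ID-only comparison, which misses a broker that de-registered and re-registered while the controller was busy, with a comparison of registration creation times. The names BrokerRegistration, created_at_ms, diff_by_id and find_bounced_brokers are hypothetical; the creation time is assumed to come from something like the ctime of the broker's registration node in ZK.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class BrokerRegistration:
    """Hypothetical snapshot of one broker's registration."""
    broker_id: int
    created_at_ms: int  # assumed: creation time of the registration node in ZK


Registrations = Dict[int, BrokerRegistration]


def diff_by_id(cached: Registrations, current: Registrations) -> Tuple[Set[int], Set[int]]:
    """ID-only comparison: returns (new broker ids, dead broker ids)."""
    new_ids = set(current) - set(cached)
    dead_ids = set(cached) - set(current)
    return new_ids, dead_ids


def find_bounced_brokers(cached: Registrations, current: Registrations) -> Set[int]:
    """Brokers present in both views whose registration creation time differs,
    i.e. brokers that re-registered between the two snapshots."""
    return {
        broker_id
        for broker_id, reg in current.items()
        if broker_id in cached and cached[broker_id].created_at_ms != reg.created_at_ms
    }


# Broker 2 de-registered and re-registered while the controller held its lock.
cached = {1: BrokerRegistration(1, 1000), 2: BrokerRegistration(2, 1000)}
current = {1: BrokerRegistration(1, 1000), 2: BrokerRegistration(2, 5000)}

new_ids, dead_ids = diff_by_id(cached, current)   # both empty: the bounce is invisible
bounced = find_bounced_brokers(cached, current)   # contains broker 2
```

In this sketch, the brokers returned by find_bounced_brokers are the ones whose partitions would need a forced leader election.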

> Controller could miss a broker state change 
> --------------------------------------------
>
>                 Key: KAFKA-1120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1120
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jun Rao
>
> When the controller is in the middle of processing a task (e.g., preferred leader election,
> broker change), it holds a controller lock. During this time, a broker could have de-registered
> and re-registered itself in ZK. After the controller finishes processing the current task,
> it will start processing the logic in the broker change listener. However, it will see no
> broker change and therefore won't do anything to the restarted broker. This broker will be
> in a weird state since the controller doesn't inform it to become the leader of any partition.
> Yet, the cached metadata in other brokers could still list that broker as the leader for some
> partitions. Client requests routed to that broker will then get a TopicOrPartitionNotExistException.
> This broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v6.1#6144)