kafka-dev mailing list archives

From "Json Tu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-4447) Controller resigned but it also acts as a controller for a long time
Date Fri, 25 Nov 2016 17:38:58 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696340#comment-15696340 ]

Json Tu edited comment on KAFKA-4447 at 11/25/16 5:38 PM:
----------------------------------------------------------

After reading the responses on the dev mailing list, I reviewed the Kafka code again.
I think the cause may be as follows.
1. As [~guozhang] said, "unsubscribeChildChanges" on ZkClient and the firing of listeners
are executed on different threads.
2. The ZkClient event thread, which processes callbacks from the ZK server, is a single thread,
so many callbacks may be queued behind the controller's SessionExpirationListener callback,
such as ReassignedPartitionsIsrChangeListener, IsrChangeNotificationListener,
and so on.
3. So although the SessionExpirationListener callback deregisters all listeners at the end,
the other callbacks that are already queued still have to run after this controller resigns.
4. That is why the controller log in the attachment shows it still acting as a controller,
and this continued for about 3 minutes.
5. I think it lasted so long because my Kafka cluster's environment is not very stable:
some brokers' sessions expired on the ZK server, which triggered callbacks
that the controller listens for.
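
To illustrate points 2 and 3, here is a minimal sketch in Java of a single-threaded event queue. It is a simplified model, not the ZkClient or Kafka code: the class StaleCallbackSketch and its methods (register, fire, expireSession, drain) are hypothetical names, and it assumes each queued event keeps a reference to its listener from the moment it is queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model; none of these names come from Kafka or ZkClient.
class StaleCallbackSketch {
    private final Set<String> registered = new LinkedHashSet<>();
    private final Queue<Runnable> eventQueue = new ArrayDeque<>();
    private boolean isController = true;
    final List<String> log = new ArrayList<>();

    void register(String listener) {
        registered.add(listener);
    }

    // A ZK notification arrives: queue one event per listener registered right now.
    // Each event captures its listener, so later deregistration cannot cancel it.
    void fire() {
        for (String listener : registered) {
            eventQueue.add(() -> log.add(listener + " ran, isController=" + isController));
        }
    }

    // Session expiration: when this event runs, resign and deregister every listener.
    void expireSession() {
        eventQueue.add(() -> {
            isController = false;
            registered.clear();
            log.add("SessionExpirationListener ran, resigned");
        });
    }

    // Stand-in for the single event thread: drain the queue strictly in FIFO order.
    void drain() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    static List<String> demo() {
        StaleCallbackSketch s = new StaleCallbackSketch();
        s.register("ReassignedPartitionsIsrChangeListener");
        s.register("IsrChangeNotificationListener");
        s.expireSession(); // queued first
        s.fire();          // queued behind it while the listeners are still registered
        s.drain();         // the two listener events run after the resignation
        s.fire();          // listeners are gone now, so nothing new is queued
        s.drain();
        return s.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this model the two listener events run with isController=false, because deregistering only stops new events from being queued and does not remove the ones already waiting.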

Can you give me some suggestions? [~guozhang] [~becket_qin]




> Controller resigned but it also acts as a controller for a long time 
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4447
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4447
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>         Environment: Linux Os
>            Reporter: Json Tu
>         Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we performed the following operations.
> 1. We reassigned some topic partitions from one node to the other 9 nodes in the cluster,
> which triggered the controller.
> 2. The controller invoked PartitionsReassignedListener's handleDataChange, read all partition
> reassignment rules from the ZK path, and executed onPartitionReassignment for every partition
> that matched the conditions.
> 3. But the controller's session expired from ZK, after which some of the other 9 nodes also
> expired from ZK.
> 4. The controller then invoked onControllerResignation to resign as the controller.
> We found that after the controller resigned, it kept acting as a controller for about 3 minutes,
> as can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
