kafka-dev mailing list archives

From "Json Tu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-4360) Controller may deadLock when autoLeaderRebalance encounter zk expired
Date Tue, 01 Nov 2016 05:50:58 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624461#comment-15624461 ]

Json Tu edited comment on KAFKA-4360 at 11/1/16 5:50 AM:
---------------------------------------------------------

That is wonderful. I searched for onControllerResignation() in the Kafka code and, just as
you said, there are two other invocations in ZookeeperLeaderElector. Could you assign this
task to me? I would be very pleased to submit a pull request for it. Thank you.


> Controller may deadLock when autoLeaderRebalance encounter zk expired
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4360
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Json Tu
>              Labels: bugfix
>         Attachments: deadlock_patch, yf-mafka2-common02_jstack.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When the controller has a checkAndTriggerPartitionRebalance task pending in
> autoRebalanceScheduler and the zk session expires at that moment, the controller runs
> into a deadlock.
>
> The scenario can be reconstructed as follows. When the zk session expires, the zk thread
> calls handleNewSession, which is defined in SessionExpirationListener, and acquires
> controllerContext.controllerLock. It then calls autoRebalanceScheduler.shutdown(), which
> has to wait for all tasks in autoRebalanceScheduler to complete. The task in that thread
> pool also needs controllerContext.controllerLock, but the lock is already held by the zk
> callback thread, so neither thread can make progress.
>
> This causes at least two problems. First, the broker's id cannot be registered in
> ZooKeeper, so the new controller considers the broker dead. Second, the process cannot be
> stopped with kafka-server-stop.sh, because the shutdown function cannot acquire
> controllerContext.controllerLock either. The only way to stop Kafka is kill -9.
>
> The attached jstack file was captured when my Kafka process could not be shut down with
> kafka-server-stop.sh.
>
> I have hit this scenario several times, and I think it may be an unresolved bug in Kafka.
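
The blocking pattern described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Kafka's actual code: ControllerLockDemo and expireSession are hypothetical names, a ReentrantLock stands in for controllerContext.controllerLock, a java.util.concurrent scheduler stands in for autoRebalanceScheduler, and a short timeout replaces the long wait so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ControllerLockDemo {
    // Hypothetical stand-in for controllerContext.controllerLock.
    static final ReentrantLock controllerLock = new ReentrantLock();

    /**
     * Mimics the session-expiry path: take the lock, then shut the scheduler
     * down and wait for its tasks while still holding the lock.
     * Returns true if the scheduler drained within timeoutMs, false if the
     * pending task was still blocked on the lock.
     */
    static boolean expireSession(long timeoutMs) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);
        CountDownLatch lockHeld = new CountDownLatch(1);

        // Stand-in for the rebalance task: it needs the controller lock.
        scheduler.execute(() -> {
            taskStarted.countDown();
            try {
                // Wait until the "zk thread" owns the lock, to force the bad interleaving.
                lockHeld.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            controllerLock.lock();
            try {
                // Rebalance work would happen here.
            } finally {
                controllerLock.unlock();
            }
        });

        taskStarted.await();
        controllerLock.lock(); // the "zk callback thread" takes the lock
        try {
            lockHeld.countDown();
            scheduler.shutdown();
            // The task cannot finish while this thread holds the lock, so the
            // wait can only end by timing out.
            return scheduler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            controllerLock.unlock(); // releasing the lock lets the task finish
        }
    }
}
```

Calling ControllerLockDemo.expireSession(200) returns false: the wait times out because the task needs the lock held by the waiting thread. With an effectively unbounded wait, as in the report above, the two threads would block each other indefinitely.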



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
