hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2946) Deadlock in ZKRMStateStore
Date Wed, 10 Dec 2014 14:19:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241133#comment-14241133 ]

Naganarasimha G R commented on YARN-2946:
-----------------------------------------

Hi [~rohithsharma],
I think we need one lock object for the ZooKeeper client flows and a separate one for the
ZKRMStateStore flows. When the "AsyncDispatcher event handler" thread waits for the ZK client,
it releases the lock on the ZKRMStateStore object but keeps holding the lock on the
StateMachineFactory$InternalStateMachine object. During that time another thread can take the
lock on the ZKRMStateStore object, which is not correct. So I suggest that the flows which set
zkClient to null and check for another ZK connection synchronize on a new, dedicated object,
in place of the current synchronization on the ZKRMStateStore instance.
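
To make the suggestion concrete, here is a minimal sketch of a store that guards the client reference with its own lock object. It is not the actual ZKRMStateStore code: the class SplitLockStore, the field zkClientLock and the methods onDisconnected, onConnected and awaitClient are all hypothetical names, and a plain Object stands in for the ZooKeeper client.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; names and structure are assumptions, not Hadoop code. */
class SplitLockStore {
    // Dedicated monitor for the client reference (assumed name), so these
    // flows never need to synchronize on the store instance itself.
    private final Object zkClientLock = new Object();
    private Object zkClient; // stand-in for the real ZooKeeper client

    /** Watcher-side flow: the connection was lost, so drop the client. */
    void onDisconnected() {
        synchronized (zkClientLock) {
            zkClient = null;
        }
    }

    /** Watcher-side flow: a connection is available again. */
    void onConnected(Object newClient) {
        synchronized (zkClientLock) {
            zkClient = newClient;
            zkClientLock.notifyAll(); // wake any thread blocked in awaitClient
        }
    }

    /** Waits up to timeoutMs for a client; returns null on timeout or interrupt. */
    Object awaitClient(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (zkClientLock) {
            while (zkClient == null) {
                long remainingMs =
                    TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return null; // timed out without a connection
                }
                try {
                    zkClientLock.wait(remainingMs); // releases zkClientLock only
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
            return zkClient;
        }
    }
}
```

In this sketch the wait happens on zkClientLock, so a waiting thread never releases and re-acquires the store's own monitor. Whether that is enough to remove the deadlock depends on which other locks the real callers hold.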

> Deadlock in ZKRMStateStore
> --------------------------
>
>                 Key: YARN-2946
>                 URL: https://issues.apache.org/jira/browse/YARN-2946
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Rohith
>            Assignee: Rohith
>
> Found a deadlock in ZKRMStateStore.
> # Initially zkClient is null because of a ZooKeeper Disconnected event.
> # While ZKRMStateStore#runWithCheck() calls wait(zkSessionTimeout) for zkClient to re-establish
> the ZooKeeper connection, via either a SyncConnected or an Expired event, it is quite possible that
> another thread obtains the lock on {{ZKRMStateStore.this}} from state machine transition events.
> This causes a deadlock in ZKRMStateStore.
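
The scenario above rests on a property of Java monitors: Object.wait() releases only the monitor of the object it is called on, and any other monitor the thread holds stays held. The following sketch illustrates just that property. The class WaitReleasesOneMonitorDemo and the two lock objects are hypothetical stand-ins for the roles in the report, not the real Hadoop classes. It deliberately stops short of deadlocking: the second thread never asks for the outer lock while holding the inner one, which is the step that would complete the cycle.

```java
import java.util.concurrent.CountDownLatch;

/** Illustrative only: the locks mirror roles in the report, not real classes. */
class WaitReleasesOneMonitorDemo {
    /** Returns {innerAcquiredDuringWait, outerStillHeldDuringWait}. */
    static boolean[] run() {
        final Object stateMachineLock = new Object(); // role of InternalStateMachine
        final Object storeLock = new Object();        // role of ZKRMStateStore.this
        final CountDownLatch holdingBoth = new CountDownLatch(1);
        final boolean[] reconnected = {false};        // guarded by storeLock

        Thread dispatcher = new Thread(() -> {
            synchronized (stateMachineLock) {
                synchronized (storeLock) {
                    holdingBoth.countDown();
                    while (!reconnected[0]) {
                        try {
                            storeLock.wait(); // releases storeLock only
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            }
        });
        Thread probe = new Thread(() -> {
            synchronized (stateMachineLock) {
                // Reached only after the dispatcher has left both blocks.
            }
        });

        boolean innerAcquired = false;
        boolean outerStillHeld = false;
        try {
            dispatcher.start();
            holdingBoth.await();
            synchronized (storeLock) { // succeeds only once dispatcher is in wait()
                innerAcquired = true;
                probe.start();
                probe.join(200);       // probe stays blocked on the outer monitor
                outerStillHeld = probe.isAlive();
                reconnected[0] = true;
                storeLock.notifyAll(); // let the dispatcher finish
            }
            dispatcher.join();
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {innerAcquired, outerStillHeld};
    }
}
```

If the thread holding storeLock went on to request stateMachineLock before releasing storeLock, neither thread could make progress, which matches the kind of cycle described in this issue.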



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
