hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5832) Deadlock found in NN between SafeMode#canLeave and DatanodeManager#handleHeartbeat
Date Sun, 26 Jan 2014 05:52:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882189#comment-13882189 ]

Vinay commented on HDFS-5832:
-----------------------------

As mentioned in HDFS-5132, moving the SafemodeMonitor#run() checks under the fsn write lock will solve the issue.

1. handleHeartbeat() is always called under the fsn read lock.
2. incrementSafeBlockCount() and getNumLivedatanodes() are always called under writeLock().

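Looking only at the synchronization order, it appears to be a deadlock, but it is avoided by the fsn lock: the read-lock path and the write-lock path can never run at the same time, so the inner monitors can never be held in opposite orders concurrently.
I think jcarder does not recognize the read-write lock mechanism.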
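
A minimal sketch of why the reported cycle cannot close, assuming a simplified model rather than the actual NameNode code: the class FsnLockOrderingSketch, the monitors datanodeManagerMonitor and safeModeMonitor, and the methods handleHeartbeat() and checkSafeMode() are hypothetical stand-ins. The heartbeat path takes the outer read lock and then the two monitors in one order; the safe-mode path takes the outer write lock and then the monitors in the opposite order. Because a ReentrantReadWriteLock write lock excludes all readers, the two paths are mutually exclusive and the lock-order inversion is harmless.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FsnLockOrderingSketch {
    // Hypothetical stand-in for the FSNamesystem ("fsn") read-write lock.
    static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    // Hypothetical stand-ins for the two monitors a lock-order tool would report as a cycle.
    static final Object datanodeManagerMonitor = new Object();
    static final Object safeModeMonitor = new Object();
    static int heartbeats = 0;
    static int safeModeChecks = 0;

    // Heartbeat path: fsn READ lock, then monitor A, then monitor B.
    static void handleHeartbeat() {
        fsnLock.readLock().lock();
        try {
            synchronized (datanodeManagerMonitor) {
                synchronized (safeModeMonitor) {
                    heartbeats++;
                }
            }
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    // Safe-mode path: fsn WRITE lock, then monitor B, then monitor A (opposite order).
    // The write lock waits until no reader holds the fsn lock, so this block never
    // overlaps with handleHeartbeat() and the inverted monitor order cannot deadlock.
    static void checkSafeMode() {
        fsnLock.writeLock().lock();
        try {
            synchronized (safeModeMonitor) {
                synchronized (datanodeManagerMonitor) {
                    safeModeChecks++;
                }
            }
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    // Runs both paths concurrently; returns true if both threads finish within 10 s each.
    static boolean runBoth(int iterations) {
        heartbeats = 0;
        safeModeChecks = 0;
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) handleHeartbeat(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) checkSafeMode(); });
        t1.start();
        t2.start();
        try {
            t1.join(10_000);
            t2.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t1.isAlive() && !t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runBoth(100_000) ? "completed, no deadlock" : "threads still blocked");
    }
}
```

With several heartbeat threads the read lock is shared, but every reader takes the monitors in the same order, so readers cannot form a cycle among themselves either. If the checks ran without the outer write lock, the opposite monitor order could deadlock against a heartbeat, which is why the checks belong under the fsn write lock.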

For this reason alone, I marked HDFS-5368 as a duplicate of HDFS-5132.

> Deadlock found in NN between SafeMode#canLeave and DatanodeManager#handleHeartbeat
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-5832
>                 URL: https://issues.apache.org/jira/browse/HDFS-5832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Blocker
>         Attachments: HDFS-5832.patch, jcarder_nn_deadlock.gif
>
>
> Found the deadlock during NameNode startup. Attached a jcarder report which shows the
> lock cycles of the deadlock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
