hadoop-hdfs-issues mailing list archives

From "Brandon Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3846) Namenode deadlock in branch-1
Date Thu, 23 Aug 2012 18:09:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440501#comment-13440501 ]

Brandon Li commented on HDFS-3846:
----------------------------------

One example of the deadlock is between the SafeModeMonitor thread and an RPC handler thread processing a block report:

{noformat}
Thread 16142: (state = BLOCKED)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=0, line=4208 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumberOfDatanodes(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=2, line=4202 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumLiveDataNodes() @bci=4, line=4198 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.needEnter() @bci=17, line=4886 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.canLeave() @bci=38, line=4878 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor.run() @bci=27, line=5074 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 16126: (state = BLOCKED)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.incrementSafeBlockCount(short) @bci=0, line=4938 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.incrementSafeBlockCount(int) @bci=14, line=5141 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(org.apache.hadoop.hdfs.protocol.Block, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor) @bci=1134, line=3749 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(org.apache.hadoop.hdfs.protocol.DatanodeID, org.apache.hadoop.hdfs.protocol.BlockListAsLongs) @bci=316, line=3548 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration, long[]) @bci=70, line=978 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=87, line=39 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=25 (Interpreted frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=161, line=597 (Interpreted frame)
- org.apache.hadoop.ipc.RPC$Server.call(java.lang.Class, org.apache.hadoop.io.Writable, long) @bci=74, line=578 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler$1.run() @bci=31, line=1388 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=1384 (Interpreted frame)
- java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
- javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
- org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1122 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler.run() @bci=205, line=1382 (Interpreted frame)
{noformat}
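
For illustration, the lock-order inversion behind these two traces can be reproduced in isolation. The following is a minimal sketch, not code from branch-1: the class name LockOrderInversionDemo, the method countDeadlockedThreads(), and the two plain lock objects are hypothetical stand-ins for the FSNamesystem and SafeModeInfo monitors. It assumes, as the issue description states, that the handler holds the namesystem lock while it waits for the SafeModeInfo lock, and that SafeModeMonitor holds the SafeModeInfo lock while it waits for the namesystem lock. The threads are daemons so the JVM can still exit after they deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order,
     * then polls the JVM until it reports a monitor deadlock.
     * Returns the number of deadlocked threads, or 0 if none is seen in time.
     */
    public static int countDeadlockedThreads() throws InterruptedException {
        // Hypothetical stand-ins for the two monitors named in the issue.
        final Object namesystemLock = new Object();
        final Object safeModeLock = new Object();
        // Guarantees each thread holds its first lock before trying the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread handler = new Thread(() -> {
            synchronized (namesystemLock) {        // held during block report processing
                awaitOther(bothHoldFirstLock);
                synchronized (safeModeLock) { }    // blocks, like incrementSafeBlockCount()
            }
        }, "handler");

        Thread monitor = new Thread(() -> {
            synchronized (safeModeLock) {          // held by the synchronized canLeave()
                awaitOther(bothHoldFirstLock);
                synchronized (namesystemLock) { }  // blocks, like getDatanodeListForReport()
            }
        }, "safeModeMonitor");

        handler.setDaemon(true);
        monitor.setDaemon(true);
        handler.start();
        monitor.start();

        // Poll for up to about five seconds for the cycle to form.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In this sketch the cycle disappears if both threads take the two locks in the same order, or if the monitor thread reads the datanode count before it enters the SafeModeInfo monitor. Whether either approach suits branch-1 is not something the traces alone establish.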


                
> Namenode deadlock in branch-1
> -----------------------------
>
>                 Key: HDFS-3846
>                 URL: https://issues.apache.org/jira/browse/HDFS-3846
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Brandon Li
>
> Jitendra found the following problem:
> 1. Handler: acquires the namesystem lock, then waits on the SafeModeInfo lock at SafeModeInfo.isOn().
> 2. SafeModeMonitor: calls SafeModeInfo.canLeave(), which is synchronized, so the SafeModeInfo lock is acquired. This method also triggers the call sequence needEnter() -> getNumLiveDataNodes() -> getNumberOfDatanodes() -> getDatanodeListForReport(). getDatanodeListForReport() is synchronized on the FSNamesystem lock.

