hadoop-hdfs-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2229) Deadlock in NameNode
Date Fri, 05 Aug 2011 11:21:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079910#comment-13079910 ]

Vinod Kumar Vavilapalli commented on HDFS-2229:
-----------------------------------------------

Thread dump:

{quote}
Found one Java-level deadlock:
=============================
"Thread[Thread-10,5,main]":
  waiting to lock monitor 0x08e19b8c (object 0x31b0b7f0, a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo),
  which is held by "main"
"main":
  waiting for ownable synchronizer 0x31641a50, (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
  which is held by "Thread[Thread-10,5,main]"

Java stack information for the threads listed above:
===================================================
"Thread[Thread-10,5,main]":
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.isOn(FSNamesystem.java:3183)
        - waiting to lock <0x31b0b7f0> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.isInSafeMode(FSNamesystem.java:3563)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logUpdateMasterKey(FSNamesystem.java:4523)
        at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logUpdateMasterKey(DelegationTokenSecretManager.java:279)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.updateCurrentKey(AbstractDelegationTokenSecretManager.java:144)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.rollMasterKey(AbstractDelegationTokenSecretManager.java:168)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:373)
        at java.lang.Thread.run(Thread.java:619)
"main":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x31641a50> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeLock(FSNamesystem.java:382)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatedBlocks(BlockManager.java:1743)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.initializeReplQueues(FSNamesystem.java:3257)
        - locked <0x31b0b7f0> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.leave(FSNamesystem.java:3228)
        - locked <0x31b0b7f0> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.checkMode(FSNamesystem.java:3315)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.setBlockTotal(FSNamesystem.java:3342)
        - locked <0x31b0b7f0> (a org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setBlockTotal(FSNamesystem.java:3619)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.activate(FSNamesystem.java:322)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.activate(NameNode.java:489)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:452)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:561)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:553)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1578)

Found 1 deadlock.
{quote}
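
The two stacks form a classic lock-ordering cycle: the startup thread ("main") enters the synchronized SafeModeInfo methods (setBlockTotal -> checkMode -> leave -> initializeReplQueues) and then blocks waiting for the FSNamesystem write lock inside processMisReplicatedBlocks, while the delegation-token remover thread already holds the FSNamesystem ReentrantReadWriteLock and blocks trying to synchronize on the same SafeModeInfo instance via isInSafeMode. A minimal sketch of that pattern, with hypothetical names standing in for the real classes (an illustration only, not the actual HDFS source):

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical reduction of the two code paths in the dump above.
public class LockOrderDeadlockSketch {
  // Stands in for the FSNamesystem lock (fair, like the FairSync in the dump).
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  // Stands in for the FSNamesystem$SafeModeInfo instance used as a monitor.
  private final Object safeModeInfo = new Object();

  // Path taken by "main": synchronized SafeModeInfo methods, then the namesystem write lock.
  void startupPath() {
    synchronized (safeModeInfo) {      // setBlockTotal/leave/initializeReplQueues are synchronized
      fsLock.writeLock().lock();       // processMisReplicatedBlocks -> writeLock(): blocks here
      try {
        // process mis-replicated blocks
      } finally {
        fsLock.writeLock().unlock();
      }
    }
  }

  // Path taken by the token-remover thread: namesystem lock first, then the SafeModeInfo monitor.
  void tokenRemoverPath() {
    fsLock.writeLock().lock();         // write lock shown for simplicity; the dump only says the FairSync is held
    try {
      synchronized (safeModeInfo) {    // isInSafeMode -> SafeModeInfo.isOn(): blocks here
        // check safe mode
      }
    } finally {
      fsLock.writeLock().unlock();
    }
  }
}
{code}

Run concurrently, the two paths take the same two locks in opposite order, which is exactly the cycle the JVM reports.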

Logs show that the RPC server is started but not the HTTP server:
{quote}
...
2011-08-05 08:54:35,286 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 1
2011-08-05 08:54:35,286 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2011-08-05 08:54:35,287 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 113 loaded in 0 seconds.
2011-08-05 08:54:35,287 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /tmp/hdfs/hadoop/var/hdfs/name/current/fsimage_0000000000000000000
2011-08-05 08:54:35,288 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /tmp/hdfs/hadoop/var/hdfs/name/current/edits_0000000000000000001-0000000000000000002 of size 1048576 edits # 2 loaded in 0 seconds.
2011-08-05 08:54:35,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 3
2011-08-05 08:54:35,363 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 0 entries 0 lookups
2011-08-05 08:54:35,363 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 387 msecs
2011-08-05 08:54:35,428 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8020
2011-08-05 08:54:35,430 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #2 for port 8020
2011-08-05 08:54:35,432 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #3 for port 8020
2011-08-05 08:54:35,433 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #4 for port 8020
2011-08-05 08:54:35,443 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Registered source RpcActivityForPort8020
2011-08-05 08:54:35,451 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Registered source RpcDetailedActivityForPort8020
2011-08-05 08:54:35,455 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2011-08-05 08:54:35,457 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2011-08-05 08:54:35,457 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2011-08-05 08:54:35,457 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of blocks under construction: 0
2011-08-05 08:54:35,457 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: initializing replication queues
^Stuck here
{quote}
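
The stuck point in the log ("initializing replication queues") matches the top of the "main" stack above. A cycle like this can also be confirmed programmatically via the JDK's ThreadMXBean, which reports deadlocks involving both intrinsic monitors and ownable synchronizers such as the ReentrantReadWriteLock here; a small sketch (standard java.lang.management API, not HDFS code):

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // Finds threads deadlocked on monitors or ownable synchronizers (e.g. ReentrantReadWriteLock).
    long[] ids = mx.findDeadlockedThreads();
    if (ids == null) {
      System.out.println("No deadlock detected.");
      return;
    }
    // Print who is blocked on what, and who owns it.
    for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
      System.out.println(info.getThreadName()
          + " blocked on " + info.getLockName()
          + " held by " + info.getLockOwnerName());
    }
  }
}
{code}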

> Deadlock in NameNode
> --------------------
>
>                 Key: HDFS-2229
>                 URL: https://issues.apache.org/jira/browse/HDFS-2229
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>
> Either I am doing something incredibly stupid, or something about my environment is completely
> weird, or maybe it really is a valid bug. I am running into a NameNode deadlock consistently
> with 0.23 HDFS. I could never start the NN successfully.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira