hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3936) MiniDFSCluster shutdown may fail due to BlocksMap#getBlockCollection NPE
Date Fri, 14 Sep 2012 07:06:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455620#comment-13455620 ]

Colin Patrick McCabe commented on HDFS-3936:
--------------------------------------------

I would lean towards solution #3.  It might need a little bit of finesse, but in theory it
should be simple to give the lock the semantics of "wait for us to get the lock or be told to exit."
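
As a rough illustration of that lock semantic (not the actual HDFS code), the sketch
below polls a write lock with a timeout and gives up as soon as an exit flag is set.
The class name ExitAwareLock, its method names, and the 50 ms poll interval are all
hypothetical; only the java.util.concurrent calls are real.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of "get the lock or be told to exit"; not HDFS code. */
public class ExitAwareLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Set by the thread that is shutting things down.
    private volatile boolean shouldExit = false;

    /** Returns true once the write lock is held, false if told to exit first. */
    public boolean writeLockOrExit() {
        while (!shouldExit) {
            try {
                // Wait briefly for the lock, then re-check the exit flag.
                if (lock.writeLock().tryLock(50, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException e) {
                // Treat an interrupt as a request to exit; keep the flag set.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public void writeUnlock() {
        lock.writeLock().unlock();
    }

    public void requestExit() {
        shouldExit = true;
    }

    public static void main(String[] args) throws InterruptedException {
        ExitAwareLock fsnLock = new ExitAwareLock();
        // The "shutdown" thread takes the lock first and keeps holding it.
        fsnLock.writeLockOrExit();
        final boolean[] acquired = new boolean[1];
        Thread monitor = new Thread(() -> acquired[0] = fsnLock.writeLockOrExit());
        monitor.start();
        // Ask the worker to stop, then join it while still holding the lock.
        fsnLock.requestExit();
        monitor.join(5000);
        System.out.println("monitor alive: " + monitor.isAlive()
            + ", acquired lock: " + acquired[0]);
        fsnLock.writeUnlock();
    }
}
```

With this shape, a thread that holds the lock can join the worker without releasing
it, because the worker stops waiting once it sees the exit flag. The cost is a short
polling delay of up to one timeout interval.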

I'm afraid of hitting other issues if we go with #4, since BlockManager#replicationThread
touches a lot more than just BlocksMap.  The replication manager does a lot of work,
and it really seems like we're asking for trouble if we don't shut it down at the end.
                
> MiniDFSCluster shutdown may fail due to BlocksMap#getBlockCollection NPE
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3936
>                 URL: https://issues.apache.org/jira/browse/HDFS-3936
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> Looks like HDFS-3664 didn't fix the whole issue: the added join times out because
> the thread closing the BM (FSN#stopCommonServices) holds the FSN lock while closing the BM,
> and the BM is blocked uninterruptibly trying to acquire the FSN lock.
> {noformat}
> 2012-09-13 18:54:12,526 FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1355)) - Test resulted in an unexpected exit
> org.apache.hadoop.util.ExitUtil$ExitException: Fatal exception with message null
> stack trace
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1132)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1107)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3061)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3023)
> 	at java.lang.Thread.run(Thread.java:662)
> {noformat}

