hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3936) MiniDFSCluster shutdown may fail due to BlocksMap#getBlockCollection NPE
Date Fri, 14 Sep 2012 18:24:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456001#comment-13456001 ]

Colin Patrick McCabe commented on HDFS-3936:
--------------------------------------------

{code}
   private void updateNeededReplications(final Block block,
       final int curReplicasDelta, int expectedReplicasDelta) {
-    namesystem.writeLock();
+    try {
+      updateNeededReplicationsInterruptible(
+          block, curReplicasDelta, expectedReplicasDelta);
+    } catch (InterruptedException ie) {
+      LOG.warn("Interrupted while updating replication queues");
+    }
+  }
{code}

I don't like this function because it swallows exceptions.  If an unexpected exception happens,
we shouldn't swallow it; we should assert.

Alternatively, perhaps updateNeededReplications should take a boolean that specifies whether
it locks interruptibly.  This would probably be the simplest way out of the interruptibility
dilemma.
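
A minimal sketch of the boolean-parameter idea, not the actual patch. It assumes a
plain ReentrantReadWriteLock standing in for the FSN lock; the class name, the
counter, and the two-argument wrapper are hypothetical and exist only to make the
example self-contained. The wrapper shows the "assert instead of swallow" option for
callers that never expect an interrupt.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the locking part of BlockManager; not HDFS code. */
class ReplicationQueueSketch {
  // Assumption: a plain read/write lock plays the role of the FSN lock.
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private int updates = 0;

  /**
   * Updates the (placeholder) replication state under the write lock.
   *
   * @param interruptibly if true, the lock is taken with lockInterruptibly(),
   *                      so a thread being shut down can be interrupted while
   *                      it waits; if false, the wait cannot be interrupted.
   */
  void updateNeededReplications(int curReplicasDelta, int expectedReplicasDelta,
      boolean interruptibly) throws InterruptedException {
    if (interruptibly) {
      fsnLock.writeLock().lockInterruptibly();
    } else {
      fsnLock.writeLock().lock();
    }
    try {
      updates++;  // placeholder for the real queue update
    } finally {
      fsnLock.writeLock().unlock();
    }
  }

  /** Uninterruptible variant: an interrupt here is a bug, so fail loudly. */
  void updateNeededReplications(int curReplicasDelta, int expectedReplicasDelta) {
    try {
      updateNeededReplications(curReplicasDelta, expectedReplicasDelta, false);
    } catch (InterruptedException ie) {
      // lock() never throws this, so reaching here means the code changed.
      throw new AssertionError("unexpected interrupt on uninterruptible path", ie);
    }
  }

  int getUpdates() {
    return updates;
  }
}
```

With this shape, only a caller that can be cancelled during shutdown (for example a
monitor thread) would pass true and handle InterruptedException itself; every other
caller keeps the simple two-argument form.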
                
> MiniDFSCluster shutdown may fail due to BlocksMap#getBlockCollection NPE
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3936
>                 URL: https://issues.apache.org/jira/browse/HDFS-3936
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hdfs-3936.txt
>
>
> Looks like HDFS-3664 didn't fix the whole issue: the added join times out because the thread closing the BM (FSN#stopCommonServices) holds the FSN lock while closing the BM, and the BM is blocked uninterruptibly trying to acquire the FSN lock.
> {noformat}
> 2012-09-13 18:54:12,526 FATAL hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1355)) - Test resulted in an unexpected exit
> org.apache.hadoop.util.ExitUtil$ExitException: Fatal exception with message null
> stack trace
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1132)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1107)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3061)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3023)
> 	at java.lang.Thread.run(Thread.java:662)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
