hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1885) Race condition in MiniDFSCluster shutdown
Date Wed, 12 Sep 2007 20:05:32 GMT
Race condition in MiniDFSCluster shutdown

                 Key: HADOOP-1885
                 URL: https://issues.apache.org/jira/browse/HADOOP-1885
             Project: Hadoop
          Issue Type: Bug
          Components: test
            Reporter: Chris Douglas
            Assignee: Chris Douglas

Hudson has been sporadically failing tests that start multiple DataNodes in MiniDFSCluster,
as well as tests that run after them, particularly on Solaris and Windows. The following
appears to be at least partially responsible (much credit to Nigel for helping to discern this).

A common error:
java.io.IOException: Cannot remove data directory: /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/dfs/data
	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:126)
	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:80)
	at org.apache.hadoop.dfs.TestFsck.testFsckNonExistent(TestFsck.java:96)

MiniDFSCluster starts multiple DataNodes by calling DataNode::createDataNode, which creates
and starts a DataNode thread, stores it in a static member, and returns the Runnable.
Because the member is static, each call from MiniDFSCluster overwrites it, so it always refers
to the thread of the most recently created DataNode. Since DataNode::shutdown() calls join() on
that one Thread, once that thread finishes every subsequent join() is essentially a no-op: it
returns immediately without waiting for the other DataNodes' threads. As a result,
MiniDFSCluster::shutdown() may return before all DataNodes have released their resources (such as
the data directory), so the next MiniDFSCluster may fail to start with errors like the
"Cannot remove data directory" trace above.
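The following Java sketch illustrates the pattern described above under stated assumptions. It is not Hadoop's DataNode or MiniDFSCluster code: ShutdownRaceSketch, Node, lastStarted, shutdownBuggy and shutdownFixed are hypothetical names, and CountDownLatch gates stand in for a DataNode that is slow to release its data directory. The "fixed" variant (each node joins its own thread instead of a shared static one) is one possible approach, not necessarily the patch committed for HADOOP-1885.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical illustration of the shutdown race; not Hadoop code. */
public class ShutdownRaceSketch {

    // Hypothetical stand-in for a DataNode: its thread waits for a stop signal,
    // then waits on `cleanup` to simulate slowly releasing its resources.
    static class Node implements Runnable {
        static Thread lastStarted;           // BUG: shared by all nodes, overwritten by every create()
        final Thread myThread;               // FIX: each node remembers its own thread
        final CountDownLatch stop = new CountDownLatch(1);
        final CountDownLatch cleanup;

        Node(CountDownLatch cleanup) {
            this.cleanup = cleanup;
            this.myThread = new Thread(this);
        }

        static Node create(CountDownLatch cleanup) {
            Node n = new Node(cleanup);
            lastStarted = n.myThread;        // only the most recent thread is remembered here
            n.myThread.start();
            return n;
        }

        @Override
        public void run() {
            try {
                stop.await();                // run until shutdown is requested
                cleanup.await();             // then "release resources"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void shutdownBuggy() throws InterruptedException {
            stop.countDown();
            lastStarted.join();              // joins whichever node was created last
        }

        void shutdownFixed() throws InterruptedException {
            stop.countDown();
            myThread.join();                 // joins this node's own thread
        }
    }

    /** Returns true if a node thread is still alive after every shutdownBuggy() returned. */
    static boolean buggyLeavesThreadRunning() throws InterruptedException {
        CountDownLatch slowGate = new CountDownLatch(1);
        Node first = Node.create(slowGate);                  // slow to release resources
        Node second = Node.create(new CountDownLatch(0));    // exits as soon as it is stopped
        // Reverse creation order, so the buggy join does not wait on an unstopped node forever.
        second.shutdownBuggy();                              // joins second's thread: really waits
        first.shutdownBuggy();                               // joins second's (dead) thread: returns at once
        boolean stillRunning = first.myThread.isAlive();     // first is still blocked on slowGate
        slowGate.countDown();                                // let the sketch clean up
        first.myThread.join();
        return stillRunning;
    }

    /** Returns true if no node thread is alive after every shutdownFixed() returned. */
    static boolean fixedWaitsForAllThreads() throws InterruptedException {
        CountDownLatch slowGate = new CountDownLatch(1);
        Node first = Node.create(slowGate);
        Node second = Node.create(new CountDownLatch(0));
        Thread opener = new Thread(() -> {                   // finishes first's "cleanup" later
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
                // fall through and open the gate anyway
            }
            slowGate.countDown();
        });
        opener.start();
        second.shutdownFixed();
        first.shutdownFixed();                               // blocks until first really exits
        opener.join();
        return !first.myThread.isAlive() && !second.myThread.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("buggy leaves a thread running: " + buggyLeavesThreadRunning());
        System.out.println("fixed waits for all threads:   " + fixedWaitsForAllThreads());
    }
}
```

Both checks should print true: with the shared static reference, shutdown returns while the first node's thread is still running, whereas joining each node's own thread guarantees every thread has terminated before shutdown returns.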

