hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1168) Ghost nodes in excluded node list for block allocation limit replication target count
Date Thu, 20 May 2010 06:59:55 GMT
Ghost nodes in excluded node list for block allocation limit replication target count
-------------------------------------------------------------------------------------

                 Key: HDFS-1168
                 URL: https://issues.apache.org/jira/browse/HDFS-1168
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client, name-node
            Reporter: Todd Lipcon


In HDFS-630 we added an excludedNodes parameter when allocating a block. In the case of a
cluster that uses transient IPC ports, this list can accumulate past incarnations of restarted
datanodes. Then, in NetworkTopology.countNumOfAvailableNodes, we count each of these "ghost"
nodes against the total number of available nodes, and decide that there are no spots left to
place replicas, even though plenty of datanodes are alive.
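
The miscount described above can be modeled in a few lines. This is a simplified sketch in
Python, not the actual NetworkTopology code: the function names, the host:port strings, and the
set-based view of the cluster are all assumptions made for illustration.

```python
def count_available_naive(live_nodes, excluded_nodes):
    """Subtract every excluded entry, even ones no longer in the cluster."""
    return len(live_nodes) - len(excluded_nodes)


def count_available_filtered(live_nodes, excluded_nodes):
    """Subtract only excluded entries that are still registered as live."""
    return len(set(live_nodes) - set(excluded_nodes))


# Three live datanodes; dn3 came back on a new transient port after each restart.
live = {"dn1:50010", "dn2:50010", "dn3:41003"}
# Past incarnations of dn3 still sitting in the excluded list ("ghosts").
excluded = {"dn3:41000", "dn3:41001", "dn3:41002"}

naive = count_available_naive(live, excluded)
filtered = count_available_filtered(live, excluded)
print(naive, filtered)
```

In this model the naive count drops to zero because each ghost is charged against the live
total, while the filtered count ignores entries that no longer match a registered node.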

To reproduce, write data into HDFS with a very small block size (say 4KB) and then repeatedly
kill and restart the local DN configured to use a transient port. After you have done so N
times, where N is the number of nodes in the cluster, the NN will fail to allocate any targets
even though N other nodes are still alive.
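
The arithmetic behind "N restarts" can be sketched with a small simulation. This is a toy
model under stated assumptions, not a test against a real cluster: the node names, the port
counter, and the rule that each killed incarnation lands in the excluded list are all
hypothetical stand-ins for the behavior described above.

```python
import itertools


def restarts_until_allocation_fails(cluster_size):
    """Count restarts of one transient-port datanode until a naive
    available-node count (live nodes minus excluded entries) hits zero."""
    ports = itertools.count(41000)  # hypothetical source of transient ports
    local = f"local:{next(ports)}"  # current incarnation of the restarted DN
    excluded = set()                # excluded list accumulated by the writer
    restarts = 0
    # The cluster always has cluster_size live nodes; only the port changes.
    while cluster_size - len(excluded) > 0:
        excluded.add(local)             # killed incarnation gets excluded
        local = f"local:{next(ports)}"  # DN re-registers on a new port
        restarts += 1
    return restarts


print(restarts_until_allocation_fails(4))
```

Under these assumptions the count reaches zero after exactly as many restarts as there are
nodes in the cluster, which matches the reproduction steps.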

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

