hadoop-hdfs-dev mailing list archives

From "Travis Crawford (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1271) Decommissioning nodes not persisted between NameNode restarts
Date Sun, 27 Jun 2010 17:45:49 GMT
Decommissioning nodes not persisted between NameNode restarts
-------------------------------------------------------------

                 Key: HDFS-1271
                 URL: https://issues.apache.org/jira/browse/HDFS-1271
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
            Reporter: Travis Crawford


Datanodes in the process of being decommissioned should still be decommissioning after the
namenode restarts. Currently they are marked as dead after a restart.


Details:

Nodes can be safely removed from a cluster by marking them as decommissioned and waiting for
their data to be replicated elsewhere. This is accomplished by adding a node to the file
referenced by dfs.hosts.exclude, then refreshing nodes.
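
For reference, a sketch of the usual workflow (the property is dfs.hosts.exclude in
hdfs-site.xml; the file path and hostname below are made up):

    <!-- hdfs-site.xml: point the NN at an exclude file -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

    # Add the datanode's hostname to the exclude file, then tell the
    # NN to re-read it, which starts decommissioning that node:
    $ echo dn42.example.com >> /etc/hadoop/conf/dfs.exclude
    $ bin/hadoop dfsadmin -refreshNodes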

Decommissioning means block reports from the decommissioned datanode are no longer accepted
by the namenode, meaning that for decommissioning to occur the NN must have an existing block
report. That is, a datanode can transition from: live --> decommissioning --> dead. Nodes can
NOT transition from: dead --> decommissioning --> dead.
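
To restate that constraint as a toy state machine (the names below are illustrative only,
not the actual NameNode code):

    // Illustrative sketch -- not the real DatanodeDescriptor states.
    enum DatanodeState { LIVE, DECOMMISSIONING, DEAD }

    /** Decommissioning can only begin while the NN holds a block
     *  report for the node, i.e. only from the LIVE state. */
    static boolean canStartDecommission(DatanodeState s) {
        return s == DatanodeState.LIVE;
    }

After an NN restart a decommissioning node re-registers, but because its block reports are
rejected it never reaches LIVE again, so it is simply listed as dead.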

Operationally this is problematic because intervention is required should the NN restart while
nodes are decommissioning: either in-house administration tools must be more complex, or, more
likely, admins have to babysit the decommissioning process.

Someone more familiar with the code might have a better idea, but perhaps the first block
report from dfs.hosts.exclude hosts should be accepted?
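
One way to read that suggestion, as a very rough sketch (every identifier here is
hypothetical; this is not the actual block report handling code):

    // Hypothetical NN-side handler -- all names are made up.
    void processBlockReport(DatanodeDescriptor node, BlockListAsLongs report) {
        if (isExcluded(node) && !node.hasReportedBlocks()) {
            // Proposed change: accept the first report from an excluded
            // host so its blocks are known and decommissioning can resume.
            acceptReport(node, report);
            startDecommission(node);
            return;
        }
        if (isExcluded(node)) {
            return; // later reports from excluded hosts stay ignored
        }
        acceptReport(node, report);
    }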

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

