hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1547) Improve decommission mechanism
Date Thu, 06 Jan 2011 07:11:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978182#action_12978182 ]

Suresh Srinivas commented on HDFS-1547:

> Does the dfsadmin -refreshNodes command upload the excludes and includes and the decom
file (all three?) into the namenode's in memory state?
It is good to persist this at the namenode. That way, after a namenode restart, the datanodes
that were intended to be out of service will not come back into service.

> when a node is being-decommissioned state, why do you propose that it reduces the frequency
of block reports and heartbeats? Is this really needed?...
Sanjay's comment addresses this.

> I really like the fact that you are proposing the decommissioned nodes are not auto shutdown.

This was my original proposal. After thinking a bit, I see the following issues:
* Not shutting down datanodes changes the intent of HADOOP-442; shutting down datanode ensures
problematic datanodes cannot be used any more.
* Currently the shutdown ensures datanodes are not used by the namenode. I am concerned that not
shutting down the datanode could result in the namenode using decommissioned nodes in unintended
ways.
* My earlier concern was that there is no way to figure out whether a datanode is dead because
decommission is complete or for other reasons. However, the namenode has the state that the
datanode is decommissioned. We could improve the current dead node list to show two lists,
decommissioned and dead, in the namenode WebUI.
* The storage free capacity from the decommissioned datanodes should not be counted towards
available storage capacity of the cluster. Only used capacity should count towards cluster
used capacity.
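
The capacity rule in the last point can be sketched as below. This is only an illustration,
not namenode code: DatanodeStats, its fields and cluster_capacity are hypothetical names made
up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class DatanodeStats:
    """Hypothetical per-datanode storage figures, in bytes."""
    used: int
    remaining: int
    decommissioned: bool


def cluster_capacity(nodes: Iterable[DatanodeStats]) -> Tuple[int, int]:
    """Return (cluster_used, cluster_available).

    Used space counts for every node, decommissioned or not.
    Free space counts only for nodes that are not decommissioned.
    """
    nodes = list(nodes)
    used = sum(n.used for n in nodes)
    available = sum(n.remaining for n in nodes if not n.decommissioned)
    return used, available
```

With one in-service node (10 used, 90 free) and one decommissioned node (40 used, 60 free),
the cluster would report 50 used and 90 available.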

Current behavior is:
# A currently registered datanode is decommissioned and then disallowed to communicate with
the NN.
# A datanode that had registered previously with NN (after NN is restarted) and currently
not registered is decommissioned, if it registers with NN.
# A datanode that had not registered (after NN restart) is disallowed from registering and
is never decommissioned.
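
The three cases above, for a datanode listed in the exclude file, can be written down as a
small decision function. This is a sketch of the behavior as described in this comment, not
the actual namenode logic; Outcome and excluded_node_outcome are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    # Case 1: decommission, then disallow communication with the NN.
    DECOMMISSION_THEN_DISALLOW = "decommission_then_disallow"
    # Case 2: decommission once the datanode registers with the NN.
    DECOMMISSION_ON_REGISTRATION = "decommission_on_registration"
    # Case 3: registration is refused, so decommission never happens.
    REJECT_REGISTRATION = "reject_registration"


def excluded_node_outcome(currently_registered: bool,
                          previously_registered: bool) -> Outcome:
    """Map an excluded datanode's registration history to the current behavior."""
    if currently_registered:
        return Outcome.DECOMMISSION_THEN_DISALLOW
    if previously_registered:
        return Outcome.DECOMMISSION_ON_REGISTRATION
    return Outcome.REJECT_REGISTRATION
```

Case 3 is the only branch in which the node is never decommissioned.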

By changing behavior (3), most of what I had proposed for the decom file can be achieved.
This also avoids two config files for exclude and decom with very little and subtle semantic
difference.

> Improve decommission mechanism
> ------------------------------
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
> Current decommission mechanism driven using exclude file has several issues. This bug
proposes some changes in the mechanism for better manageability. See the proposal in the next
comment for more details.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
