hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1547) Improve decommission mechanism
Date Wed, 12 Jan 2011 21:34:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980939#action_12980939 ]

Suresh Srinivas commented on HDFS-1547:
---------------------------------------

Thinking a bit more about the problem, I think there could be issues in some cases.
Consider a cluster with N nodes, L live and D decommissioned (N = L + D), with transceiver load
{X1, X2, ... XN} on the datanodes and total load X = X1 + X2 + ... + XN.

A datanode is not good for writes when Xi > 2 * X / (L + D), i.e. when its load exceeds twice
the cluster-wide average.

That means when D > L, a lot of the nodes will not be eligible for writes. The remaining
nodes that are good will have to take the write load, which pushes X higher. Read traffic,
which is not subject to the above condition, also pushes X higher. In the worst case, if
the load on every live node is equal and write load dominates reads, then very few or no nodes
are good for writes!
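
To make the condition concrete, here is a rough sketch of the check in Python. It is not the
NameNode code: the function name good_for_write is hypothetical, and it assumes that
decommissioned nodes carry no load but are still counted in the denominator (L + D), as in the
condition above.

```python
def good_for_write(live_loads, num_decommissioned):
    """Return the indices of live nodes that pass the load check.

    live_loads: loads Xi of the L live datanodes.
    num_decommissioned: D; assumed (for illustration) to carry zero load
    while still being counted in the cluster size.
    """
    total_load = sum(live_loads)                        # X
    cluster_size = len(live_loads) + num_decommissioned  # L + D
    threshold = 2.0 * total_load / cluster_size         # twice the average load
    # A node is rejected only when its load is strictly above the threshold.
    return [i for i, x in enumerate(live_loads) if x <= threshold]


# Four live nodes with equal load.
print(good_for_write([10, 10, 10, 10], 3))  # D < L: [0, 1, 2, 3]
print(good_for_write([10, 10, 10, 10], 5))  # D > L: []
```

With equal load x on every live node, the threshold is 2 * L * x / (L + D), which drops below
x exactly when D > L, so under these assumptions no live node is accepted.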


Some observations:
# This problem becomes severe as D approaches and exceeds N/2.
# Decommissioning such a large number of datanodes has several issues:
#* It reduces the free storage available in the cluster for writes. Writes could simply fail
for lack of free storage, and the decommissioning itself may not complete for the same reason.
#* Further, when this happens, the number of nodes available for writes is significantly reduced
(as writes are not done to D nodes).
#* Note that this problem also exists while decommissioning is in progress for a large number of nodes.

Given this, I am leaning towards not handling this case.


> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1547.1.patch, HDFS-1547.patch
>
>
> The current decommission mechanism, driven by an exclude file, has several issues. This bug
> proposes some changes in the mechanism for better manageability. See the proposal in the next
> comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

