hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1547) Improve decommission mechanism
Date Wed, 12 Jan 2011 23:15:49 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1547:
------------------------------

    Attachment: show-stats-broken.txt

I think my concern can be alleviated by simply moving the "totalLoad += node.getXceiverCount"
change outside the condition for decommissioning.

On a separate note in the same function - the modified accounting isn't correct. Here's what
happens:

- we have a DN in normal state, so it's represented in the stats
- we call refreshNodes to put it in decom state
- next heartbeat:
-- we call updateStats(node, false) to remove it -> no longer represented in stats
-- we call updateStats(node, true) to re-add it, but it's in decom state, so it doesn't get added (good)
- next heartbeat:
-- we call updateStats(node, false) again, and stats get decremented again.
-- on updateStats(node, true) it doesn't increment because it's in DECOM state
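
The sequence above can be reproduced with a toy model. This is only a sketch of the accounting as described in this comment, not the actual namenode code: apart from updateStats, every class, field, and method name here (ToyNode, ToyClusterStats, capacityTotal, heartbeat) is hypothetical, and a single capacity counter stands in for the full set of cluster stats.

```java
// Toy model of the accounting described above; NOT the real HDFS code.
// All names except updateStats are hypothetical stand-ins.
final class ToyNode {
    final long capacity;
    boolean decommissioning = false;

    ToyNode(long capacity) {
        this.capacity = capacity;
    }
}

final class ToyClusterStats {
    long capacityTotal = 0;

    // Removal always subtracts, but addition is skipped for a
    // decommissioning node. This asymmetry is the problem.
    void updateStats(ToyNode node, boolean isAdded) {
        if (isAdded) {
            if (!node.decommissioning) {
                capacityTotal += node.capacity;
            }
        } else {
            capacityTotal -= node.capacity;
        }
    }

    // Each heartbeat removes the node's old contribution, then re-adds it.
    void heartbeat(ToyNode node) {
        updateStats(node, false);
        updateStats(node, true);
    }
}

final class DecomStatsDemo {
    public static void main(String[] args) {
        ToyClusterStats stats = new ToyClusterStats();
        ToyNode dn = new ToyNode(100L);

        stats.updateStats(dn, true);             // DN registered in normal state
        System.out.println(stats.capacityTotal); // 100

        dn.decommissioning = true;               // stands in for refreshNodes
        stats.heartbeat(dn);
        System.out.println(stats.capacityTotal); // 0

        stats.heartbeat(dn);                     // decremented a second time
        System.out.println(stats.capacityTotal); // -100
    }
}
```

In this model the total loses the node's capacity on every heartbeat after the node enters the decommissioning state, which matches the "drop into the negatives and keep on dropping" behaviour noted below.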

We don't appear to have any test cases that look at this, but I just added some logging (see
attached patch) and can see that as soon as the node enters DECOMMISSIONING state, the cluster
stats drop into the negatives and keep on dropping.

> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1547.1.patch, HDFS-1547.patch, show-stats-broken.txt
>
>
> Current decommission mechanism driven using exclude file has several issues. This bug
> proposes some changes in the mechanism for better manageability. See the proposal in the next
> comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

