hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1547) Improve decommission mechanism
Date Fri, 14 Jan 2011 23:26:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981973#action_12981973 ]

Todd Lipcon commented on HDFS-1547:
-----------------------------------

- DECOM_COMPARATOR should probably have some javadoc; it's not obvious from the name what
it does. (Why is there a sort-order distinction only for decommissioned nodes, and not for
decommission_ing_ ones? I thought this patch also wanted decommissioning nodes to sort lower
for block locations.)

Maybe a better name would be DECOMMISSIONED_AT_END_COMPARATOR or something? It's a bit long,
but the comparator isn't used often and the name makes it much clearer what it does.
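
For illustration, here's a rough sketch of what a documented, renamed comparator could look
like - this is my own sketch, not the patch's code, and it assumes DatanodeInfo.isDecommissioned()
is the check involved:

{code}
import java.util.Comparator;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DecommissionSortSketch {
  /**
   * Orders datanodes so that decommissioned nodes sort after all other
   * nodes, keeping them at the end of block-location lists. Nodes that
   * are only decommission-in-progress are treated like live nodes here;
   * whether they should also sort lower is the question above.
   */
  static final Comparator<DatanodeInfo> DECOMMISSIONED_AT_END_COMPARATOR =
      new Comparator<DatanodeInfo>() {
        @Override
        public int compare(DatanodeInfo a, DatanodeInfo b) {
          int aDecom = a.isDecommissioned() ? 1 : 0;
          int bDecom = b.isDecommissioned() ? 1 : 0;
          return aDecom - bDecom; // decommissioned (1) sorts after live (0)
        }
      };
}
{code}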

- there are spurious whitespace changes in the setDatanodeDead() function and in the javadoc for handleHeartbeat

- in generateNodesList, the word decommissioned is misspelled at one point with too few 's'es
- in MiniDFSCluster.setupDatanodeAddress, you can use conf.getTrimmed instead of getting the
value and manually calling trim() (see the first sketch after this list)
- the getFreeSocketPort() trick seems unlikely to work repeatably - isn't there a high
likelihood that two datanodes would pick the same free port, since you don't track "claimed"
ports anywhere? Or that one of these ports might later get claimed by one of the many other
daemons running on ephemeral ports in a mini cluster? (see the port sketch after this list)
- when the MiniDFSCluster is constructed, shouldn't you clear out the dfs.hosts file? Otherwise
you're relying on the test case to clean up after itself between runs, which differs from how
MiniDFSCluster handles the rest of its storage
- in the test case's verifyStats method, it seems we should either sleep for some number of
millis between iterations or write a function that waits for heartbeats (e.g. like
TestDatanodeRegistration.java:62); otherwise the 10 quick iterations might all run before any
heartbeats have actually come in (a sketch of such a wait loop is after this list)
- is there a test case anywhere that covers what happens when a node marked for decommission
connects to the namenode? e.g. after an NN restart, when a node is in both the include and the
decommission lists?
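
On the conf.getTrimmed point: a minimal sketch of the substitution I mean, assuming a config
key is being read and trimmed by hand (the key name below is just a placeholder, not necessarily
what setupDatanodeAddress uses):

{code}
import org.apache.hadoop.conf.Configuration;

public class GetTrimmedSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.datanode.address", "  127.0.0.1:0  ");

    // Manual approach: read the raw value, then trim it by hand.
    String raw = conf.get("dfs.datanode.address");
    String manual = (raw == null) ? null : raw.trim();

    // Same result in one call: getTrimmed() returns the value with
    // leading/trailing whitespace already stripped.
    String trimmed = conf.getTrimmed("dfs.datanode.address");

    System.out.println(trimmed.equals(manual)); // prints: true
  }
}
{code}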
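
On the getFreeSocketPort() concern: I'm assuming it follows the usual bind-then-release pattern,
roughly like the sketch below (not copied from the patch). The port is released before the caller
ever binds it, which is exactly where the race comes from.

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortSketch {
  /** Ask the kernel for an unused port, then immediately release it. */
  static int getFreeSocketPort() throws IOException {
    ServerSocket socket = new ServerSocket(0); // port 0 = any free port
    try {
      return socket.getLocalPort();
    } finally {
      // Once closed, nothing reserves this port: another datanode calling
      // the same method, or any daemon grabbing an ephemeral port, can
      // take it before the original caller gets around to binding it.
      socket.close();
    }
  }
}
{code}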
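
And for the verifyStats point, a sketch of the kind of wait loop I mean, in the spirit of the
helper referenced at TestDatanodeRegistration.java:62. The live-datanode count used as the
condition here is just an example; verifyStats would poll whatever stat it actually asserts on.

{code}
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

public class HeartbeatWaitSketch {
  /** Poll until the expected number of datanodes have heartbeated in. */
  static void waitForHeartbeats(FSNamesystem ns, int expectedLive,
      long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (ns.getNumLiveDataNodes() < expectedLive) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("timed out waiting for " + expectedLive
            + " datanodes to heartbeat");
      }
      Thread.sleep(100); // re-check a few times per second
    }
  }
}
{code}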

> Improve decommission mechanism
> ------------------------------
>
>                 Key: HDFS-1547
>                 URL: https://issues.apache.org/jira/browse/HDFS-1547
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, HDFS-1547.patch, show-stats-broken.txt
>
>
> The current decommission mechanism, driven by the exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

