hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12008) Improve the available-space block placement policy
Date Thu, 22 Jun 2017 21:13:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060016#comment-16060016 ]

Kihwal Lee commented on HDFS-12008:

When you set the conf to balance the space all the time (1.0f), the underutilized nodes should
be picked about 75% of the time. In the remaining 25% of cases, both of the two candidates
picked are nodes with less free space (p=0.5*0.5), so the end result is a node with less free
space.  In branch-2.8, no matter what you set the probability to, the result comes out to 50%.
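
The 75% figure can be checked with a small simulation. This is a minimal sketch of a simplified model, not the actual {{AvailableSpaceBlockPlacementPolicy}} code: it assumes half of the nodes are 100% free and half are 50% free, that the two candidates are drawn independently, and that the candidate with more free space wins with the configured preference. The names {{choose_node}} and {{underutilized_share}} are made up for illustration.

```python
import random


def choose_node(nodes, preference, rng):
    """Pick two candidates at random, then favor the one with more free space.

    `nodes` is a list of free-space fractions. `preference` is the probability
    of keeping the candidate with more free space when the two differ.
    """
    a = rng.choice(nodes)
    b = rng.choice(nodes)
    if a == b:
        # Same free space: there is nothing to balance between the two.
        return a
    more_free, less_free = (a, b) if a > b else (b, a)
    return more_free if rng.random() < preference else less_free


def underutilized_share(preference, trials=100_000, seed=0):
    """Fraction of trials in which a 100% free node is chosen."""
    rng = random.Random(seed)
    # Assumed cluster: half of the nodes 100% free, half 50% free.
    nodes = [1.0, 0.5]
    hits = sum(choose_node(nodes, preference, rng) == 1.0 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for pref in (1.0, 0.5):
        print(pref, underutilized_share(pref))
```

In this model the expected share is 0.25 + 0.5 * preference, so a preference of 1.0 gives about 75%, and 50% only appears when the preference is 0.5, that is, when there is no balancing at all.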

I suspect this is because of the scope specified when {{chooseDataNode()}} is called. The test
setup happens to make it so that a rack is full of either 100% free nodes or 50% free nodes,
never mixed.  So, when two nodes are picked with a given rack scope, it can only pick one
kind, both 100% free or both 50% free.  So the random factor doesn't really matter and the
chance of picking underutilized nodes (100% free) becomes exactly 50%, the percentage of such
nodes in the cluster.

If you change the number of racks to an odd number or change the way a rack is assigned to
each node, the chance of picking underutilized nodes rises to over 70%, closer to the theoretical
75%.  So the test was wrong and it was also checking against a wrong result.
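
The effect of the rack scope described above can be reproduced with a rough sketch. The setup here is assumed for illustration and is not the real test: node i is 100% free when i is even and 50% free otherwise, node i goes to rack i % num_racks, both candidates are drawn from one randomly chosen rack, and the preference is fixed at 1.0. {{build_cluster}} and {{underutilized_share}} are hypothetical helpers.

```python
import random


def build_cluster(num_nodes, num_racks):
    """Map rack id -> list of free-space fractions for the nodes in that rack.

    Assumed layout: even-numbered nodes are 100% free, odd-numbered nodes are
    50% free, and node i is assigned to rack i % num_racks.
    """
    racks = {}
    for i in range(num_nodes):
        free = 1.0 if i % 2 == 0 else 0.5
        racks.setdefault(i % num_racks, []).append(free)
    return racks


def underutilized_share(racks, trials=100_000, seed=0):
    """Fraction of trials that end on a 100% free node with preference 1.0."""
    rng = random.Random(seed)
    rack_ids = sorted(racks)
    hits = 0
    for _ in range(trials):
        # Both candidates are restricted to the same rack scope.
        scope = racks[rng.choice(rack_ids)]
        a, b = rng.choice(scope), rng.choice(scope)
        # Preference 1.0: always keep the candidate with more free space.
        hits += max(a, b) == 1.0
    return hits / trials


if __name__ == "__main__":
    print("4 racks:", underutilized_share(build_cluster(20, 4)))
    print("5 racks:", underutilized_share(build_cluster(20, 5)))
```

With an even number of racks, each rack in this layout holds only one kind of node, so the share stays at about 50% regardless of the preference. With an odd number of racks the kinds are mixed within each rack and the share moves toward 75%.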

Now, on trunk, the behavior is different. I haven't looked in detail, but the result indicates
that the scope is specified differently for {{chooseDataNode()}} compared to branch-2.8.  I can
see two nodes from different racks getting picked within a single {{chooseDataNode()}} call.
If the scope is the same as before, it must be {{DFSNetworkTopology}} not honoring the scope.
Either way, the behavior is different from branch-2.8.

> Improve the available-space block placement policy
> --------------------------------------------------
>                 Key: HDFS-12008
>                 URL: https://issues.apache.org/jira/browse/HDFS-12008
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.8.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-12008.patch
> AvailableSpaceBlockPlacementPolicy currently picks two nodes unconditionally, then picks
> one node. It could avoid picking the second node when not necessary.
