hadoop-hdfs-dev mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10206) getBlockLocations might not sort datanodes properly by distance
Date Thu, 24 Mar 2016 17:05:25 GMT
Ming Ma created HDFS-10206:

             Summary: getBlockLocations might not sort datanodes properly by distance
                 Key: HDFS-10206
                 URL: https://issues.apache.org/jira/browse/HDFS-10206
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ming Ma

If the DFSClient machine is not a datanode but shares a rack with some of the datanodes that
hold the requested HDFS block, {{DatanodeManager#sortLocatedBlocks}} might not put the local-rack
datanodes at the beginning of the sorted list. This is because the function does not call
{{networktopology.add(client);}} to set the client node's parent node, which {{networktopology.sortByDistance}}
requires in order to compute the distance between two nodes in the same topology tree.
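
To illustrate the problem, here is a minimal, self-contained sketch. It is a toy model and not Hadoop's code: {{RackParentSketch}}, {{ToyNode}} and {{ToyTopology}} are hypothetical names, and the sketch assumes that the same-rack check compares parent references, so a client that was never added to the topology has no parent and cannot match any rack.

```java
import java.util.HashMap;
import java.util.Map;

public class RackParentSketch {
    /** Hypothetical stand-in for a topology node; its parent represents the rack. */
    static final class ToyNode {
        final String name;
        final String rackPath;   // e.g. "/rack1"
        ToyNode parent;          // stays null until the node is added to the topology

        ToyNode(String name, String rackPath) {
            this.name = name;
            this.rackPath = rackPath;
        }
    }

    /** Hypothetical stand-in for the topology tree (one level of racks only). */
    static final class ToyTopology {
        private final Map<String, ToyNode> racks = new HashMap<>();

        /** Attaches the node to its rack, creating the rack node on first use. */
        void add(ToyNode node) {
            node.parent = racks.computeIfAbsent(node.rackPath, p -> new ToyNode(p, "/"));
        }

        /** Assumption: two nodes share a rack only if they share the same parent object. */
        boolean isOnSameRack(ToyNode a, ToyNode b) {
            return a.parent != null && a.parent == b.parent;
        }

        /** Same 0/1/2 weighting as in the report: local, same rack, off rack. */
        int getWeight(ToyNode reader, ToyNode node) {
            if (reader == node) {
                return 0;
            }
            return isOnSameRack(reader, node) ? 1 : 2;
        }
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology();
        ToyNode datanode = new ToyNode("dn1", "/rack1");
        topology.add(datanode);

        // The client sits in /rack1 but has not been added to the topology yet.
        ToyNode client = new ToyNode("client", "/rack1");
        System.out.println(topology.getWeight(client, datanode)); // prints 2 (treated as off rack)

        topology.add(client);
        System.out.println(topology.getWeight(client, datanode)); // prints 1 (same rack)
    }
}
```

In this toy model the local-rack datanode is weighted as off rack until the client is added, which is the kind of mis-sorting described above.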

Another issue with {{networktopology.sortByDistance}} is that it only distinguishes the local
rack from remote racks; it does not support a general distance calculation that tells how
remote a rack is.

  protected int getWeight(Node reader, Node node) {
    // 0 is local, 1 is same rack, 2 is off rack
    // Start off by initializing to off rack
    int weight = 2;
    if (reader != null) {
      if (reader.equals(node)) {
        weight = 0;
      } else if (isOnSameRack(reader, node)) {
        weight = 1;
      }
    }
    return weight;
  }
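
One possible shape for a more general weight is sketched below. This is an illustration under stated assumptions, not a proposed patch: {{PathDistanceSketch}} and its methods are hypothetical, and it assumes each node is identified by a full location path that starts with "/" (for example "/dc1/rack1/dn1"). The distance is the number of hops from each node up to their closest common ancestor, added together, so a rack in the same data center sorts ahead of a rack in another one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PathDistanceSketch {
    /**
     * Hops from each node up to the closest common ancestor, summed.
     * Assumes both paths start with "/" and use "/" as the separator.
     */
    static int distance(String pathA, String pathB) {
        String[] a = pathA.substring(1).split("/");
        String[] b = pathB.substring(1).split("/");
        int common = 0;
        while (common < a.length && common < b.length && a[common].equals(b[common])) {
            common++;
        }
        return (a.length - common) + (b.length - common);
    }

    /** Returns a copy of the nodes ordered from nearest to farthest from the reader. */
    static List<String> sortByDistance(String reader, List<String> nodes) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt(n -> distance(reader, n)));
        return sorted;
    }

    public static void main(String[] args) {
        String reader = "/dc1/rack1/client";
        List<String> replicas = Arrays.asList(
            "/dc2/rack9/dn7",   // other data center: distance 6
            "/dc1/rack2/dn4",   // same data center, other rack: distance 4
            "/dc1/rack1/dn1");  // same rack: distance 2
        System.out.println(sortByDistance(reader, replicas));
        // prints [/dc1/rack1/dn1, /dc1/rack2/dn4, /dc2/rack9/dn7]
    }
}
```

With the three-level weight quoted above, both "/dc1/rack2/dn4" and "/dc2/rack9/dn7" would get weight 2; the path-based distance separates them.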

HDFS-10203 has suggested moving the sorting from the namenode to DFSClient to address another
issue. Regardless of where we do the sorting, we still need to fix the issues outlined here.
