hadoop-hdfs-issues mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-13156) HDFS Block Placement Policy - Client Local Rack
Date Fri, 16 Feb 2018 16:05:00 GMT
BELUGA BEHR created HDFS-13156:
----------------------------------

             Summary: HDFS Block Placement Policy - Client Local Rack
                 Key: HDFS-13156
                 URL: https://issues.apache.org/jira/browse/HDFS-13156
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 2.9.0
            Reporter: BELUGA BEHR


{quote}For the common case, when the replication factor is three, HDFS’s placement policy
is to put one replica on the local machine if the writer is on a datanode, otherwise on a
random datanode, another replica on a node in a different (remote) rack, and the last on a
different node in the same remote rack.
{quote}
[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Replica_Placement:_The_First_Baby_Steps]

Having just looked over the default block placement code, the way I understand it, there are three basic scenarios:
 # HDFS client is running on a datanode inside the cluster
 # HDFS client is running on a node outside the cluster
 # HDFS client is running on a non-datanode inside the cluster

The documentation is ambiguous concerning the third scenario. Please correct me if I'm wrong,
but as I read the code, if an HDFS client is inside the cluster but not on a datanode, the
first block will be placed on a datanode chosen from the set of datanodes on the client's
local rack, and not simply on any _random datanode_ from the set of all datanodes in the
cluster.
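Based on my reading, the three scenarios could be sketched roughly like this. To be clear, this is a simplified illustration of the behavior I believe I'm seeing, not the actual BlockPlacementPolicyDefault API; the class, record, and method names here are all hypothetical:

```java
import java.util.List;
import java.util.Random;

public class FirstReplicaSketch {
    // A node in the topology; the writer may or may not be a datanode.
    record Node(String name, String rack, boolean isDatanode) {}

    /**
     * Hypothetical sketch of first-replica selection:
     *  1. writer is a datanode in the cluster  -> write locally
     *  3. writer is in-cluster but not a DN    -> random datanode on the writer's rack
     *  2. writer is outside the cluster        -> random datanode anywhere
     */
    static Node chooseFirstReplica(Node writer, List<Node> datanodes, Random rnd) {
        if (writer != null && writer.isDatanode()) {
            return writer;                                        // scenario 1: local write
        }
        if (writer != null) {                                     // scenario 3: known rack, not a DN
            List<Node> sameRack = datanodes.stream()
                .filter(d -> d.rack().equals(writer.rack()))
                .toList();
            if (!sameRack.isEmpty()) {
                return sameRack.get(rnd.nextInt(sameRack.size())); // local-rack pick, not cluster-wide
            }
        }
        // scenario 2: writer off-cluster (or its rack has no datanodes) -> any datanode
        return datanodes.get(rnd.nextInt(datanodes.size()));
    }
}
```

In this sketch, scenario 3 only degenerates to a cluster-wide random pick when the writer's rack contains no datanodes at all, which is the distinction I'd like the documentation to spell out.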

That is to say, if one rack has an HDFS Sink Flume Agent on a dedicated node, I should expect
every first block to be written to a _random datanode_ on the same rack as the HDFS Flume
agent, assuming the network topology script includes this Flume node.

If that is correct, can the documentation be updated to include this third common scenario?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

