hadoop-common-commits mailing list archives

From aajis...@apache.org
Subject hadoop git commit: HDFS-11833. HDFS architecture documentation descibes outdated placement policy. Contributed by Chen Liang.
Date Tue, 16 May 2017 16:25:11 GMT
Repository: hadoop
Updated Branches:
  refs/heads/branch-2 feb7e9212 -> c17cb03a2


HDFS-11833. HDFS architecture documentation descibes outdated placement policy. Contributed by Chen Liang.

(cherry picked from commit 1d1c52b42feae5a4271ef4b771d0d8de43e83c15)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/c17cb03a
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/c17cb03a
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/c17cb03a

Branch: refs/heads/branch-2
Commit: c17cb03a23f2ea9f1af8f9c147ac68d8441be935
Parents: feb7e92
Author: Akira Ajisaka <aajisaka@apache.org>
Authored: Tue May 16 11:52:33 2017 -0400
Committer: Akira Ajisaka <aajisaka@apache.org>
Committed: Tue May 16 11:53:55 2017 -0400

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/c17cb03a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
index 86acc08..ffcfeb2 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -102,7 +102,7 @@ Large HDFS instances run on a cluster of computers that commonly spread across m
 The NameNode determines the rack id each DataNode belongs to via the process outlined in [Hadoop Rack Awareness](../hadoop-common/RackAwareness.html).
 A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
 
-For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
+For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode, otherwise on a random datanode, another replica on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
 
 If the replication factor is greater than 3,
 the placement of the 4th and following replicas are determined randomly
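The corrected placement policy in the diff above can be illustrated with a toy sketch. This is not HDFS's actual `BlockPlacementPolicyDefault`; all names (`choose_targets`, `datanodes_by_rack`) are hypothetical, and it only models the rack/node selection described in the updated paragraph: first replica on the writer's machine if the writer is a datanode (else a random datanode), second replica on a node in a different rack, third replica on a different node in that same remote rack, and any further replicas placed randomly.

```python
import random

def choose_targets(writer_host, datanodes_by_rack, replication=3):
    """Toy model of the 3-replica placement described in HdfsDesign.md.

    datanodes_by_rack: dict mapping rack id -> list of datanode host names.
    Returns a list of (rack, node) tuples, one per replica.
    """
    all_nodes = [(rack, node)
                 for rack, nodes in datanodes_by_rack.items()
                 for node in nodes]
    # Replica 1: the local machine if the writer runs on a datanode,
    # otherwise a random datanode.
    local = next(((r, n) for r, n in all_nodes if n == writer_host), None)
    first = local if local is not None else random.choice(all_nodes)
    targets = [first]
    # Replica 2: a node in a different (remote) rack.
    remote = [(r, n) for r, n in all_nodes if r != first[0]]
    second = random.choice(remote)
    targets.append(second)
    # Replica 3: a different node in the same remote rack as replica 2.
    same_remote_rack = [(r, n) for r, n in remote
                        if r == second[0] and n != second[1]]
    targets.append(random.choice(same_remote_rack))
    # Replicas 4+: determined randomly among the remaining nodes.
    remaining = [t for t in all_nodes if t not in targets]
    while len(targets) < replication and remaining:
        pick = random.choice(remaining)
        targets.append(pick)
        remaining.remove(pick)
    return targets
```

Note how this scheme places a block in only two unique racks: replicas 2 and 3 share a rack, which is exactly why the paragraph says aggregate read bandwidth is lower than with three unique racks.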


