hadoop-hdfs-user mailing list archives

From jianan hu <hujia...@gmail.com>
Subject Rack awareness and pipeline write
Date Sun, 11 May 2014 02:55:33 GMT
Hi everyone,

The HDFS documentation says: "For the common case, when the replication
factor is three, HDFS’s placement policy is to put one replica on one node
in the local rack, another on a node in a different (remote) rack, and the
last on a different node in the same remote rack."

Assume there are two racks, A and B, and the writer is in rack A. Under the
rack-aware policy, the first replica is placed in rack A, and the other two
replicas are written to rack B.

However, why not store the first and second replicas in the local rack (A),
and the last one in a remote rack (B)? Both scenarios generate the same
amount of inter-rack traffic. What is the disadvantage of the second layout?
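
To make the traffic comparison concrete, here is a minimal sketch. It assumes the writer runs on a node in rack A and that each replica is forwarded node to node along the write pipeline. The function name and rack labels are illustrative only and are not part of any HDFS API.

```python
def cross_rack_transfers(client_rack, pipeline_racks):
    """Count pipeline hops whose sender and receiver sit in different racks.

    client_rack    -- rack of the writer (assumed to be "A" here)
    pipeline_racks -- rack of each replica, in pipeline order
    """
    hops = [client_rack] + list(pipeline_racks)
    return sum(1 for src, dst in zip(hops, hops[1:]) if src != dst)


# Documented placement: one replica local, two on the same remote rack.
documented_layout = ["A", "B", "B"]
# Layout asked about: two replicas local, one on a remote rack.
proposed_layout = ["A", "A", "B"]

documented_hops = cross_rack_transfers("A", documented_layout)
proposed_hops = cross_rack_transfers("A", proposed_layout)

print("documented layout, cross-rack transfers:", documented_hops)
print("proposed layout, cross-rack transfers:", proposed_hops)
```

Under these assumptions each layout sends the block across the rack boundary exactly once, which is why I consider the write traffic equal.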


Best Regards,
