hadoop-hdfs-dev mailing list archives

From 남윤민 <rony...@dgist.ac.kr>
Subject Does "copyToLocal" not consider the block locality?
Date Sat, 21 Mar 2015 08:53:17 GMT
 Hello everyone.

I have run into a very strange situation
with an HDFS operation.


I have a cluster with 1 master and 10 slaves.


When I put a file A into HDFS with
dfs.replication=10, I can see that every block of file A is replicated in every

So it is reasonable to expect that the HDFS
file reader can act as a local block reader when I read file A.
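
To make this expectation concrete, here is a small sketch of what I mean. It is only a toy model, not HDFS code: the host names, the block names, and the placement of the single replicas are assumptions made up for illustration. It counts how many blocks of a file have a replica on the host that does the reading.

```python
# Toy model only: this is not HDFS code, and all host/block names are hypothetical.

def local_block_fraction(block_locations, client_host):
    """Return the fraction of blocks that have a replica on client_host."""
    if not block_locations:
        return 0.0
    local = sum(1 for hosts in block_locations.values() if client_host in hosts)
    return local / len(block_locations)


slaves = [f"slave{i:02d}" for i in range(1, 11)]  # 10 hypothetical slave hosts

# dfs.replication=10: every block has a replica on each of the 10 slaves.
fully_replicated = {f"blk_{n}": set(slaves) for n in range(4)}

# dfs.replication=1: each block lives on one slave (placement assumed here).
single_replica = {f"blk_{n}": {slaves[n]} for n in range(4)}

print(local_block_fraction(fully_replicated, "slave03"))  # 1.0
print(local_block_fraction(single_replica, "slave03"))    # 0.25
```

In this model a reader on a slave finds every block locally when dfs.replication=10, but only some of them when dfs.replication=1, so I expected the two cases to differ in network usage.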


However, when I execute hdfs dfs -copyToLocal
A /to/my/localDir, the file reading time is the same as in the case of


So I monitored the network usage,
especially the data read and written.

In both cases, dfs.replication={1, 10},
the network resources are fully used.

This means that reading the file does not
take the block location into account.


Is this the expected behavior of HDFS?

Then what is the true meaning of data
locality in HDFS? (We all know about the data locality of map tasks.)


I want to know why the performance is the same
in both "copyToLocal" cases.



// Yoonmin Nam
