hadoop-mapreduce-user mailing list archives

From Bill Q <bill.q....@gmail.com>
Subject HDFS network traffic
Date Thu, 07 Mar 2013 04:21:28 GMT
Hi All,
I am working on converting a sequence file to a MapFile and just discovered
something I wasn't aware of.

For example, I am working on a 2-node cluster: one master (namenode and
datanode) and one slave (datanode). When I ran hadoop dfs -cp
/data/file1 /data/file2 (a 1 GB file) from the master and monitored the NICs
of both nodes, I saw the master node send about 1 GB of traffic, the size of
the entire file, to the slave. This surprised me. Does this mean that all the
traffic has to go through the client node that runs the command (in this
case, the master) when I do hadoop dfs -cp?

Many thanks.

