hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: HDFS network traffic
Date Thu, 07 Mar 2013 05:23:17 GMT
Yes, a simple copy is a client operation. The client reads bytes from
the source and writes them to the destination, so it stays in control
of failure handling. However, if you want the cluster itself to do the
copy (and the data set is large), consider using the DistCp
(distributed-copy) MR job instead.
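
To make the data path concrete, here is a minimal sketch of what a
client-side copy amounts to. It is an illustration only, not the Hadoop
API: the in-memory streams are hypothetical stand-ins for the HDFS input
and output streams, and client_copy is a made-up helper name.

```python
import io


def client_copy(src, dst, chunk_size=64 * 1024):
    """Relay src to dst through this process; return the bytes relayed."""
    relayed = 0
    while True:
        chunk = src.read(chunk_size)  # bytes arrive at the client
        if not chunk:
            break
        dst.write(chunk)              # the client sends them on again
        relayed += len(chunk)
    return relayed


# Hypothetical stand-ins for /data/file1 and /data/file2.
source = io.BytesIO(b"x" * 1_000_000)
dest = io.BytesIO()

relayed = client_copy(source, dest)
print(relayed)  # every byte of the file passed through the client
```

Because every chunk is read into the client and written back out, the
node running the command sees traffic for the whole file. With DistCp
(for example, hadoop distcp /data/file1 /data/file2) the read/write
loop runs inside map tasks on the cluster instead.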

On Thu, Mar 7, 2013 at 9:51 AM, Bill Q <bill.q.hdp@gmail.com> wrote:
> Hi All,
> I am working on converting a sequence file to mapfile and just discovered
> something I wasn't aware of.
> For example, suppose I am working on a 2-node cluster, one
> master/namenode/datanode, one slave/datanode. If I do hadoop dfs -cp
> /data/file1 /data/file2 (a 1G file) from the master, and monitor the NIC of
> both nodes, I saw that the master node sent the entire 1G of file traffic to
> the slave. This surprised me. Does this mean all the traffic has to go
> through the client node that runs the command (in this case, the master)
> when I do hadoop dfs -cp?
> Many thanks.
> Bill

Harsh J
