hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: CopyFromLocal
Date Tue, 22 May 2012 04:09:22 GMT

MapReduce and HDFS are two different things. MapReduce uses HDFS (and
can use any other FS as well) to do some efficient work, but HDFS does
not use MapReduce.

A simple HDFS transfer is done directly over the network. Yes, it's just
a block-by-block copy/write to/from the relevant DataNodes, done over
network sockets at each end.
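
As a rough illustration of the block-by-block idea, here is a toy Python
sketch. It is not HDFS client code: the function name, the tiny block
size, and the round-robin choice of target node are all assumptions made
for the example (in a real cluster the NameNode decides which DataNodes
receive each block, and blocks are typically tens of megabytes or more).
It only shows a single client reading a source sequentially and handing
one block at a time to a node, with no mappers involved.

```python
import io

# Toy block size for the example; real HDFS blocks are far larger.
BLOCK_SIZE = 4


def copy_block_by_block(src, block_size, datanodes):
    """Read `src` sequentially and append each block to one 'DataNode'.

    `datanodes` is a list of plain Python lists standing in for nodes.
    Round-robin placement is an illustrative assumption, not HDFS policy.
    Returns a list of (block_index, node_index, block_length) tuples.
    """
    placements = []
    index = 0
    while True:
        block = src.read(block_size)
        if not block:  # end of the source stream
            break
        target = index % len(datanodes)
        datanodes[target].append(block)
        placements.append((index, target, len(block)))
        index += 1
    return placements


if __name__ == "__main__":
    nodes = [[], [], []]
    layout = copy_block_by_block(io.BytesIO(b"0123456789"), BLOCK_SIZE, nodes)
    for block_index, node_index, length in layout:
        print(f"block {block_index} -> node {node_index} ({length} bytes)")
```

The loop runs in one thread in one process, which is the point: the copy
is only as parallel as the client that drives it.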

On Tue, May 22, 2012 at 8:58 AM, Ranjith <ranjith.raghunath1@gmail.com> wrote:
> Thanks, Harsh. So when it connects directly to the data nodes it does not fire off any
> mappers. So how does it get the data over? Is it just a block-by-block copy?
> Thanks,
> Ranjith
> On May 21, 2012, at 9:22 PM, Harsh J <harsh@cloudera.com> wrote:
>> Ranjith,
>> Are you speaking of DistCp?
>> http://hadoop.apache.org/common/docs/current/distcp.html
>> An 'fs -copyFromLocal' otherwise just runs as a single program that
>> connects to your DFS nodes and writes data from a single client
>> thread, and is not distributed on its own.
>> On Tue, May 22, 2012 at 6:48 AM, Ranjith <ranjith.raghunath1@gmail.com> wrote:
>>> I have always wondered about this and am not sure what is going on. When I fire
>>> a MapReduce job to copy data over in a distributed fashion, I would expect to see mappers
>>> executing the copy. What happens with a copy command from hadoop fs?
>>> Thanks,
>>> Ranjith
>> --
>> Harsh J

Harsh J
