hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: CopyFromLocal
Date Tue, 22 May 2012 04:09:22 GMT
Ranjith,

MapReduce and HDFS are two different things. MapReduce uses HDFS (and
can use any other FS as well) to do some efficient work, but HDFS does
not use MapReduce.

A simple HDFS transfer is done directly over the network. Yes, it's just
a block-by-block copy/write to/from the relevant DataNodes, done over
network sockets at each end.
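
To make the block-by-block idea concrete, here is a minimal sketch in Python. It is a toy simulation, not the real HDFS client or wire protocol: FakeDataNode, copy_from_local and read_back are hypothetical names, the "DataNodes" are in-memory objects rather than sockets, the 4-byte block size is only for illustration (real HDFS blocks are far larger and configurable), and the round-robin placement is an assumption (in real HDFS the NameNode decides where blocks go). What it does show is the shape of the operation: one client loop, no mappers, one block written at a time.

```python
import io

BLOCK_SIZE = 4  # bytes; tiny on purpose, real HDFS blocks are far larger


class FakeDataNode:
    """Hypothetical stand-in for a DataNode: keeps blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data


def copy_from_local(src, datanodes, block_size=BLOCK_SIZE):
    """Copy a byte stream block by block from a single client loop.

    Returns a list of (block_id, datanode_name) pairs recording where
    each block was written.
    """
    placement = []
    block_id = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        # Round-robin placement is an assumption made for this sketch;
        # in real HDFS the NameNode chooses the target DataNodes.
        node = datanodes[block_id % len(datanodes)]
        node.write_block(block_id, chunk)
        placement.append((block_id, node.name))
        block_id += 1
    return placement


def read_back(placement, datanodes):
    """Reassemble the original bytes by fetching blocks in order."""
    by_name = {node.name: node for node in datanodes}
    return b"".join(by_name[name].blocks[bid] for bid, name in placement)


nodes = [FakeDataNode("dn1"), FakeDataNode("dn2")]
placement = copy_from_local(io.BytesIO(b"hello hdfs!"), nodes)
print(placement)
print(read_back(placement, nodes))
```

The whole copy runs in one process on the client side, which is why no MapReduce job shows up for a plain 'fs -copyFromLocal'.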

On Tue, May 22, 2012 at 8:58 AM, Ranjith <ranjith.raghunath1@gmail.com> wrote:
> Thanks Harsh. So when it connects directly to the data nodes it does not fire off any mappers. So how does it get the data over? Is it just a block by block copy?
>
> Thanks,
> Ranjith
>
> On May 21, 2012, at 9:22 PM, Harsh J <harsh@cloudera.com> wrote:
>
>> Ranjith,
>>
>> Are you speaking of DistCp?
>> http://hadoop.apache.org/common/docs/current/distcp.html
>>
>> An 'fs -copyFromLocal' otherwise just runs as a single program that
>> connects to your DFS nodes and writes data from a single client
>> thread, and is not distributed on its own.
>>
>> On Tue, May 22, 2012 at 6:48 AM, Ranjith <ranjith.raghunath1@gmail.com> wrote:
>>>
>>> I have always wondered about this and am not sure what is going on. When I fire a map reduce job to copy data over in a distributed fashion I would expect to see mappers executing the copy. What happens with a copy command from Hadoop fs?
>>>
>>> Thanks,
>>> Ranjith
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
