hadoop-common-user mailing list archives

From KayVajj <vajjalak...@gmail.com>
Subject Re: Copy Vs DistCP
Date Thu, 11 Apr 2013 04:12:12 GMT
If the cp command is not parallel, how does it work for a file whose blocks
are spread across various data nodes?


On Wed, Apr 10, 2013 at 6:30 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> The cp command is not parallel. It just calls FileSystem, even though
> DFSClient has multiple threads.
>
> DistCp can work well on the same cluster.
>
>
> On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <vajjalak009@gmail.com> wrote:
>
>> The File System copy utility copies files byte by byte, if I'm not wrong.
>> Could the cp command instead work with blocks and move them, which could
>> be significantly more efficient?
>>
>>
>> Also, how does the cp command work if the file is distributed across
>> different data nodes?
>>
>> Thanks
>> Kay
>>
>>
>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <jayunit100@gmail.com> wrote:
>>
>>> DistCp is a full-blown MapReduce job (mapper only, where the mappers do
>>> a "fully" parallel copy to the destination).
>>>
>>> cp appears (correct me if I'm wrong) to simply invoke the FileSystem and
>>> issue a copy command for every source file.
>>>
>>> I have an additional question: how is cp optimized (if at all) when it
>>> runs within a single cluster?
>>>
>>>
>>>
>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <shurong.mai@qunar.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think it's better to use cp within the same cluster and DistCp
>>>> between clusters. The cp command is a Hadoop-internal parallel process
>>>> and will not copy files locally.
>>>>
>>>> ------------------------------
>>>>  麦树荣
>>>>
>>>>  *From:* KayVajj <vajjalak009@gmail.com>
>>>> *Date:* 2013-04-11 06:20
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* Copy Vs DistCP
>>>> I have a few questions regarding the use of DistCp for copying files
>>>> within the same cluster.
>>>>
>>>>
>>>> 1) Which one is better within the same cluster, and what factors (like
>>>> file size, etc.) would influence the use of one over the other?
>>>>
>>>> 2) When we run a cp command like the one below from a client node of the
>>>> cluster (not a data node), how does the cp command work?
>>>>       i) like an MR job, or
>>>>      ii) by copying the files locally and then copying them back to the
>>>> new location?
>>>>
>>>>  Example of the copy command
>>>>
>>>>  hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>
>>>>  Thanks, your responses are appreciated.
>>>>
>>>>  -- Kay
>>>>
>>>
>>>
>>>
>>> --
>>> Jay Vyas
>>> http://jayunit100.blogspot.com
>>>
>>
>>
>
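
The contrast drawn in the replies above can be sketched by putting the two command lines side by side. This is only an illustration: the paths in SRC and DST are hypothetical placeholders, not taken from the thread, and the value passed to -m is an arbitrary example. The script prints the commands instead of running them, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical source and destination paths (assumptions, not from the thread).
SRC="/data/in/file"
DST="/data/out/"

# Single-client copy: the shell goes through the FileSystem API, so the data
# is streamed through the one client process that runs the command.
cp_cmd="hdfs dfs -cp $SRC $DST"

# DistCp: submits a map-only MapReduce job; -m caps the number of map tasks
# (8 is an arbitrary example value).
distcp_cmd="hadoop distcp -m 8 $SRC $DST"

# Print rather than execute, so no cluster is needed to try the sketch.
echo "$cp_cmd"
echo "$distcp_cmd"
```

Which of the two is faster in practice likely depends on the workload: DistCp pays a job-startup cost, so its parallelism should matter most when many files are being copied.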
