hadoop-hdfs-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Copy Vs DistCP
Date Thu, 11 Apr 2013 12:44:48 GMT
Yes, that makes sense. cp is serialized and simpler, and does not rely on the JobTracker, whereas
distcp only submits a job and waits for completion,
so it can fail if tasks start to fail or time out.
I have seen distcp fail and hang before, albeit not often.
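
For reference, the two approaches discussed in this thread can be put side by side. This is only a sketch: the helper name `build_copy_cmd`, the 10 GB cutoff, and the example paths are hypothetical and not from the thread. `hdfs dfs -cp` is the command from the original question, and `hadoop distcp <src> <dst>` is the usual DistCp invocation. The helper only prints the command it would pick, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a copy command for a given data size.
# The cutoff is an illustrative assumption, not a recommendation from the thread.
THRESHOLD_BYTES=$((10 * 1024 * 1024 * 1024))  # assumed cutoff: 10 GB

build_copy_cmd() {
    size_bytes=$1
    src=$2
    dst=$3
    if [ "$size_bytes" -lt "$THRESHOLD_BYTES" ]; then
        # Small copy: serial, driven by the client, no MapReduce job involved.
        echo "hdfs dfs -cp $src $dst"
    else
        # Large copy: map-only MapReduce job, so it waits for free map slots.
        echo "hadoop distcp $src $dst"
    fi
}

# Example: a 1 MB file falls under the assumed cutoff.
build_copy_cmd 1048576 /data/in/file /data/out/
```

Where the cutoff should sit depends on how busy the cluster is, since distcp has to wait for map slots while cp does not.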


On Apr 10, 2013, at 10:37 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:

> If the cluster is busy with other jobs, distcp will wait for free map slots. Regular cp is more reliable and predictable, especially if you need to copy just several GB.
> On Apr 10, 2013 6:31 PM, "Azuryy Yu" <azuryyyu@gmail.com> wrote:
>> The cp command is not parallel; it just calls FileSystem, even if DFSClient has multi
>> DistCp can work well on the same cluster.
>> On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <vajjalak009@gmail.com> wrote:
>>> The File System Copy utility copies files byte by byte, if I'm not wrong. Could it be possible that the cp command works with blocks and moves them, which could be significantly
>>> Also how does the cp command work if the file is distributed on different data
>>> Thanks
>>> Kay
>>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <jayunit100@gmail.com> wrote:
>>>> DistCP is a full-blown MapReduce job (mapper only, where the mappers do a "fully" parallel copy to the destination).
>>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file.
>>>> I have an additional question: how is CP, which is internal to a cluster, optimized (if at all)?
>>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <shurong.mai@qunar.com> wrote:
>>>>> Hi,
>>>>> I think it's better to use Copy within the same cluster and distCP between clusters, and the cp command is a hadoop internal parallel process and will not copy files
>>>>> 麦树荣
>>>>> From: KayVajj
>>>>> Date: 2013-04-11 06:20
>>>>> To: user@hadoop.apache.org
>>>>> Subject: Copy Vs DistCP
>>>>> I have a few questions regarding the usage of DistCP for copying files in the same cluster.
>>>>> 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one over the other?
>>>>> 2) When we run a cp command like the one below from a client node of the cluster (not a data node), how does the cp command work?
>>>>>      i) like an MR job
>>>>>     ii) copy the files locally and then copy them back to the new location.
>>>>> Example of the copy command:
>>>>> hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>> Thanks, your responses are appreciated.
>>>>> -- Kay
>>>> -- 
>>>> Jay Vyas
>>>> http://jayunit100.blogspot.com
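
The distinction drawn earlier in the thread, between one copy command per source file issued in sequence and a parallel copy, can be tried out on a local filesystem. The sketch below is only an analogy and does not show how HDFS or DistCp is implemented: the temporary directory layout, the file names, and the parallelism of 4 are all assumptions made for illustration.

```shell
#!/bin/sh
# Local-filesystem analogy only: this is NOT how HDFS or DistCp works internally.
work=$(mktemp -d)
mkdir "$work/src" "$work/serial" "$work/parallel"
for i in 1 2 3 4 5 6 7 8; do
    echo "payload $i" > "$work/src/file$i"
done

# Serial: one copy command per source file, one after another (like cp).
for f in "$work"/src/*; do
    cp "$f" "$work/serial/"
done

# Parallel: up to 4 copies in flight at once (loosely like DistCp's mappers).
# xargs -P is a GNU/BSD extension and is assumed to be available here.
ls "$work/src" | xargs -P 4 -I{} cp "$work/src/{}" "$work/parallel/"

# Both strategies should end up with the same number of files.
serial_count=$(ls "$work/serial" | wc -l | tr -d ' ')
parallel_count=$(ls "$work/parallel" | wc -l | tr -d ' ')
echo "serial=$serial_count parallel=$parallel_count"
```

Both loops produce the same result. The parallel version only pays off when there are enough files and enough free workers, which mirrors the point made above about distcp waiting for map slots on a busy cluster.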
