Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of vajjalak009@gmail.com
 designates 74.125.82.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALr1C9okVQmqjm4cf2Q7Sciw1n1-uYfXMfq5p6KJSzTYRwHXEg@mail.gmail.com>
References: 
 <CANnz39L-bSS-KqYn_OQNae8V8D+Rv0aAiJA+x2YZEmgB7D2nhw@mail.gmail.com>
	<2013041107282798808611@qunar.com>
	<CAAu13zH-vnDr0oTY1iCZhzs6nbg38hspCyT7dYni=68AyYu4Nw@mail.gmail.com>
	<CANnz39LEz2Yo_qvgQ964SY1pspChX4Ks7Z+zKFgRvHAeN2Yg3g@mail.gmail.com>
	<CALr1C9okVQmqjm4cf2Q7Sciw1n1-uYfXMfq5p6KJSzTYRwHXEg@mail.gmail.com>
Date: Wed, 10 Apr 2013 21:12:12 -0700
Message-ID: 
 <CANnz39KRpjbPCHT4YUssFn1G46BWdGhX8hTwWygJcSE-wLmiCA@mail.gmail.com>
Subject: Re: Copy Vs DistCP
From: KayVajj <vajjalak009@gmail.com>
To: "common-user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=001a11c1b9a8b5026204da0dfd8c

--001a11c1b9a8b5026204da0dfd8c
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: quoted-printable

If the cp command is not parallel, how does it work for a file partitioned
across various data nodes?


On Wed, Apr 10, 2013 at 6:30 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> The cp command is not parallel. It just calls FileSystem, even though
> DFSClient has multiple threads.
>
> DistCp can work well on the same cluster.
>
>
> On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <vajjalak009@gmail.com> wrote:
>
>> The file system copy utility copies files byte by byte, if I'm not
>> wrong. Could the cp command instead work with blocks and move them,
>> which could be significantly more efficient?
>>
>>
>> Also, how does the cp command work if the file is distributed across
>> different data nodes?
>>
>> Thanks
>> Kay
>>
>>
>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <jayunit100@gmail.com> wrote:
>>
>>> DistCp is a full-blown MapReduce job (mapper only, where the mappers do
>>> a "fully" parallel copy to the destination).
>>>
>>> cp appears (correct me if I'm wrong) to simply invoke the FileSystem
>>> and issue a copy command for every source file.
>>>
>>> I have an additional question: how is cp, which is internal to a
>>> cluster, optimized (if at all)?
>>>
>>>
>>>
>>> On Wed, Apr 10, 2013 at 7:28 PM, =C2=F3=CA=F7=C8=D9
>>> <shurong.mai@qunar.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think it's better to use cp within the same cluster and DistCp
>>>> between clusters. The cp command is a Hadoop-internal parallel process
>>>> and will not copy files locally.
>>>>
>>>> ------------------------------
>>>>  =C2=F3=CA=F7=C8=D9
>>>>
>>>>  *From:* KayVajj <vajjalak009@gmail.com>
>>>> *Date:* 2013-04-11 06:20
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* Copy Vs DistCP
>>>> I have a few questions regarding the use of DistCp for copying
>>>> files within the same cluster.
>>>>
>>>>
>>>> 1) Which one is better within the same cluster, and what factors (like
>>>> file size, etc.) would influence the choice of one over the other?
>>>>
>>>> 2) When we run a cp command like the one below from a client node of
>>>> the cluster (not a data node), how does the cp command work?
>>>>      i) Like an MR job, or
>>>>     ii) by copying the files locally and then copying them back to the
>>>>         new location?
>>>>
>>>>  Example of the copy command
>>>>
>>>>  hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>
>>>>  Thanks, your responses are appreciated.
>>>>
>>>>  -- Kay
>>>>
>>>
>>>
>>>
>>> --
>>> Jay Vyas
>>> http://jayunit100.blogspot.com
>>>
>>
>>
>
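
For readers of this thread, here is a toy sketch of the two behaviours
discussed above. It is not Hadoop code. It assumes, as the replies
suggest, that a plain cp is driven by a single client that reads a
file's blocks in order from whichever data nodes hold them, while
DistCp spreads the list of files across parallel workers (map tasks).
All names (DATANODES, NAMENODE, read_file, client_copy, parallel_copy)
and the block layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each data node holds some blocks.
DATANODES = {
    "dn1": {"blk_1": b"Hello, "},
    "dn2": {"blk_2": b"HDFS "},
    "dn3": {"blk_3": b"world"},
}

# Hypothetical name node metadata: file -> ordered (node, block) list.
NAMENODE = {
    "/src/file": [("dn1", "blk_1"), ("dn2", "blk_2"), ("dn3", "blk_3")],
}


def read_file(path):
    """Yield a file's blocks in order, each from the node holding it."""
    for node, block_id in NAMENODE[path]:
        yield DATANODES[node][block_id]


def client_copy(src, dst, out):
    """cp-style copy: one client streams the blocks one after another."""
    out[dst] = b"".join(read_file(src))


def parallel_copy(pairs, out, workers=2):
    """DistCp-style copy: files are shared out among parallel workers.

    Each individual file is still copied sequentially by one worker.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda p: client_copy(p[0], p[1], out), pairs))
```

In this model a file spread over several data nodes is no obstacle
for cp: the client simply visits the nodes in block order. The
parallelism in the DistCp-style function comes only from having many
files to hand out.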

--001a11c1b9a8b5026204da0dfd8c--