From: Gavin Yue <yue.yuanyuan@gmail.com>
Date: Sun, 10 Jan 2016 15:05:15 -0800
Subject: Re: how to quickly fs -cp dir with thousand files?
To: Chris Nauroth <cnauroth@hortonworks.com>
Cc: sandeep vura <sandeepvura@gmail.com>, "user@hadoop.apache.org" <user@hadoop.apache.org>

Yes, I need two different copies, and I tried Chris's solution; distcp indeed works.

Thank you all.

On Sun, Jan 10, 2016 at 3:00 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> Yes, certainly, if you only need it in one spot, then -mv is a fast
> metadata-only operation. I was under the impression that Gavin really
> wanted to achieve 2 distinct copies. Perhaps I was mistaken.
>
> --Chris Nauroth
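For reference, the -mv operation maps to FileSystem#rename in the Java API. A minimal sketch, assuming a default Configuration and hypothetical paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRename {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // rename() is a single NameNode metadata update: no blocks are
            // rewritten, so it is fast no matter how many files the dir holds.
            boolean renamed = fs.rename(new Path("/user/hadoop/file1"),
                                        new Path("/user/hadoop/file2"));
            System.out.println("renamed: " + renamed);
            // Note: after a rename there is still only ONE copy of the data,
            // which is the distinction Chris draws above.
        }
    }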

>
> From: sandeep vura <sandeepvura@gmail.com>
> Date: Sunday, January 10, 2016 at 6:23 AM
> To: Chris Nauroth <cnauroth@hortonworks.com>
> Cc: Gavin Yue <yue.yuanyuan@gmail.com>, "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: how to quickly fs -cp dir with thousand files?
>
> Hi Chris,
>
> Instead of copying the files, use the mv command:
>
> - hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2
>
> Sandeep.v


> On Sat, Jan 9, 2016 at 9:55 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>> DistCp is capable of running large copies like this in distributed
>> fashion, implemented as a MapReduce job.
>>
>> http://hadoop.apache.org/docs/r2.7.1/hadoop-distcp/DistCp.html
>>
>> A lot of the literature on DistCp talks about use cases for copying
>> across different clusters, but it's also completely legitimate to run
>> DistCp within the same cluster.
>>
>> --Chris Nauroth
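The CLI form of a same-cluster copy is simply `hadoop distcp /user/hadoop/src /user/hadoop/dst`. DistCp can also be driven from Java via org.apache.hadoop.tools.DistCp (the hadoop-distcp jar must be on the classpath). A minimal sketch, assuming the Hadoop 2.x DistCpOptions constructor (Hadoop 3 replaced it with a builder) and hypothetical paths:

    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.tools.DistCpOptions;

    public class IntraClusterDistCp {
        public static void main(String[] args) throws Exception {
            // CLI equivalent: hadoop distcp /user/hadoop/src /user/hadoop/dst
            Configuration conf = new Configuration();
            // Source and target live on the same HDFS; both paths are hypothetical.
            DistCpOptions options = new DistCpOptions(
                    Collections.singletonList(new Path("/user/hadoop/src")),
                    new Path("/user/hadoop/dst"));
            // DistCp submits a MapReduce job, so the files are copied by many
            // mappers in parallel instead of one by one through a single client.
            DistCp distCp = new DistCp(conf, options);
            Job job = distCp.execute();  // blocks until the job completes by default
            System.out.println("distcp succeeded: " + job.isSuccessful());
        }
    }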

>> From: Gavin Yue <yue.yuanyuan@gmail.com>
>> Date: Friday, January 8, 2016 at 4:45 PM
>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Subject: how to quickly fs -cp dir with thousand files?

>> I want to cp a dir with over 8000 files to another dir in the same
>> HDFS, but the copy process is really slow since it is copying the
>> files one by one. Is there a fast way to copy this using the Java
>> FileSystem or FileUtil API?
>>
>> Thanks.
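The slow one-by-one copy described in the question is presumably something like FileUtil.copy, which pulls every byte through a single client process. A hedged sketch of that serial approach, with hypothetical paths, for contrast with the parallel DistCp answer above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class SerialHdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // FileUtil.copy recurses into the directory and streams every byte
            // through this single client process, one file at a time -- the
            // behavior the question describes as slow for 8000 files.
            boolean ok = FileUtil.copy(fs, new Path("/user/hadoop/src"),
                                       fs, new Path("/user/hadoop/dst"),
                                       false /* deleteSource */, conf);
            System.out.println("copy succeeded: " + ok);
        }
    }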


