Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of jayunit100@gmail.com
 designates 209.85.215.50 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAOcnVr2YVyOMDMe1e3x=x=yDzL-M_hUsJtdnmRokXHEJN3hiLQ@mail.gmail.com>
References: 
 <CAAu13zEd_xyyxjXx0g1aFBJ2KyciJuqyOtnMCxOQap48RDZv+A@mail.gmail.com>
	<CAOcnVr2YVyOMDMe1e3x=x=yDzL-M_hUsJtdnmRokXHEJN3hiLQ@mail.gmail.com>
Date: Wed, 29 Jan 2014 08:52:27 -0500
Message-ID: 
 <CAAu13zE6Hy3Cawa_S_uJG0txPvD53ED7kicHXhwiVJ2kN7P3JA@mail.gmail.com>
Subject: Re: performance of "hadoop fs -put"
From: Jay Vyas <jayunit100@gmail.com>
To: "common-user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=001a1133a85c571a3104f11c40eb

--001a1133a85c571a3104f11c40eb
Content-Type: text/plain; charset=ISO-8859-1

No, I'm using a glob pattern; it's all done in one "put" statement.
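
For anyone who wants to reproduce the comparison, here is a minimal sketch of
how a native-copy baseline for many small files could be timed. The helper
names, the file count, and the file size are hypothetical choices for
illustration, not the numbers from my actual run. Timing a single glob-based
"hadoop fs -put" over the same directory would give the figure to compare
against.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Tuple


def make_small_files(directory: Path, count: int, size: int = 1024) -> None:
    """Create `count` files of `size` bytes each in `directory` (hypothetical test data)."""
    payload = b"x" * size
    for i in range(count):
        (directory / f"part-{i:05d}.txt").write_bytes(payload)


def time_native_copy(src: Path, dst: Path, pattern: str = "*.txt") -> Tuple[int, float]:
    """Copy every file in `src` matching `pattern` into `dst`.

    Returns (number of files copied, elapsed seconds). Everything runs in one
    process, so no per-file process startup cost is included.
    """
    start = time.perf_counter()
    copied = 0
    for path in sorted(src.glob(pattern)):
        shutil.copy(path, dst / path.name)
        copied += 1
    return copied, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
        src, dst = Path(src_dir), Path(dst_dir)
        make_small_files(src, count=1000)  # assumed workload: 1000 files of 1 KiB
        copied, seconds = time_native_copy(src, dst)
        print(f"copied {copied} files in {seconds:.3f}s")
```

Reporting the per-file time from both runs makes it easier to tell whether the
overhead is a fixed startup cost or grows with the number of files.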


On Tue, Jan 28, 2014 at 9:22 PM, Harsh J <harsh@cloudera.com> wrote:

> Are you calling one command per file? That's bound to be slow as it
> invokes a new JVM each time.
> On Jan 29, 2014 7:15 AM, "Jay Vyas" <jayunit100@gmail.com> wrote:
>
>> I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
>> have a large number of small files... much slower than native file ops.
>> Note that I'm using the RawLocalFileSystem as the underlying backing
>> filesystem that is being written to in this case, so HDFS isn't the issue.
>>
>> I see that the Put class creates a linked list with as many elements as
>> there are in the path.
>>
>> 1) Is there a more performant way to run "fs -put"?
>>
>> 2) Has anyone else noted that "fs -put" has extra overhead?
>>
>> I'm going to trace some more, but I just wanted to bounce this off the
>> mailing list... maybe others have also run into this issue.
>>
>> ** Is "hadoop fs -put" inherently slower than a unix "cp" action,
>> regardless of filesystem -- and if so, why? **
>>
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com

--001a1133a85c571a3104f11c40eb--