hadoop-common-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: performance of "hadoop fs -put"
Date Wed, 29 Jan 2014 13:52:27 GMT
No, I'm using a glob pattern; it's all done in one "put" statement.
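
To put a number on "native file ops" for comparison, a baseline along these lines could be timed against the same set of small files. This is only a sketch using plain java.nio, not Hadoop's code path; the class name, method names, file count, and file size are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical baseline: copy many small files inside one JVM with java.nio.
// It does not touch Hadoop; it only measures what a native copy costs.
public class SmallFileCopyBaseline {

    // Create `count` files of `sizeBytes` zero bytes each in `dir`.
    static void createSmallFiles(Path dir, int count, int sizeBytes) throws IOException {
        byte[] payload = new byte[sizeBytes];
        for (int i = 0; i < count; i++) {
            Files.write(dir.resolve("part-" + i + ".dat"), payload);
        }
    }

    // Copy every *.dat file from srcDir into dstDir; return how many were copied.
    static int copyAll(Path srcDir, Path dstDir) throws IOException {
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(srcDir, "*.dat")) {
            for (Path f : files) {
                Files.copy(f, dstDir.resolve(f.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("put-src");
        Path dst = Files.createTempDirectory("put-dst");
        createSmallFiles(src, 10_000, 1024); // assumed workload: 10k files of 1 KiB

        long start = System.nanoTime();
        int copied = copyAll(src, dst);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("copied " + copied + " files in " + elapsedMs + " ms");
    }
}
```

Running "hadoop fs -put" with the glob over the same source directory and comparing wall-clock times would show how much of the gap is per-file overhead rather than JVM startup.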


On Tue, Jan 28, 2014 at 9:22 PM, Harsh J <harsh@cloudera.com> wrote:

> Are you calling one command per file? That's bound to be slow as it
> invokes a new JVM each time.
> On Jan 29, 2014 7:15 AM, "Jay Vyas" <jayunit100@gmail.com> wrote:
>
>> I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
>> have large numbers of small files... much slower than native file ops.
>> Note that I'm using the RawLocalFileSystem as the underlying backing
>> filesystem that is being written to in this case, so HDFS isn't the issue.
>>
>> I see that the Put class creates a linked list whose size is the number of
>> elements in the path.
>>
>> 1) Is there a more performant way to run "fs -put"?
>>
>> 2) Has anyone else noted that "fs -put" has extra overhead?
>>
>> I'm going to trace some more, but I just wanted to bounce this off the
>> mailing list... maybe others have also run into this issue.
>>
>> ** Is "hadoop fs -put" inherently slower than a unix "cp" action,
>> regardless of filesystem -- and if so, why? **
>>
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com
