hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: performance of "hadoop fs -put"
Date Wed, 29 Jan 2014 02:22:22 GMT
Are you calling one command per file? That's bound to be slow, as it invokes
a new JVM each time.
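A minimal sketch of the batching this reply implies: call the FS shell once per batch of files, not once per file. It assumes the FS shell form `hadoop fs -put <localsrc> ... <dst>`, where several local sources are copied into one destination directory. `PutBatcher` and `buildPutCommands` are hypothetical names used for illustration. The code only builds the argument lists, so JVM startup is paid once per batch rather than once per file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop): groups many local files into a
 * few "hadoop fs -put" invocations so each JVM launch copies a whole batch.
 */
public final class PutBatcher {

    private PutBatcher() {}

    /**
     * Builds argument lists of the form
     *   hadoop fs -put <src1> <src2> ... <dst>
     * with at most batchSize local sources per list.
     */
    public static List<List<String>> buildPutCommands(
            List<String> localFiles, String dst, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> commands = new ArrayList<>();
        for (int start = 0; start < localFiles.size(); start += batchSize) {
            int end = Math.min(start + batchSize, localFiles.size());
            List<String> cmd = new ArrayList<>();
            cmd.add("hadoop");
            cmd.add("fs");
            cmd.add("-put");
            cmd.addAll(localFiles.subList(start, end)); // this batch's sources
            cmd.add(dst);                                // destination directory last
            commands.add(cmd);
        }
        return commands;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            files.add("data/part-" + i + ".txt"); // hypothetical small files
        }
        List<List<String>> cmds = buildPutCommands(files, "/user/jay/input", 1000);
        // 2500 files in batches of 1000: 3 JVM launches instead of 2500
        System.out.println(cmds.size());
    }
}
```

Each list could then be run with `new ProcessBuilder(cmd).inheritIO().start()`. The batch size here is an arbitrary choice. It is kept bounded so a single command line does not exceed the operating system's argument-length limit.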
On Jan 29, 2014 7:15 AM, "Jay Vyas" <jayunit100@gmail.com> wrote:

> I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
> have large numbers of small files, much slower than native file operations.
> Note that I'm using RawLocalFileSystem as the backing filesystem being
> written to in this case, so HDFS isn't the issue.
>
> I see that the Put class creates a LinkedList with one entry per element
> in the path.
>
> 1) Is there a more performant way to run "fs -put"?
>
> 2) Has anyone else noted that "fs -put" has extra overhead?
>
> I'm going to trace some more, but I just wanted to bounce this off the
> mailing list; maybe others have also run into this issue.
>
> ** Is "hadoop fs -put" inherently slower than a Unix "cp" action,
> regardless of filesystem, and if so, why? **
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
