hadoop-mapreduce-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject performance of "hadoop fs -put"
Date Wed, 29 Jan 2014 01:45:06 GMT
I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
have a large number of small files... much slower than native file ops.
Note that I'm using the RawLocalFileSystem as the underlying backing
filesystem that is being written to in this case, so HDFS isn't the issue.
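
To put a number on the "native file ops" side of that comparison, here is a minimal sketch of a baseline, not a rigorous benchmark. The function names, the file count, and the 1 KB file size are all assumptions chosen for illustration; it only times a plain local copy of many small files.

```python
import os
import shutil
import tempfile
import time


def make_small_files(directory, count, size_bytes=1024):
    """Create `count` files of `size_bytes` bytes each in `directory`."""
    os.makedirs(directory, exist_ok=True)
    payload = b"x" * size_bytes
    for i in range(count):
        with open(os.path.join(directory, f"part-{i:05d}"), "wb") as f:
            f.write(payload)


def time_native_copy(src_dir, dst_dir):
    """Copy the directory with native file ops; return (seconds, files copied)."""
    start = time.perf_counter()
    shutil.copytree(src_dir, dst_dir)  # dst_dir must not exist yet
    elapsed = time.perf_counter() - start
    return elapsed, len(os.listdir(dst_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        dst = os.path.join(root, "dst")
        make_small_files(src, count=1000)  # hypothetical workload size
        seconds, copied = time_native_copy(src, dst)
        print(f"native copy: {copied} files in {seconds:.3f}s")
```

Timing "hadoop fs -put" on the same source directory (for example with the shell's "time") would give the figure to compare against this baseline.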

I see that the Put class creates a LinkedList whose size is the number of
elements in the path.

1) Is there a more performant way to run "fs -put"?

2) Has anyone else noted that "fs -put" has extra overhead?

I'm going to trace some more, but I just wanted to bounce this off the
mailing list... maybe others have also run into this issue.

** Is "hadoop fs -put" inherently slower than a unix "cp" action, regardless
of filesystem, and if so, why? **


-- 
Jay Vyas
http://jayunit100.blogspot.com
