hadoop-general mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: Why single thread for HDFS?
Date Tue, 06 Jul 2010 00:47:46 GMT

On Jul 5, 2010, at 5:01 PM, elton sky wrote:
> Well, this sounds good when you have many small files and you concat() them
> into a big one. I am talking about splitting a big file into blocks and
> copying a few blocks in parallel.

Basically, your point is that hadoop dfs -cp is relatively slow and could be made faster:
if HDFS had a more multi-threaded design, cp operations would be faster.

This sounds like a particularly high cost for an operation that is rarely used.  [This
is much more interesting in a distcp context, but even then not that great.  distcp in my
experience is usually used to push a bunch of files, so you get your parallelism at the file
level.  Typically these are part files, which are usually approximately the same size.]
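
To illustrate what file-level parallelism means here, a minimal sketch in Python follows. It is not distcp and does not use any HDFS API; it copies local files with a thread pool, and the names copy_files_in_parallel and max_workers are hypothetical, chosen only for illustration. Each file is copied start to finish by a single worker, so any speedup comes from having many similarly sized files, not from splitting one file into blocks.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_files_in_parallel(src_dir, dst_dir, max_workers=4):
    """Copy every regular file in src_dir to dst_dir, one file per task.

    Hypothetical helper for illustration only; works on local paths.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.is_file())

    def copy_one(path):
        # A single worker copies the whole file sequentially.
        target = dst / path.name
        shutil.copyfile(path, target)
        return target.name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `files`.
        return list(pool.map(copy_one, files))
```

With a directory of similarly sized part files, the workers stay evenly loaded; a single very large file would still be copied by one worker, which is the case the original question is about.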
