hadoop-general mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject Re: Why single thread for HDFS?
Date Tue, 06 Jul 2010 03:46:34 GMT
> Basically, your point is that hadoop dfs -cp is relatively slow and could
> be made faster.  If HDFS had a more multi-threaded design, it would make cp
> operations faster.
What I mean is, if we know the size of a file, we can parallelize the copy by
calculating its block boundaries. Otherwise we can't.
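
A minimal sketch of that idea, assuming plain local files stand in for HDFS
and a fixed block size is known up front. The helper names (block_ranges,
copy_range, parallel_copy) are hypothetical and not part of any Hadoop API;
this only illustrates how a known file size lets the copy be split into
independent block-sized ranges.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) pairs that cover file_size bytes block by block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def copy_range(src, dst, offset, length, chunk=1 << 20):
    """Copy one byte range from src to the same offset in dst."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(chunk, remaining))
            if not data:
                break
            fout.write(data)
            remaining -= len(data)


def parallel_copy(src, dst, block_size, workers=4):
    """Copy src to dst using one task per block; return the number of blocks."""
    # The size must be known before any work can be divided up.
    size = os.path.getsize(src)
    # Pre-size the destination so each worker can write at its own offset.
    with open(dst, "wb") as f:
        f.truncate(size)
    ranges = block_ranges(size, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_range, src, dst, off, ln)
                   for off, ln in ranges]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
    return len(ranges)
```

Without the file size, block_ranges has nothing to divide, which is the
"otherwise we can't" case: the copy has to stream sequentially until EOF.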


On Tue, Jul 6, 2010 at 10:47 AM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:

>
> On Jul 5, 2010, at 5:01 PM, elton sky wrote:
> > Well, this sounds good when you have many small files, you concat() them
> > into a big one. I am talking about split a big file into blocks and copy
> > all a few blocks in parallel.
>
> Basically, your point is that hadoop dfs -cp is relatively slow and could
> be made faster.  If HDFS had a more multi-threaded design, it would make cp
> operations faster.
>
> This sounds like a particularly high cost for an operation that is rarely
> utilized.  [This is much more interesting in a distcp context, but even then
> not that great.  distcp in my experience is usually used to push a bunch of
> files, so you get your parallelism at the file level.  Typically these are
> part files that are usually approximately the same size.]
>
>
>
