hbase-dev mailing list archives

From Rajappa Iyer <...@panix.com>
Subject Re: Local sockets
Date Mon, 06 Dec 2010 20:36:17 GMT
Jay Booth <jaybooth@gmail.com> writes:

> On Mon, Dec 6, 2010 at 3:13 PM, Rajappa Iyer <rsi@panix.com> wrote:

>> What Vladimir is talking about is reducing the seek times by essentially
>> serializing the reads through a single thread per disk.  You could
>> either cleverly reorganize the reads so that seek is minimized and/or
>> read the entire block in one call.

> I think that modern kernel and elevator implementations are in a better
> place to make this decision than Hadoop most of the time.  I'd be worried
> about a lot of work going into an implementation that saves a little work
> some of the time and loses a bunch the rest of the time.  The existing
> elevator algorithms are pretty good, and they're written in
> super-duper-optimized C and run in kernel mode.  That's kinda hard to compete
> with, and even if we do, how do we know we wouldn't wind up working against
> them?

HDFS block sizes are large, so any I/O scheduler that optimized for
reading whole blocks would necessarily penalize other I/O-bound
processes severely.  That would probably be unacceptable for a
general-purpose I/O scheduler in the OS, but it would be perfectly
acceptable at the user level on a datanode, where few other jobs are
running.
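
To make the user-level idea concrete, here is a minimal sketch in Java.
It assumes one reader thread per disk that drains whatever requests are
pending and serves them in ascending offset order, so the head sweeps in
one direction per batch.  The class PerDiskReader and its methods are
hypothetical names for illustration, not existing datanode code, and
nothing here shows that this beats the kernel elevator; that is the
part that would need measuring.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: a single thread serializes all reads for one disk. */
final class PerDiskReader implements AutoCloseable {
    /** A pending read of length bytes at offset; the future completes with the data. */
    static final class Request {
        final long offset;
        final int length;
        final CompletableFuture<byte[]> result = new CompletableFuture<>();
        Request(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final RandomAccessFile file;   // stands in for "the disk" in this sketch
    private final Thread worker;
    private volatile boolean running = true;

    PerDiskReader(String path) throws IOException {
        file = new RandomAccessFile(path, "r");
        worker = new Thread(this::loop, "disk-reader");
        worker.setDaemon(true);
        worker.start();
    }

    /** Queue a read; callers never touch the file directly. */
    CompletableFuture<byte[]> read(long offset, int length) {
        Request r = new Request(offset, length);
        queue.add(r);
        return r.result;
    }

    private void loop() {
        List<Request> batch = new ArrayList<>();
        while (running) {
            try {
                batch.clear();
                batch.add(queue.take());   // block until at least one request arrives
                queue.drainTo(batch);      // then take everything else that is pending
            } catch (InterruptedException e) {
                return;                    // close() interrupted us while idle
            }
            // Serve the batch in ascending offset order to limit back-and-forth seeks.
            batch.sort((a, b) -> Long.compare(a.offset, b.offset));
            for (Request r : batch) {
                serve(r);
            }
        }
    }

    private void serve(Request r) {
        try {
            byte[] buf = new byte[r.length];
            file.seek(r.offset);
            file.readFully(buf);           // one call for the whole requested range
            r.result.complete(buf);
        } catch (IOException e) {
            r.result.completeExceptionally(e);
        }
    }

    @Override
    public void close() throws IOException {
        running = false;
        worker.interrupt();                // wake the worker if it is blocked in take()
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        file.close();
    }
}
```

A caller would create one PerDiskReader per data directory and call
read(offset, length).get() where it currently reads the file directly.
Requests still queued when close() is called are never completed in
this sketch; a real implementation would have to fail or drain them.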

As Todd points out, though, most Hadoop installations run on Linux, so it
would definitely be worthwhile to characterize this behavior on Linux.

-rsi
