hbase-dev mailing list archives

From Jay Booth <jaybo...@gmail.com>
Subject Re: Local sockets
Date Mon, 06 Dec 2010 21:39:32 GMT
Well, if the goal is to minimize seeks, I'd recommend looking at
https://issues.apache.org/jira/browse/HDFS-1034 before trying to implement a
general I/O scheduler.

On the general front..  my first pass at a scheduler would probably turn out
to do exactly what current elevators do.  We're already sending 64 KB
transferTo requests, which delegate to splice under the hood for 0-copy and
attempt to push 64 KB at a time.  The existing elevator does a pretty good job
with them (ordering them across the disk and doing the
brief-wait-in-kernel-mode-for-another-read thing).  You could get more
aggressive than 64 KB and maybe reach a higher aggregate throughput, but I
think you'd rapidly wind up with either full socket send buffers, or some
tasks starved for input while others have a bunch queued up..  it could
easily wind up being slower.
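
For concreteness, here is a rough sketch of the chunked transferTo pattern
described above.  It is not the actual datanode code: the class name, method
name and CHUNK_SIZE constant are made up for illustration, and only
FileChannel.transferTo itself is a real JDK call.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class ChunkedTransfer {
    // 64 KB per request, matching the request size mentioned above (assumed constant).
    public static final int CHUNK_SIZE = 64 * 1024;

    /**
     * Copies up to count bytes from src, starting at position, into dst,
     * issuing one transferTo call per chunk of at most CHUNK_SIZE bytes.
     * Returns the number of bytes actually transferred.
     */
    public static long transferInChunks(FileChannel src, long position, long count,
                                        WritableByteChannel dst) throws IOException {
        long transferred = 0;
        while (transferred < count) {
            long want = Math.min(CHUNK_SIZE, count - transferred);
            long n = src.transferTo(position + transferred, want, dst);
            if (n <= 0) {
                // Nothing moved: end of file, or a non-blocking target whose
                // buffer is full. A real sender would tell these apart.
                break;
            }
            transferred += n;
        }
        return transferred;
    }
}
```

Raising CHUNK_SIZE is the "more aggressive" option: fewer, larger requests
per reader, at the cost of each reader holding the disk and the socket buffer
for longer.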

That said, I wouldn't want to discourage development if you were thinking
about building something like that..  even if it doesn't wind up being
faster, it could be educational for the community to figure out why it
wasn't faster.

On Mon, Dec 6, 2010 at 3:36 PM, Rajappa Iyer <rsi@panix.com> wrote:

> Jay Booth <jaybooth@gmail.com> writes:
> > On Mon, Dec 6, 2010 at 3:13 PM, Rajappa Iyer <rsi@panix.com> wrote:
> >> What Vladimir is talking about is reducing the seek times by essentially
> >> serializing the reads through a single thread per disk.  You could
> >> either cleverly reorganize the reads so that seek is minimized and/or
> >> read the entire block in one call.
> > I think that modern kernel and elevator implementations are in a better
> > place to make this decision than Hadoop most of the time.  I'd be worried
> > about a lot of work going into an implementation that saves a little work
> > some of the time and loses a bunch the rest of the time.  The existing
> > elevator algorithms are pretty good, and they're written in
> > super-duper-optimized C and run in kernel mode..  kinda hard to compete
> > with, and even if we do, how do we know we wouldn't wind up working against
> > them?
> HDFS block sizes are large -- any I/O scheduler that optimizes access to
> this will of necessity have to severely penalize other I/O bound
> processes, which would probably be unacceptable for a general I/O
> scheduler in the OS.  But this would be perfectly acceptable at the user
> level for the datanode where not many other jobs are running.
> As Todd points out though, most Hadoop installations run on Linux, so it
> would definitely be worthwhile characterizing this behavior on Linux.
> -rsi
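
To make the quoted proposal (serializing reads through a single thread per
disk and reordering them to cut seeks) a bit more concrete, here is a minimal
sketch.  Everything in it is hypothetical: OffsetOrderedReader, ReadRequest
and serveBatch are invented names, a single file stands in for a disk, and it
assumes that ascending file offset roughly tracks on-disk position, which the
filesystem does not guarantee.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetOrderedReader {
    /** A hypothetical pending read: starting offset and number of bytes. */
    public static final class ReadRequest {
        final long offset;
        final int length;

        public ReadRequest(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Serves a batch of reads from the calling thread in ascending offset
     * order, and returns the offsets in the order they were served.
     */
    public static List<Long> serveBatch(FileChannel disk, List<ReadRequest> batch)
            throws IOException {
        List<ReadRequest> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparingLong((ReadRequest r) -> r.offset));

        List<Long> served = new ArrayList<>();
        for (ReadRequest r : sorted) {
            ByteBuffer buf = ByteBuffer.allocate(r.length);
            // Positional reads: keep reading until the buffer is full or EOF.
            while (buf.hasRemaining()) {
                int n = disk.read(buf, r.offset + buf.position());
                if (n < 0) {
                    break;
                }
            }
            served.add(r.offset);
        }
        return served;
    }
}
```

This does in user space what the kernel elevator already does with the
requests it sees, which is why it would need to be measured on Linux before
concluding that it helps.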
