hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: optimizing for random access
Date Tue, 27 Apr 2010 03:28:41 GMT
On Mon, Apr 26, 2010 at 4:02 PM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hey Todd, by saying that HDFS is able to read just small byte ranges, are
> you talking about the capability described in the original Bigtable paper?
> I mean the ability to read just part of a compressed SSTable block and use
> it in a block-cache type of way.
>

Yes, that's what HBase does - our SSTable is called HFile, and the two
formats are very similar.
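
For example, here's roughly what a small positioned read against HDFS
looks like (a minimal sketch using the public FileSystem API rather than
HBase's actual read path; the path, offset, and size are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream in = fs.open(new Path("/hbase/table/region/hfile"));
    try {
      // Fetch 32KB starting at an arbitrary offset; only this range
      // crosses the wire, not the whole 64MB+ HDFS block.
      byte[] buf = new byte[32 * 1024];
      in.readFully(123456789L, buf, 0, buf.length);
    } finally {
      in.close();
    }
  }
}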

Regarding memory mapping, with the HDFS-347 patch it would actually be
possible to achieve memory mapping of the blocks stored by the DN for local
reads. It uses some fairly esoteric POSIX features to transfer already-open
file descriptors across the process boundary, at which point, with enough
maneuvering, we could directly mmap the blocks.
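
The mmap half is straightforward once you hold an open stream on the local
block file; something like this (a sketch only: the block path is
hypothetical, and the descriptor passing itself needs native code that
isn't shown):

import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapLocalBlock {
  public static void main(String[] args) throws Exception {
    // Stand-in for a descriptor handed over by the datanode.
    FileInputStream fis = new FileInputStream("/data/dfs/current/blk_1234");
    FileChannel ch = fis.getChannel();
    // Map the whole block read-only; the OS pages it in on demand.
    MappedByteBuffer block =
        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    System.out.println(block.get(42)); // random access, no read() syscall
    ch.close();
  }
}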

That said, it was experimental work then and is low on the priority scale
for the near future.

The current most likely candidate to give a big boost to random read
performance is HDFS-941, which my coworker is working on. I ran into some
issues last week when load testing a preliminary patch, but we're looking
into it. If it passes testing we may be able to include it in CDH3 (no
promises though, stability comes first!).

-Todd

>
>
> 2010/4/26 Todd Lipcon <todd@cloudera.com>
>
> > On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey <ghendrey@decarta.com>
> > wrote:
> >
> > > Let me preface this by saying that you all know much better than I do
> > > what is best. I'm very impressed by what you've done, and so this isn't
> > > criticism. Far from it. It's just curiosity.
> > >
> > > Memory indexes are "decent", because while they are fast, they don't
> > > scale. At some point you run out of RAM. Are you implementing an LRU
> > > cache? Since the table is orders of magnitude larger than the memory
> > > available on any region server (even accounting for the fact that a
> > > region server needs to cache only its "shard"), it's hard to understand
> > > how I could support a 100% cache hit rate for a TB-sized table and a
> > > reasonable number of region servers.
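> > >
> > > (To put rough, made-up numbers on it: a 1TB table spread over 20 region
> > > servers is ~50GB per server, so even a 10GB block cache covers at most
> > > ~20% of that server's share.)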
> > >
> > > When you get a cache miss, and you almost always will when the table is
> > > orders of magnitude larger than the cache, you need to read a whole
> > > block out of HDFS.
> > >
> >
> > This is a common misconception about HDFS. There's no need to read an
> > entire HDFS block at a time. Although the blocks may be 64MB+, you can
> > certainly read very small byte ranges, and that's exactly what HBase
> > does.
> >
> > For a more efficient method of accessing local data blocks, I did some
> > initial experimentation in HDFS-347, but the speedup was not an order of
> > magnitude.
> >
> > -Todd
> >
> >
> > >
> > > My thought with memory mapping was, as you noted, *not* to try to map
> > > files that are inside of HDFS, but rather to copy as many blocks as
> > > possible out of HDFS onto region server filesystems, and memory map the
> > > file on the region server. TB drives are now common. The virtual memory
> > > system of the operating system manages paging in and out of "real"
> > > memory off disk when you use memory mapping. My experience with
> > > memory-mapped ByteBuffers in Java is that they are very fast and
> > > scalable. By fast, I mean I have clocked reads in the microseconds
> > > using nanoTime. So I was just wondering why you wouldn't at least make
> > > a second-level cache with memory mapping.
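> > >
> > > The kind of measurement I mean, roughly (illustrative only; the file
> > > name and offset are invented):
> > >
> > > import java.io.RandomAccessFile;
> > > import java.nio.MappedByteBuffer;
> > > import java.nio.channels.FileChannel;
> > >
> > > public class MmapTiming {
> > >   public static void main(String[] args) throws Exception {
> > >     RandomAccessFile raf = new RandomAccessFile("/tmp/region.dat", "r");
> > >     FileChannel ch = raf.getChannel();
> > >     MappedByteBuffer buf =
> > >         ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
> > >     long start = System.nanoTime();
> > >     // One random read from the mapping. Assumes the page is already
> > >     // resident; a cold read pays a page fault on top of this.
> > >     byte b = buf.get(12345);
> > >     long elapsed = System.nanoTime() - start;
> > >     System.out.println("read took " + elapsed + "ns (value " + b + ")");
> > >     ch.close();
> > >   }
> > > }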
> > >
> > > -geoff
> > >
> > > -----Original Message-----
> > > From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> > > Sent: Monday, April 26, 2010 1:24 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: optimizing for random access
> > >
> > > HFile uses in-memory indexes, so it needs only one seek to access data.
> > > How is this only "decent"?
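> > >
> > > The index idea in miniature (a toy sketch, not HFile's actual code):
> > >
> > > import java.util.Map;
> > > import java.util.TreeMap;
> > >
> > > public class BlockIndexSketch {
> > >   // First key of each block -> file offset of that block, kept in RAM.
> > >   private final TreeMap<String, Long> index = new TreeMap<String, Long>();
> > >
> > >   public long blockOffsetFor(String key) {
> > >     // The index pins any key to exactly one block, so fetching the
> > >     // row costs a single seek into the underlying file.
> > >     Map.Entry<String, Long> e = index.floorEntry(key);
> > >     return e == null ? -1L : e.getValue(); // -1: key precedes first block
> > >   }
> > > }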
> > >
> > > As for memory-mapped files, given that HDFS files are not local, we
> > > can't mmap() them. However, HBase does block caching in memory to
> > > reduce the trips to HDFS.
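> > >
> > > The cache behaves roughly like an access-ordered LinkedHashMap that
> > > evicts the least recently used block (a toy sketch, not the real
> > > implementation):
> > >
> > > import java.util.LinkedHashMap;
> > > import java.util.Map;
> > >
> > > public class ToyBlockCache extends LinkedHashMap<String, byte[]> {
> > >   private final int maxBlocks;
> > >
> > >   public ToyBlockCache(int maxBlocks) {
> > >     super(16, 0.75f, true); // true = access order, i.e. LRU
> > >     this.maxBlocks = maxBlocks;
> > >   }
> > >
> > >   @Override
> > >   protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
> > >     return size() > maxBlocks; // drop the LRU block past capacity
> > >   }
> > > }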
> > >
> > > -ryan
> > >
> > >
> > >
> > > On Mon, Apr 26, 2010 at 11:33 AM, Geoff Hendrey <ghendrey@decarta.com>
> > > wrote:
> > > > Hi,
> > > >
> > > > Any pointers on how to optimize HBase for random access? My
> > > > understanding is that HFile is decent at random access. Why doesn't
> > > > it use memory-mapped I/O? (My reading on it indicated it uses
> > > > "something like NIO".) I'd like my entire table to be distributed
> > > > across region servers, so that random reads are quickly served by a
> > > > region server without having to transfer a block from HDFS. Is this
> > > > the right approach? I would have thought that some sort of
> > > > memory-mapped region file would be perfect for this. Anyway, just
> > > > looking to understand the best practice(s).
> > > >
> > > >
> > > > -geoff
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
