hbase-dev mailing list archives

From Jonathan Gray <jg...@fb.com>
Subject RE: Converting byte[] to ByteBuffer
Date Sun, 10 Jul 2011 07:59:53 GMT
There are plenty of arguments in both directions for caching above the DB, in the DB, or under
the DB/in the FS.  I have significant interest in supporting large heaps and reducing GC issues
within the HBase RegionServer and I am already running with local fs reads.  I don't think
a faster dfs makes HBase caching irrelevant or the conversation a non-starter.

To get back to the original question, I ended up trying this once.  I wrote a rough implementation
of a slab allocator a few months ago to dive in and see what it would take.  The big challenge
is KeyValue and its various comparators.  The ByteBuffer API can be maddening at times but
it can be done.  I ended up somewhere slightly more generic, where KeyValue was taking a ByteBlock
which contained ref counting and a reference to the allocator it came from, in addition to
a ByteBuffer.
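A minimal sketch of what such a ref-counted block might look like (the class and method names here are my own illustration based on the description above, not the actual code):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a block handed out by a slab allocator, carrying
// a ref count and a reference back to the allocator it came from, in
// addition to the ByteBuffer itself. Illustrative only.
interface SlabAllocator {
    void free(ByteBuffer slice);           // return a slice to its slab
}

final class ByteBlock {
    private final ByteBuffer buf;          // slice of a larger slab
    private final SlabAllocator owner;     // allocator to return it to
    private final AtomicInteger refCount = new AtomicInteger(1);

    ByteBlock(ByteBuffer buf, SlabAllocator owner) {
        this.buf = buf;
        this.owner = owner;
    }

    // Hand out an independent view so readers can't disturb the shared
    // buffer's position/limit.
    ByteBuffer buffer() { return buf.duplicate(); }

    void retain() { refCount.incrementAndGet(); }

    void release() {
        if (refCount.decrementAndGet() == 0) {
            owner.free(buf);               // last reference gone
        }
    }
}
```

The point of the allocator back-reference is that whoever drops the last reference can return the slice without knowing which slab it belongs to.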

The easy way to rely on DirectByteBuffers and the like would be to make a copy on read into
a normal byte[]; then there's no need to worry about ref counting or revamping KV.  Of course,
that comes at the cost of short-term allocations.  In my experience, you can tune the GC around
this and the cost really becomes CPU.
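That copy-on-read approach can be sketched in a few lines (the class and method names are illustrative, not HBase's API):

```java
import java.nio.ByteBuffer;

// Sketch of the "easy way": copy a value out of an off-heap (direct)
// buffer into a plain byte[] on every read, trading short-lived
// allocations for not having to ref-count buffers or rework KeyValue.
final class CopyOnRead {
    static byte[] read(ByteBuffer direct, int offset, int length) {
        byte[] copy = new byte[length];
        // duplicate() so the shared buffer's position/limit are untouched
        ByteBuffer view = direct.duplicate();
        view.position(offset);
        view.get(copy, 0, length);
        return copy;
    }
}
```

Each read allocates a fresh byte[] that dies young, which is exactly the short-lived garbage the paragraph above says you can tune the GC around.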

I'm in the process of re-implementing some of this stuff on top of the HFile v2 that is coming
soon.  Once that goes in, this gets much easier at the HFile and block cache level (a new
wrapper around ByteBuffer called HFileBlock, which can be used for ref counting and such,
instead of introducing huge changes for caching).


> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@maprtech.com]
> Sent: Saturday, July 09, 2011 11:14 PM
> To: dev@hbase.apache.org
> Subject: Re: Converting byte[] to ByteBuffer
> No.  The JNI is below the HDFS compatible API.  Thus the changed code is in
> the hadoop.jar and associated jars and .so's that MapR supplies.
> The JNI still runs in the HBase memory image, though, so it can make data
> available faster.
> The cache involved includes the cache of disk blocks (not HBase memcache
> blocks) in the JNI and in the filer sub-system.
> The detailed reasons why more caching in the file system and less in HBase
> makes the overall system faster are not completely worked out, but the
> general outlines are pretty clear.  There are likely several factors at work in
> any case, including less GC cost due to a smaller memory footprint, caching
> compressed blocks instead of Java structures and simplification due to a
> clean memory hand-off with associated strong demarcation of where
> different memory allocators have jurisdiction.
> On Sat, Jul 9, 2011 at 3:48 PM, Jason Rutherglen
> <jason.rutherglen@gmail.com
> > wrote:
> > I'm a little confused, I was told none of the HBase code changed with
> > MapR, if the HBase (not the OS) block cache has a JNI implementation
> > then that part of the HBase code changed.
> > On Jul 9, 2011 11:19 AM, "Ted Dunning" <tdunning@maprtech.com> wrote:
> > > MapR does help with the GC because it *does* have a JNI interface
> > > into an external block cache.
> > >
> > > Typical configurations with MapR trim HBase down to the minimal
> > > viable
> > size
> > > and increase the file system cache correspondingly.
> > >
> > > On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <
> > jason.rutherglen@gmail.com
> > >> wrote:
> > >
> > >> MapR doesn't help with the GC issues. If MapR had a JNI interface
> > >> into an external block cache then that'd be a different story. :)
> > >> And I'm sure it's quite doable.
> > >>
> >
