hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Converting byte[] to ByteBuffer
Date Mon, 11 Jul 2011 20:30:33 GMT
> Further, (I asked this previously), where is the general CPU usage in
> HBase?  Binary search on keys for seeking, skip list reads and writes,
> and [maybe] MapReduce jobs?  

If you are running colocated MapReduce jobs, then it could be the user code of course.

Otherwise it depends on workload.

For our apps I observe the following top line items when profiling:

   - KV comparators: By far the most common operation; used when searching keys, writing HFiles, etc. (a simplified sketch of the comparison follows this list)

   - MemStore CSLM ops: Especially if upserting

   - Servicing RPCs: Writable marshall/unmarshall, monitors

   - Concurrent GC

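To put the first item in concrete terms, the comparator work boils down to lexicographic comparison of unsigned bytes over key slices. A simplified sketch of that (not the actual HBase comparator code):

    // Simplified sketch: lexicographic comparison of unsigned bytes over two
    // key slices, roughly what comparing serialized keys amounts to.
    final class KeyCompareSketch {
      static int compareKeys(byte[] a, int aOff, int aLen,
                             byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
          int x = a[aOff + i] & 0xff;   // treat bytes as unsigned
          int y = b[bOff + i] & 0xff;
          if (x != y) {
            return x - y;
          }
        }
        return aLen - bLen;             // shorter slice sorts first on a tie
      }
    }

Every key search and every HFile write funnels through comparisons like this, which is why they lead the profile.
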
It generally looks good but MemStore can be improved, especially for the upsert case.
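
For the upsert case specifically, here is a minimal sketch of the kind of ConcurrentSkipListMap traffic involved; the class and field names are made up for illustration and this is not the actual MemStore code:

    import java.util.Comparator;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Illustrative only: real MemStore keys are serialized KeyValues carrying
    // row/family/qualifier/timestamp; plain byte[] keys stand in for them here.
    class MemStoreUpsertSketch {
      static final Comparator<byte[]> CMP = new Comparator<byte[]>() {
        public int compare(byte[] a, byte[] b) {
          int n = Math.min(a.length, b.length);
          for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
          }
          return a.length - b.length;
        }
      };

      final ConcurrentSkipListMap<byte[], byte[]> kvset =
          new ConcurrentSkipListMap<byte[], byte[]>(CMP);

      void upsert(byte[] key, byte[] value) {
        kvset.put(key, value);  // one skip-list search + insert
        // The real upsert path also finds and removes older versions of the
        // same cell, so each upsert pays for several skip-list traversals,
        // which is why CSLM ops show up near the top of the profile.
      }
    }

Since each upsert pays for multiple skip-list traversals, cutting down those traversals is roughly where I would expect the improvement to come from.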

Reminds me I need to profile the latest. It's been a few weeks.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


>________________________________
>From: Jason Rutherglen <jason.rutherglen@gmail.com>
>To: dev@hbase.apache.org
>Sent: Sunday, July 10, 2011 3:05 PM
>Subject: Re: Converting byte[] to ByteBuffer
>
>Ted,
>
>Interesting.  I think we need to take a deeper look at why essentially
>turning off the caching of uncompressed blocks doesn't [seem to]
>matter.  My guess is it's cheaper to decompress on the fly than to take
>memory away from the system IO cache by using it for the JVM heap.
>
>I.e., CPU is cheaper than disk IO.
>
>Further, (I asked this previously), where is the general CPU usage in
>HBase?  Binary search on keys for seeking, skip list reads and writes,
>and [maybe] MapReduce jobs?  The rest should more or less be in the
>noise (or is general Java overhead).
>
>I'd be curious to know the avg CPU consumption of an active HBase system.
>
>On Sat, Jul 9, 2011 at 11:14 PM, Ted Dunning <tdunning@maprtech.com> wrote:
>> No.  The JNI is below the HDFS compatible API.  Thus the changed code is in
>> the hadoop.jar and associated jars and .so's that MapR supplies.
>>
>> The JNI still runs in the HBase memory image, though, so it can make data
>> available faster.
>>
>> The cache involved includes the cache of disk blocks (not HBase memcache
>> blocks) in the JNI and in the filer sub-system.
>>
>> The detailed reasons why more caching in the file system and less in HBase
>> makes the overall system faster are not completely worked out, but the
>> general outlines are pretty clear.  There are likely several factors at work
>> in any case, including less GC cost due to a smaller memory footprint, caching
>> compressed blocks instead of Java structures, and simplification due to a
>> clean memory hand-off with associated strong demarcation of where different
>> memory allocators have jurisdiction.
>>
>> On Sat, Jul 9, 2011 at 3:48 PM, Jason Rutherglen <jason.rutherglen@gmail.com
>>> wrote:
>>
>>> I'm a little confused. I was told none of the HBase code changed with MapR;
>>> if the HBase (not the OS) block cache has a JNI implementation, then that
>>> part of the HBase code changed.
>>> On Jul 9, 2011 11:19 AM, "Ted Dunning" <tdunning@maprtech.com> wrote:
>>> > MapR does help with the GC because it *does* have a JNI interface into an
>>> > external block cache.
>>> >
>>> > Typical configurations with MapR trim HBase down to the minimal viable
>>> size
>>> > and increase the file system cache correspondingly.
>>> >
>>> > On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <
>>> jason.rutherglen@gmail.com
>>> >> wrote:
>>> >
>>> >> MapR doesn't help with the GC issues. If MapR had a JNI
>>> >> interface into an external block cache then that'd be a different
>>> >> story. :) And I'm sure it's quite doable.
>>> >>
>>>
>>
>
>
> 

