hbase-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Converting byte[] to ByteBuffer
Date Tue, 12 Jul 2011 06:10:09 GMT
>    - MemStore CSLM ops: Especially if upserting

A quick thought on that one: perhaps it'd be helped by limiting the
aggregate size of the CSLM, e.g., skip lists start to degrade in
performance once they grow too large.  Something like multiple CSLMs
could work?  Grow a CSLM to a given size, then start a new one.
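Something along these lines, to make the rolling idea concrete.  This is
a hypothetical sketch, not MemStore code: the class name, the entry cap,
and the coarse synchronization are all made up for illustration.

    import java.util.ArrayDeque;
    import java.util.Comparator;
    import java.util.Deque;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class RollingSkipListMap {

        // Unsigned lexicographic byte[] order; byte[] is not Comparable.
        private static final Comparator<byte[]> LEX = (a, b) -> {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) return d;
            }
            return a.length - b.length;
        };

        private static final int MAX_ENTRIES = 1_000_000; // hypothetical cap

        private ConcurrentSkipListMap<byte[], byte[]> active =
            new ConcurrentSkipListMap<>(LEX);
        // Full maps, newest first; they only ever see reads.
        private final Deque<ConcurrentSkipListMap<byte[], byte[]>> frozen =
            new ArrayDeque<>();
        private int activeEntries; // guarded by this

        // Synchronized for simplicity; a real version would roll over
        // without serializing all writers.
        public synchronized void put(byte[] key, byte[] value) {
            if (activeEntries >= MAX_ENTRIES) {
                frozen.addFirst(active); // freeze it; start a new one
                active = new ConcurrentSkipListMap<>(LEX);
                activeEntries = 0;
            }
            if (active.put(key, value) == null) {
                activeEntries++; // count only newly inserted keys
            }
        }

        // Newest-wins read across the active map and the frozen ones.
        public synchronized byte[] get(byte[] key) {
            byte[] v = active.get(key);
            if (v != null) return v;
            for (ConcurrentSkipListMap<byte[], byte[]> m : frozen) {
                v = m.get(key);
                if (v != null) return v;
            }
            return null;
        }
    }

Each map stays small enough that its skip list never hits the degraded
range; the price is that reads may consult several maps, which is
essentially the same trade HBase already makes across MemStore and store
files.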

On Mon, Jul 11, 2011 at 1:30 PM, Andrew Purtell <apurtell@apache.org> wrote:
>> Further (I asked this previously): where does the general CPU usage in
>> HBase go?  Binary search on keys for seeking, skip list reads and writes,
>> and [maybe] MapReduce jobs?
> If you are running colocated MapReduce jobs, then it could be the user code of course.
> Otherwise it depends on workload.
> For our apps I observe the following top line items when profiling:
>    - KV comparators: By far the most common operation; searching keys, writing HFiles
>    - MemStore CSLM ops: Especially if upserting
>    - Servicing RPCs: Writable marshall/unmarshall, monitors
>    - Concurrent GC
> It generally looks good but MemStore can be improved, especially for the upsert case.
> Reminds me I need to profile the latest. It's been a few weeks.
> Best regards,
>    - Andy
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
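To make the upsert line item above concrete: with a plain CSLM an upsert
is a read-modify-write retry loop, and every attempt repeats the
O(log n), comparator-heavy descent of the skip list.  A standalone
sketch (the counter semantics are invented for illustration):

    import java.util.concurrent.ConcurrentSkipListMap;

    public class UpsertExample {
        // Hypothetical upsert: each get/putIfAbsent/replace walks the
        // skip list again, so every retry pays the full O(log n)
        // comparison cost -- which is why upserting is comparator-heavy.
        static long upsert(ConcurrentSkipListMap<String, Long> map,
                           String key, long delta) {
            for (;;) {
                Long cur = map.get(key);              // O(log n) search
                if (cur == null) {
                    if (map.putIfAbsent(key, delta) == null) {
                        return delta;                 // inserted
                    }
                } else if (map.replace(key, cur, cur + delta)) {
                    return cur + delta;               // updated in place
                }
                // Lost a race with another writer: retry, paying the
                // search cost all over again.
            }
        }
    }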
>>From: Jason Rutherglen <jason.rutherglen@gmail.com>
>>To: dev@hbase.apache.org
>>Sent: Sunday, July 10, 2011 3:05 PM
>>Subject: Re: Converting byte[] to ByteBuffer
>>Interesting.  I think we need to take a deeper look at why essentially
>>turning off the caching of uncompressed blocks doesn't [seem to]
>>matter.  My guess is it's cheaper to decompress on the fly than to
>>crowd the system IO cache out with JVM heap usage.
>>I.e., CPU is cheaper than disk IO.
>>Further (I asked this previously): where does the general CPU usage in
>>HBase go?  Binary search on keys for seeking, skip list reads and writes,
>>and [maybe] MapReduce jobs?  The rest should more or less be in the
>>noise (or is general Java overhead).
>>I'd be curious to know the avg CPU consumption of an active HBase system.
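For a concrete sense of where the seek-side CPU goes: a seek
binary-searches a sorted key index, and every probe is a byte-by-byte
lexicographic compare.  A toy version of the pattern (not the actual
HFile index code):

    public class SeekExample {
        // Unsigned, byte-by-byte lexicographic compare.  Cost is linear
        // in the length of the shared prefix, and it runs once per probe.
        static int compareBytes(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) return d;
            }
            return a.length - b.length;
        }

        // Binary search over a sorted key index: about log2(n) full
        // key compares per seek.
        static int seek(byte[][] sortedKeys, byte[] target) {
            int lo = 0, hi = sortedKeys.length - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int c = compareBytes(sortedKeys[mid], target);
                if (c < 0) lo = mid + 1;
                else if (c > 0) hi = mid - 1;
                else return mid;       // exact match
            }
            return lo;                 // index of first key >= target
        }
    }

Multiply that by every read, every MemStore insert, and every key
written during compactions, and it's easy to see comparators at the top
of a profile.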
>>On Sat, Jul 9, 2011 at 11:14 PM, Ted Dunning <tdunning@maprtech.com> wrote:
>>> No.  The JNI is below the HDFS-compatible API.  Thus the changed code is in
>>> the hadoop.jar and the associated jars and .so's that MapR supplies.
>>> The JNI still runs in the HBase memory image, though, so it can make data
>>> available faster.
>>> The cache involved includes the cache of disk blocks (not HBase memcache
>>> blocks) in the JNI and in the filer sub-system.
>>> The detailed reasons why more caching in the file system and less in HBase
>>> makes the overall system faster are not completely worked out, but the
>>> general outlines are pretty clear.  There are likely several factors at work
>>> in any case, including less GC cost due to a smaller memory footprint, caching
>>> compressed blocks instead of Java structures, and simplification due to a
>>> clean memory hand-off with an associated strong demarcation of where different
>>> memory allocators have jurisdiction.
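The "compressed blocks instead of Java structures" point is also where
this thread's byte[]-to-ByteBuffer question shows up: compressed bytes
can sit in (possibly direct) ByteBuffers outside the object graph, with
decompression paid per read.  A hypothetical sketch of that pattern
using plain java.util.zip, not MapR's or HBase's actual code:

    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    public class CompressedBlockCache {
        // Compressed blocks held as direct ByteBuffers: nearly nothing
        // for the GC to trace, at the cost of inflating on every read.
        private final Map<Long, ByteBuffer> blocks = new ConcurrentHashMap<>();

        public void putBlock(long blockId, byte[] deflated) {
            ByteBuffer buf = ByteBuffer.allocateDirect(deflated.length);
            buf.put(deflated);
            buf.flip();
            blocks.put(blockId, buf);
        }

        public byte[] readBlock(long blockId, int uncompressedLen)
                throws DataFormatException {
            ByteBuffer buf = blocks.get(blockId);
            if (buf == null) return null;        // cache miss
            byte[] compressed = new byte[buf.remaining()];
            buf.duplicate().get(compressed);     // duplicate: readers
                                                 // don't share position
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(compressed);
                byte[] out = new byte[uncompressedLen];
                inflater.inflate(out);           // decompress on the fly
                return out;
            } finally {
                inflater.end();
            }
        }
    }

A GC pause scales with the live objects it has to trace rather than with
raw bytes parked in direct buffers, which is one plausible reading of
the "less GC cost" factor above.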
>>> On Sat, Jul 9, 2011 at 3:48 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>>> I'm a little confused.  I was told none of the HBase code changed with
>>>> MapR; if the HBase (not the OS) block cache has a JNI implementation,
>>>> then that part of the HBase code changed.
>>>> On Jul 9, 2011 11:19 AM, "Ted Dunning" <tdunning@maprtech.com> wrote:
>>>> > MapR does help with the GC because it *does* have a JNI interface into
>>>> > an external block cache.
>>>> >
>>>> > Typical configurations with MapR trim HBase down to the minimal viable
>>>> > size and increase the file system cache correspondingly.
>>>> >
>>>> > On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>>> >
>>>> >> MapR doesn't help with the GC issues. If MapR had a JNI
>>>> >> interface into an external block cache then that'd be a different
>>>> >> story. :) And I'm sure it's quite doable.
>>>> >>
