hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lars <lhofha...@yahoo.com>
Subject Re: Size of KeyValue
Date Tue, 29 Nov 2011 04:05:35 GMT
Hmm, interesting. It's used (among others) in server side scanners to hold the current row.
Could just keep a reference to the KeyValue around instead. Need to make we don't hold on
to the current blocks buffer forever, though. 

Todd Lipcon <todd@cloudera.com> schrieb:

>If I recall correctly, we put this in more for the benefit of the
>client side, with the assumption that the server side would never call
>this API. Then, we ended up writing some bad code somewhere in the
>server which calls this function.
>On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> Did some (unscientific) tests...
>> The test scenario is 1m (wide) rows (50m cells) and then scanning along all cells.
>> The difference in runtime is within the noise. When I measure GC stats with jstat
I see a ~3% reduction is young collections, and a ~10% reduction in overall GC time.
>> At the same time I set a counting breakpoint on KeyValue.getRow that fires when rowcache
is found not-null. I found this triggered about every 16 key values, which suggests
>> the optimization saves a lot of copying of the row key.
>> It is not entirely clear under what circumstances the rowCache would a be win, outweighing
the extra static memory by every KV.
>> So it looks like it is not worth making the change, although I suppose anything reducing
GC pressure is a win.
>> -- Lars
>> ----- Original Message -----
>> From: lars hofhansl <lhofhansl@yahoo.com>
>> To: Stack <stack@duboce.net>; "dev@hbase.apache.org" <dev@hbase.apache.org>
>> Cc:
>> Sent: Thursday, November 24, 2011 11:57 AM
>> Subject: Re: Size of KeyValue
>> Hmm... Might be hard to prove whether removing that would be a net win or net loss
in the current code base.
>> I'll do some tests and report back.
>> ________________________________
>> From: Stack <stack@duboce.net>
>> To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>> Sent: Wednesday, November 23, 2011 8:41 PM
>> Subject: Re: Size of KeyValue
>> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>> Looking at KeyValue I see three variable purely used for caching:
>>> timestampCache(long), rowCache(byte[]), and keyLength(int).
>>> From a quick glance over the code I do not see many spots where we repeatedly
get the TS, rowKey, of keyLength from the same KV.
>>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory overhead)
on every key value created, and we create
>>> a *lot* KVs (real and "fake" ones) during scanning and seeking.
>>> Were these added to address specific performance concerns? If not, we might consider
removing these.
>> IIRC, I added them after watching stuff in a profiler (a long time
>> ago).  Things change.  Thats a lot of static mem to give up.
>> St.Ack
>Todd Lipcon
>Software Engineer, Cloudera
View raw message