hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Size of KeyValue
Date Wed, 30 Nov 2011 07:25:35 GMT
On Mon, Nov 28, 2011 at 8:05 PM, Lars <lhofhansl@yahoo.com> wrote:
> Hmm, interesting. It's used (among others) in server side scanners to hold the current
> row. Could just keep a reference to the KeyValue around instead. Need to make sure we don't
> hold on to the current block's buffer forever, though.

Yep - another option is for the scanner to have its own byte[] where
it copies the current row key into - since it's already doing
comparisons with it for every advance of the scanner, it should be in
L2 cache if not L1, and the copy would be minimally expensive.
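
Roughly what I have in mind, as a minimal sketch only. RowKeyBuffer and its methods are hypothetical names for illustration, not existing HBase classes. The scanner would own one of these, copy the row bytes out of the block buffer when the row changes, and compare later KeyValues against the copy, so no reference to the block's backing array is retained:

```java
/** Hypothetical helper (not an HBase class): a reusable private copy of the current row key. */
final class RowKeyBuffer {
    private byte[] buf = new byte[64]; // reused across rows, grown on demand
    private int length = 0;

    /** Copies the row key out of the source array (e.g. a block buffer). */
    void set(byte[] src, int offset, int len) {
        if (len > buf.length) {
            // Grow geometrically so long keys do not cause repeated reallocation.
            buf = new byte[Math.max(len, buf.length * 2)];
        }
        System.arraycopy(src, offset, buf, 0, len);
        length = len;
    }

    /** Returns true if the given slice holds the same row as the stored copy. */
    boolean matches(byte[] other, int offset, int len) {
        if (len != length) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[i] != other[offset + i]) {
                return false;
            }
        }
        return true;
    }

    int length() {
        return length;
    }
}
```

The point of the design is that the buffer is allocated once per scanner rather than once per KeyValue, so it adds no per-KV memory and no per-row garbage.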


> Todd Lipcon <todd@cloudera.com> wrote:
>>If I recall correctly, we put this in more for the benefit of the
>>client side, with the assumption that the server side would never call
>>this API. Then, we ended up writing some bad code somewhere in the
>>server which calls this function.
>>On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>> Did some (unscientific) tests...
>>> The test scenario is 1m (wide) rows (50m cells) and then scanning along all cells.
>>> The difference in runtime is within the noise. When I measure GC stats with jstat
>>> I see a ~3% reduction in young collections, and a ~10% reduction in overall GC time.
>>> At the same time I set a counting breakpoint on KeyValue.getRow that fires when
>>> rowCache is found not-null. I found this triggered about every 16 key values, which suggests
>>> the optimization saves a lot of copying of the row key.
>>> It is not entirely clear under what circumstances the rowCache would be a win,
>>> outweighing the extra static memory used by every KV.
>>> So it looks like it is not worth making the change, although I suppose anything
>>> reducing GC pressure is a win.
>>> -- Lars
>>> ----- Original Message -----
>>> From: lars hofhansl <lhofhansl@yahoo.com>
>>> To: Stack <stack@duboce.net>; "dev@hbase.apache.org" <dev@hbase.apache.org>
>>> Cc:
>>> Sent: Thursday, November 24, 2011 11:57 AM
>>> Subject: Re: Size of KeyValue
>>> Hmm... Might be hard to prove whether removing that would be a net win or net
>>> loss in the current code base.
>>> I'll do some tests and report back.
>>> ________________________________
>>> From: Stack <stack@duboce.net>
>>> To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>>> Sent: Wednesday, November 23, 2011 8:41 PM
>>> Subject: Re: Size of KeyValue
>>> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>>> Looking at KeyValue I see three variables purely used for caching:
>>>> timestampCache(long), rowCache(byte[]), and keyLength(int).
>>>> From a quick glance over the code I do not see many spots where we repeatedly
>>>> get the TS, rowKey, or keyLength from the same KV.
>>>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory
>>>> overhead) on every key value created, and we create
>>>> a *lot* of KVs (real and "fake" ones) during scanning and seeking.
>>>> Were these added to address specific performance concerns? If not, we might
>>>> consider removing these.
>>> IIRC, I added them after watching stuff in a profiler (a long time
>>> ago).  Things change.  That's a lot of static mem to give up.
>>> St.Ack
>>Todd Lipcon
>>Software Engineer, Cloudera

Todd Lipcon
Software Engineer, Cloudera
