hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Size of KeyValue
Date Tue, 29 Nov 2011 01:42:44 GMT
Did some (unscientific) tests...
The test scenario is 1m (wide) rows (50m cells) and then scanning along all cells.

The difference in runtime is within the noise. When I measure GC stats with jstat I see a
~3% reduction is young collections, and a ~10% reduction in overall GC time.

At the same time I set a counting breakpoint on KeyValue.getRow that fires when rowcache is
found not-null. I found this triggered about every 16 key values, which suggests
the optimization saves a lot of copying of the row key.

It is not entirely clear under what circumstances the rowCache would a be win, outweighing
the extra static memory by every KV.

So it looks like it is not worth making the change, although I suppose anything reducing GC
pressure is a win.

-- Lars

----- Original Message -----
From: lars hofhansl <lhofhansl@yahoo.com>
To: Stack <stack@duboce.net>; "dev@hbase.apache.org" <dev@hbase.apache.org>
Sent: Thursday, November 24, 2011 11:57 AM
Subject: Re: Size of KeyValue

Hmm... Might be hard to prove whether removing that would be a net win or net loss in the
current code base.
I'll do some tests and report back.

From: Stack <stack@duboce.net>
To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com> 
Sent: Wednesday, November 23, 2011 8:41 PM
Subject: Re: Size of KeyValue

On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> Looking at KeyValue I see three variable purely used for caching:
> timestampCache(long), rowCache(byte[]), and keyLength(int).
> From a quick glance over the code I do not see many spots where we repeatedly get the
TS, rowKey, of keyLength from the same KV.
> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory overhead) on
every key value created, and we create
> a *lot* KVs (real and "fake" ones) during scanning and seeking.
> Were these added to address specific performance concerns? If not, we might consider
removing these.

IIRC, I added them after watching stuff in a profiler (a long time
ago).  Things change.  Thats a lot of static mem to give up.


View raw message