hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Todd Lipcon <t...@cloudera.com>
Subject Re: Size of KeyValue
Date Tue, 29 Nov 2011 01:59:04 GMT
If I recall correctly, we put this in more for the benefit of the
client side, with the assumption that the server side would never call
this API. Then, we ended up writing some bad code somewhere in the
server which calls this function.


On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> Did some (unscientific) tests...
> The test scenario is 1m (wide) rows (50m cells) and then scanning along all cells.
> The difference in runtime is within the noise. When I measure GC stats with jstat I see
a ~3% reduction is young collections, and a ~10% reduction in overall GC time.
> At the same time I set a counting breakpoint on KeyValue.getRow that fires when rowcache
is found not-null. I found this triggered about every 16 key values, which suggests
> the optimization saves a lot of copying of the row key.
> It is not entirely clear under what circumstances the rowCache would a be win, outweighing
the extra static memory by every KV.
> So it looks like it is not worth making the change, although I suppose anything reducing
GC pressure is a win.
> -- Lars
> ----- Original Message -----
> From: lars hofhansl <lhofhansl@yahoo.com>
> To: Stack <stack@duboce.net>; "dev@hbase.apache.org" <dev@hbase.apache.org>
> Cc:
> Sent: Thursday, November 24, 2011 11:57 AM
> Subject: Re: Size of KeyValue
> Hmm... Might be hard to prove whether removing that would be a net win or net loss in
the current code base.
> I'll do some tests and report back.
> ________________________________
> From: Stack <stack@duboce.net>
> To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
> Sent: Wednesday, November 23, 2011 8:41 PM
> Subject: Re: Size of KeyValue
> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> Looking at KeyValue I see three variable purely used for caching:
>> timestampCache(long), rowCache(byte[]), and keyLength(int).
>> From a quick glance over the code I do not see many spots where we repeatedly get
the TS, rowKey, of keyLength from the same KV.
>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory overhead)
on every key value created, and we create
>> a *lot* KVs (real and "fake" ones) during scanning and seeking.
>> Were these added to address specific performance concerns? If not, we might consider
removing these.
> IIRC, I added them after watching stuff in a profiler (a long time
> ago).  Things change.  Thats a lot of static mem to give up.
> St.Ack

Todd Lipcon
Software Engineer, Cloudera

View raw message