Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 5 Dec 2012 06:08:59 +0000 (UTC)
From: "Lars Hofhansl (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <1199543778.62286.1354687739849.JavaMail.jiratomcat@arcas>
In-Reply-To: <98694101.61923.1354678738229.JavaMail.jiratomcat@arcas>
Subject: [jira] [Comment Edited] (HBASE-7279) Avoid copying the rowkey in
 RegionScanner, StoreScanner, and ScanQueryMatcher
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510295#comment-13510295 ] 

Lars Hofhansl edited comment on HBASE-7279 at 12/5/12 6:07 AM:
---------------------------------------------------------------

[~saint.ack@gmail.com] Do you mean leave out the timestamp cache, or leave out the change that removes the timestamp cache? :) I can go either way.

However, 8 bytes is not insignificant (the rest of a KV is just 16 + 24 + 4 + 4 + 4 + 8 = 60 bytes). It makes me want to remove the keyLength cache as well, for another 4 bytes.
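A quick Java check of the arithmetic above. The byte counts are copied from the comment as-is. The comment does not say which field each term corresponds to, so the sketch does not assume a mapping. It only sums the terms and shows how large the 8-byte timestamp cache is relative to the total:

```java
public class KvSizeCheck {
    // Byte counts exactly as listed in the comment; no field mapping is assumed.
    static final int[] REST_OF_KV = {16, 24, 4, 4, 4, 8};
    static final int TIMESTAMP_CACHE = 8; // one long

    static int sum(int[] terms) {
        int total = 0;
        for (int t : terms) {
            total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        int rest = sum(REST_OF_KV);
        int withTsCache = rest + TIMESTAMP_CACHE;
        System.out.println("rest of KV:           " + rest + " bytes");        // 60
        System.out.println("with timestamp cache: " + withTsCache + " bytes"); // 68
        // Share of the per-KV total taken by the timestamp cache.
        System.out.printf("timestamp cache share: %.1f%%%n",
                100.0 * TIMESTAMP_CACHE / withTsCache);                      // 11.8%
    }
}
```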

At Salesforce we're doing some scans over close to 1bn rows/KVs (most of which won't be shipped to the client).
The issue with the timestamp cache is that it uses 8 bytes per KV whether or not we cache anything. Over 1bn KVs, that is 8GB of garbage just for this cache.
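A minimal Java sketch of why a cached timestamp costs 8 bytes per KV even when it is never read, and what that adds up to over 1bn KVs. `CachingKv`, `NonCachingKv`, `readLong` and the timestamp offset are all hypothetical. This is not HBase's actual `KeyValue` layout or code:

```java
public class TimestampCacheSketch {
    // Hypothetical helper: decodes 8 big-endian bytes at off into a long,
    // without allocating anything.
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }

    // Hypothetical KV that caches its timestamp. The long field is part of every
    // instance, so it takes 8 bytes even if getTimestamp() is never called.
    static final class CachingKv {
        final byte[] buf;
        final int tsOffset;       // hypothetical position of the timestamp in buf
        long timestampCache = -1; // -1 means "not decoded yet"

        CachingKv(byte[] buf, int tsOffset) {
            this.buf = buf;
            this.tsOffset = tsOffset;
        }

        long getTimestamp() {
            if (timestampCache == -1) {
                timestampCache = readLong(buf, tsOffset);
            }
            return timestampCache;
        }
    }

    // Hypothetical KV without the cache: re-decodes the timestamp from the
    // backing buffer on each call and carries no extra field.
    static final class NonCachingKv {
        final byte[] buf;
        final int tsOffset;

        NonCachingKv(byte[] buf, int tsOffset) {
            this.buf = buf;
            this.tsOffset = tsOffset;
        }

        long getTimestamp() {
            return readLong(buf, tsOffset);
        }
    }

    // Bytes taken up by one 8-byte cache field across kvCount instances.
    static long cacheBytes(long kvCount) {
        return kvCount * Long.BYTES;
    }

    public static void main(String[] args) {
        byte[] buf = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44};
        System.out.println(new CachingKv(buf, 8).getTimestamp());    // 300
        System.out.println(new NonCachingKv(buf, 8).getTimestamp()); // 300
        System.out.println(cacheBytes(1_000_000_000L));              // 8000000000
    }
}
```

This counts field size only. The actual per-object footprint also depends on the JVM's object layout and alignment padding.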

I would like to put this into 0.94 as well.

                
> Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-7279
>                 URL: https://issues.apache.org/jira/browse/HBASE-7279
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: 7279-0.94.txt
>
>
> Did some profiling again.
> We can gain some performance [1] by passing buffer, rowoffset, and rowlength instead of making a copy of the row key.
> That way we can also remove the row key caching (this patch also removes the timestamp caching). Considering the sheer number of KVs we create, every byte saved is good.
> [1] 15-20% when the data is in the block cache and we set up a Filter such that only a single row is returned to the client.
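A minimal Java sketch of the idea in the description: compare rows via (buffer, offset, length) instead of copying the row key into a new byte[]. The `Cell` class and the `copyRow`, `rowEquals`, `sameRow` and `countRows` helpers are hypothetical stand-ins. They are not HBase's actual `KeyValue`, `StoreScanner` or `ScanQueryMatcher` code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowCompareSketch {
    // Hypothetical minimal cell: a view into a shared backing buffer, described
    // by buffer, row offset and row length (no copied row key).
    static final class Cell {
        final byte[] buffer;
        final int rowOffset;
        final int rowLength;

        Cell(byte[] buffer, int rowOffset, int rowLength) {
            this.buffer = buffer;
            this.rowOffset = rowOffset;
            this.rowLength = rowLength;
        }
    }

    // Copying approach: allocates a fresh byte[] every time the row is needed.
    static byte[] copyRow(Cell c) {
        return Arrays.copyOfRange(c.buffer, c.rowOffset, c.rowOffset + c.rowLength);
    }

    // Non-copying approach: compares two byte ranges in place.
    static boolean rowEquals(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        if (aLen != bLen) {
            return false;
        }
        for (int i = 0; i < aLen; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    static boolean sameRow(Cell x, Cell y) {
        return rowEquals(x.buffer, x.rowOffset, x.rowLength, y.buffer, y.rowOffset, y.rowLength);
    }

    // Counts row boundaries in a sorted run of cells. The "current row" is kept
    // as a reference to a cell rather than as a copied byte[].
    static int countRows(Cell[] cells) {
        int rows = 0;
        Cell current = null;
        for (Cell c : cells) {
            if (current == null || !sameRow(current, c)) {
                rows++;
                current = c;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] buf = "row1row1row2".getBytes(StandardCharsets.US_ASCII);
        Cell[] cells = {new Cell(buf, 0, 4), new Cell(buf, 4, 4), new Cell(buf, 8, 4)};
        System.out.println(countRows(cells)); // 2
        // Same answer as the copying approach, which allocates two arrays here:
        System.out.println(Arrays.equals(copyRow(cells[0]), copyRow(cells[1]))); // true
    }
}
```

In this sketch the scan loop only remembers where the current row lives, so no byte[] is allocated per row or per KV just to detect row boundaries.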
