Date: Wed, 5 Dec 2012 06:08:59 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1199543778.62286.1354687739849.JavaMail.jiratomcat@arcas>
In-Reply-To: <98694101.61923.1354678738229.JavaMail.jiratomcat@arcas>
Subject: [jira] [Comment Edited] (HBASE-7279) Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher

    [ https://issues.apache.org/jira/browse/HBASE-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510295#comment-13510295 ]

Lars Hofhansl edited comment on HBASE-7279 at 12/5/12 6:07 AM:
---------------------------------------------------------------

[~saint.ack@gmail.com] You mean leave out the timestamp cache, or leave out the change that removes the timestamp cache? :)
I can go either way. However, 8 bytes is not insignificant (the rest of a KV is just 16 + 24 + 4 + 4 + 4 + 8 = 52).
(makes me want to remove the keyLength cache as well for another 4 bytes)

At Salesforce we're doing some scans over close to 1bn rows/kvs (most of which won't be shipped to the client). The issue with the timestamp cache is that it will use 8 bytes, whether we cache anything or not. Over the 1bn KVs we'll produce 8GB of garbage just for this cache.

I would like to put this into 0.94 as well.

      was (Author: lhofhansl):
    You leave out the timestamp cache, or leave out the change that removes the timestamp cache? :)
I can go either way. However, 8 bytes is not insignificant (the rest of a KV is just 16 + 24 + 4 + 4 + 4 + 8 = 52).
(makes me want to remove the keyLength cache as well for another 4 bytes)

At Salesforce we're doing some scans over close to 1bn rows/kvs (most of which won't be shipped to the client). The issue with the timestamp cache is that it will use 8 bytes, whether we cache anything or not. Over the 1bn KVs we'll produce 8GB of garbage just for this cache.

I would like to put this into 0.94 as well.

> Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-7279
>                 URL: https://issues.apache.org/jira/browse/HBASE-7279
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: 7279-0.94.txt
>
>
> Did some profiling again.
> I think we can gain some performance [1] by passing buffer, rowoffset, and rowlength instead of making a copy of the row key.
> That way we can also remove the row key caching (this patch also removes the timestamp caching). Considering the sheer number of KVs we create, every byte saved is good.
> [1] (15-20% when the data is in the block cache and we set up a Filter such that only a single row is returned to the client).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
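
For readers of the archive, the idea in the issue description (pass the backing buffer plus a row offset and row length instead of copying the row key) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the attached patch: the class `RowCompareSketch` and the methods `copyRow`, `compareRows`, and `wastedBytes` are hypothetical names, and the real HBase KeyValue layout is not modelled. The last method simply restates the comment's arithmetic of 8 bytes per KV over roughly 1bn KVs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch; not the HBASE-7279 patch and not the real KeyValue class. */
class RowCompareSketch {

    /** Copying approach: allocates a fresh byte[] for every row key inspected. */
    static byte[] copyRow(byte[] buffer, int rowOffset, int rowLength) {
        return Arrays.copyOfRange(buffer, rowOffset, rowOffset + rowLength);
    }

    /**
     * Copy-free approach: compares two row keys in place, given each backing
     * buffer plus offset and length. Bytes are compared as unsigned values,
     * and a shorter row sorts before a longer row with the same prefix.
     */
    static int compareRows(byte[] left, int leftOffset, int leftLength,
                           byte[] right, int rightOffset, int rightLength) {
        int common = Math.min(leftLength, rightLength);
        for (int i = 0; i < common; i++) {
            int a = left[leftOffset + i] & 0xff;
            int b = right[rightOffset + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return leftLength - rightLength;
    }

    /** Total bytes spent if every KV carries an extra field of the given size. */
    static long wastedBytes(long kvCount, int bytesPerKv) {
        return kvCount * bytesPerKv;
    }

    public static void main(String[] args) {
        // Two row keys ("row1", "row2") embedded in one larger buffer.
        byte[] buf = "aarow1bbrow2".getBytes(StandardCharsets.US_ASCII);

        // No allocation: both rows are compared where they already live.
        System.out.println(compareRows(buf, 2, 4, buf, 8, 4) < 0);

        // The copying variant gives the same ordering but allocates per call.
        byte[] copy = copyRow(buf, 2, 4);
        System.out.println(compareRows(copy, 0, copy.length, buf, 2, 4) == 0);

        // 8 bytes per KV over 1bn KVs, in bytes.
        System.out.println(wastedBytes(1_000_000_000L, 8));
    }
}
```

The in-place variant trades a slightly wider method signature for zero per-row allocation, which is the trade-off the description argues for when scanning very large numbers of KVs.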