Date: Fri, 29 May 2015 11:14:17 +0000 (UTC)
From: "Anoop Sam John (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13448) New Cell implementation with cached component offsets/lengths

[ https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564589#comment-14564589 ]

Anoop Sam John commented on HBASE-13448:
----------------------------------------

[~larsh] Can you share the test, please? After going through the 0.98 code: is the data in your test major compacted? If so, there is only one file, and there won't be any comparisons in the KVHeap under the StoreScanner. So the calls to getXXXOffset/Length happen in StoreScanner and then in SQM. But looking at SQM, it finds the lengths/offsets by parsing the byte[] returned by KeyValue#getBuffer(). Then the KVs are skipped by the ValueFilter.
So in total, the actual calls to getXXXLength()/getXXXOffset() happen mostly only once. That may be why we see no perf gain. Still, 8.4 secs to 8.5 secs is about a 1% degradation, and I am not sure why. Is GC creating overhead? Or is this just noise?

That said, I feel this is good to go in for trunk, considering the number of calls to these offset/length getters. In the SQM layer and elsewhere, that number has only increased. There will be more calls when a store has more store files and/or when there is more than one store. The table in my comment above shows the number of calls to each of these getters for a single CF with a single store file. Even there the call counts are high, and with more stores and/or store files they will only grow.

BTW, I have also noticed one more issue with 0.98. There HFile V2 is the default, and it does not have tags. We added an optimization so that when the tags length is 0 we create a NoTagsKeyValue, which avoids the getTagsLength() overhead. In HFileReaderV3 the implementation is correct. But HFile V2 (which is the default in 0.98) returns KeyValue. Here we can always return NoTagsKeyValue. I can raise a Jira and provide a fix.

> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>
> Key: HBASE-13448
> URL: https://issues.apache.org/jira/browse/HBASE-13448
> Project: HBase
> Issue Type: Sub-task
> Components: Scanners
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch, HBASE-13448_V3.patch, gc.png, hits.png
>
>
> This can be extension to KeyValue and can be instantiated and used in read path.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
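
Illustrative note: the trade-off discussed in the comment above (re-parsing offsets/lengths from the backing byte[] on every getter call versus caching them once per cell) can be sketched roughly as follows. This is not the HBASE-13448 patch and does not use any HBase classes. `CellOffsetSketch`, `ParsingCell`, `CachedCell`, and `serialize` are hypothetical names, and the byte layout is a simplified stand-in modeled on the KeyValue serialization.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: none of these types are HBase classes.
public class CellOffsetSketch {

    // Assumed simplified layout of one serialized cell:
    // [keyLen:4][valueLen:4][rowLen:2][row][famLen:1][family][qualifier][timestamp:8][type:1][value]
    static final int KEY_OFFSET = 8;            // two 4-byte length fields precede the key
    static final int TS_AND_TYPE_SIZE = 8 + 1;  // timestamp plus type byte

    // Hypothetical helper that builds one cell in the layout above.
    static byte[] serialize(String row, String family, String qualifier,
                            long timestamp, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLen = 2 + r.length + 1 + f.length + q.length + TS_AND_TYPE_SIZE;
        ByteBuffer buf = ByteBuffer.allocate(KEY_OFFSET + keyLen + v.length);
        buf.putInt(keyLen).putInt(v.length);
        buf.putShort((short) r.length).put(r);
        buf.put((byte) f.length).put(f);
        buf.put(q);
        buf.putLong(timestamp).put(type);
        buf.put(v);
        return buf.array();
    }

    /** Re-derives every offset/length from the backing array on each call. */
    static final class ParsingCell {
        private final byte[] bytes;
        ParsingCell(byte[] bytes) { this.bytes = bytes; }
        int keyLength()       { return ByteBuffer.wrap(bytes).getInt(0); }
        int rowLength()       { return ByteBuffer.wrap(bytes).getShort(KEY_OFFSET); }
        int familyOffset()    { return KEY_OFFSET + 2 + rowLength() + 1; }
        int familyLength()    { return bytes[familyOffset() - 1] & 0xFF; }
        int qualifierOffset() { return familyOffset() + familyLength(); }
        int qualifierLength() {
            return keyLength() - (2 + rowLength() + 1 + familyLength() + TS_AND_TYPE_SIZE);
        }
    }

    /** Parses once in the constructor; getters are plain field reads. */
    static final class CachedCell {
        private final int rowLength, familyOffset, familyLength, qualifierOffset, qualifierLength;
        CachedCell(byte[] bytes) {
            ParsingCell p = new ParsingCell(bytes);
            this.rowLength = p.rowLength();
            this.familyOffset = p.familyOffset();
            this.familyLength = p.familyLength();
            this.qualifierOffset = p.qualifierOffset();
            this.qualifierLength = p.qualifierLength();
        }
        int rowLength()       { return rowLength; }
        int familyOffset()    { return familyOffset; }
        int familyLength()    { return familyLength; }
        int qualifierOffset() { return qualifierOffset; }
        int qualifierLength() { return qualifierLength; }
    }

    public static void main(String[] args) {
        byte[] cell = serialize("row1", "cf", "q1", 1L, (byte) 4, "v");
        ParsingCell parsing = new ParsingCell(cell);
        CachedCell cached = new CachedCell(cell);
        // Both variants report the same positions; only the per-call cost differs.
        System.out.println(parsing.familyOffset() + " " + cached.familyOffset());
        System.out.println(parsing.qualifierLength() + " " + cached.qualifierLength());
    }
}
```

In this sketch the cached variant does all of the parsing up front and carries five extra int fields per object. When each getter is hit only about once per cell, as in the single-store-file case described above, little or no gain would be expected from it; any benefit should appear only when the getters are called repeatedly, for example during comparisons across many store files.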