hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13448) New Cell implementation with cached component offsets/lengths
Date Fri, 29 May 2015 11:14:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564589#comment-14564589 ]

Anoop Sam John commented on HBASE-13448:

[~larsh] can you share the test, please?

After going through the 0.98 code:
Is the data in your test major compacted? If so, there is only one file and there won't be any
comparisons in the KVHeap under the StoreScanner. So the calls to getXXXOffset/Length happen in
StoreScanner and then in SQM. But looking at SQM, it finds the length/offset by parsing the
byte[] returned by KeyValue#getBuffer(). Then the KVs are skipped by the ValueFilter. So in
total, the actual calls to getXXXLength()/Offset happen mostly just once. That may be why we
see no perf gain. Still, 8.4 sec to 8.5 sec is about a 1% degradation and I am not sure why.
Is GC creating overhead? Or is this just noise?
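
To make the parse-every-time vs. cached distinction concrete, here is a minimal Java sketch.
It is not HBase code: the class names (CachedOffsetsSketch, ParsingCell, CachedCell) are
hypothetical, and the layout is a simplified KeyValue-like encoding without tags. It only
shows that the cached variant pays the parsing cost once, in the constructor, so it helps
only when each getter is called more than once per cell.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CachedOffsetsSketch {
    // Assumed simplified layout (illustration only):
    // [int keyLen][int valueLen][short rowLen][row][byte famLen][family]
    // [qualifier][long timestamp][byte type][value]
    static final int LENGTHS_SIZE = 8; // the two leading ints

    static byte[] encode(String row, String family, String qualifier,
                         long ts, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLen = 2 + r.length + 1 + f.length + q.length + 8 + 1;
        ByteBuffer b = ByteBuffer.allocate(LENGTHS_SIZE + keyLen + v.length);
        b.putInt(keyLen).putInt(v.length);
        b.putShort((short) r.length).put(r);
        b.put((byte) f.length).put(f);
        b.put(q).putLong(ts).put(type).put(v);
        return b.array();
    }

    /** Re-derives every offset/length from the buffer on each call. */
    static final class ParsingCell {
        private final byte[] buf;
        ParsingCell(byte[] buf) { this.buf = buf; }
        int keyLength() { return ByteBuffer.wrap(buf).getInt(0); }
        int rowLength() { return ByteBuffer.wrap(buf).getShort(LENGTHS_SIZE); }
        int familyLength() { return buf[LENGTHS_SIZE + 2 + rowLength()] & 0xff; }
        int familyOffset() { return LENGTHS_SIZE + 2 + rowLength() + 1; }
        int qualifierOffset() { return familyOffset() + familyLength(); }
        int qualifierLength() {
            return keyLength() - (2 + rowLength() + 1 + familyLength() + 8 + 1);
        }
    }

    /** Parses once in the constructor; getters are plain field reads. */
    static final class CachedCell {
        private final int rowLen, famOff, famLen, qualOff, qualLen;
        CachedCell(byte[] buf) {
            ParsingCell p = new ParsingCell(buf);
            rowLen = p.rowLength();
            famOff = p.familyOffset();
            famLen = p.familyLength();
            qualOff = p.qualifierOffset();
            qualLen = p.qualifierLength();
        }
        int rowLength() { return rowLen; }
        int familyOffset() { return famOff; }
        int familyLength() { return famLen; }
        int qualifierOffset() { return qualOff; }
        int qualifierLength() { return qualLen; }
    }

    public static void main(String[] args) {
        byte[] kv = encode("row1", "cf", "qual", 42L, (byte) 4, "value");
        CachedCell c = new CachedCell(kv);
        System.out.println(c.familyOffset() + " " + c.qualifierLength());
    }
}
```

The cached variant also carries five extra int fields per cell, which is one possible
source of the extra GC pressure asked about above; that is a guess, not a measurement.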

That said, I feel this is good to go in for trunk, considering the number of calls to these
offset/length getters. With the SQM layer and the rest, that number has only increased. There
will be more calls when there are more store files in a store and/or more than one store.

The table in my comment above shows the number of calls to each of these getters for a single
CF with a single store file. Even there the call counts are high, and with more stores and/or
store files they will only grow.

BTW, I have also noticed one more issue with 0.98. There HFile V2 is the default, and it does
not have tags. We did an optimization so that when the tags length is 0 we create a
NoTagsKeyValue, which avoids the getTagsLength() overhead. In HFileReaderV3 the impl is correct.
But HFile V2 (the default in 0.98) returns KeyValue. Here we can always return NoTagsKeyValue.
I can raise a Jira and give a fix.
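
A rough Java sketch of the idea follows. It is not the HBase implementation: SketchKeyValue,
SketchNoTagsKeyValue and the create() helper are hypothetical names, and the layout is
simplified to [int keyLen][int valueLen][key][value] optionally followed by
[short tagsLen][tags]. The point is only that a reader which knows its file format has no
tags can hand out the subclass whose tags length is a constant 0.

```java
import java.nio.ByteBuffer;

final class NoTagsSketch {
    static final int LENGTHS_SIZE = 8;     // two 4-byte length ints
    static final int TAGS_LENGTH_SIZE = 2; // 2-byte tags length prefix

    /** General cell: derives the tags length from the buffer on each call. */
    static class SketchKeyValue {
        final byte[] buf;
        final int offset;
        final int length;
        SketchKeyValue(byte[] buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }
        int keyLength() { return ByteBuffer.wrap(buf).getInt(offset); }
        int valueLength() { return ByteBuffer.wrap(buf).getInt(offset + 4); }
        int tagsLength() {
            int remaining = length - (LENGTHS_SIZE + keyLength() + valueLength());
            return remaining > 0 ? remaining - TAGS_LENGTH_SIZE : 0;
        }
    }

    /** Cell known to carry no tags: no buffer reads, no arithmetic. */
    static final class SketchNoTagsKeyValue extends SketchKeyValue {
        SketchNoTagsKeyValue(byte[] buf, int offset, int length) {
            super(buf, offset, length);
        }
        @Override
        int tagsLength() { return 0; }
    }

    /** Hypothetical reader hook: a tag-less file format always gets the subclass. */
    static SketchKeyValue create(byte[] buf, int offset, int length, boolean formatHasTags) {
        return formatHasTags
            ? new SketchKeyValue(buf, offset, length)
            : new SketchNoTagsKeyValue(buf, offset, length);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(LENGTHS_SIZE + 3 + 2);
        b.putInt(3).putInt(2).put(new byte[3]).put(new byte[2]);
        SketchKeyValue kv = create(b.array(), 0, b.capacity(), false);
        System.out.println(kv.getClass().getSimpleName() + " " + kv.tagsLength());
    }
}
```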

> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>                 Key: HBASE-13448
>                 URL: https://issues.apache.org/jira/browse/HBASE-13448
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>         Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch, HBASE-13448_V3.patch, gc.png, hits.png
> This can be an extension to KeyValue and can be instantiated and used in the read path.

This message was sent by Atlassian JIRA
