hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7279) Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher
Date Wed, 05 Dec 2012 07:03:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510322#comment-13510322
] 

Matt Corgan commented on HBASE-7279:
------------------------------------

on a related note - one of the benefits of the mutable Cell implementations is that the first
time a cell gets parsed out of the data block, we can store all the offset/length variables
in nice fast int primitives.  I'm trying to convert the existing SeekerState to this right
now.  

When the cells are travelling through all the scanners/heaps/filters, the methods like getQualifierLength()
will simply return the already-calculated primitive int.  With plain KeyValue as it is now,
each time getQualifierLength() is called you have to do all of the following, and it may get
called many times on the way from disk to client:
{code}
  @Override
  public short getRowLength() {
    return Bytes.toShort(this.bytes, getKeyOffset());
  }
  public int getFamilyOffset(int rlength) {
    return this.offset + ROW_OFFSET + Bytes.SIZEOF_SHORT + rlength + Bytes.SIZEOF_BYTE;
  }
  @Override
  public byte getFamilyLength() {
    return getFamilyLength(getFamilyOffset());
  }
  @Override
  public int getQualifierLength() {
    return getQualifierLength(getRowLength(),getFamilyLength());
  }
  public int getQualifierLength(int rlength, int flength) {
    return getKeyLength() - (int) getKeyDataStructureSize(rlength, flength, 0);
  }
  public static long getKeyDataStructureSize(int rlength, int flength, int qlength) {
    return KeyValue.KEY_INFRASTRUCTURE_SIZE + rlength + flength + qlength;
  }
{code}
                
> Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-7279
>                 URL: https://issues.apache.org/jira/browse/HBASE-7279
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: 7279-0.94.txt, 7279-0.94-v2.txt
>
>
> Did some profiling again.
> I we can gain some performance [1] when passing buffer, rowoffset, and rowlength instead
of making a copy of the row key.
> That way we can also remove the row key caching (and this patch also removes the timestamps
caching). Considering the sheer number in which we create KVs, every byte save is good.
> [1] (15-20% when data is in the block cache we setup a Filter such that only a single
row is returned to the client).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message