hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13448) New Cell implementation with cached component offsets/lengths
Date Sun, 31 May 2015 07:23:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566404#comment-14566404 ]

Anoop Sam John commented on HBASE-13448:

@larsh Thanks for the comments.

I was trying to explain why we won't see any improvement as such in the test, especially in
0.98. Sorry if I was not clear.
The test has 1 CF with a single file in it. Under the StoreScanner's KVHeap there is always only
a single file, so no comparison happens there and there are no calls to getXXXOffset/Length.
There are get calls in StoreScanner (at most 2), and SQM also needs the component
offsets/lengths. But SQM does not make get calls on the KeyValue to obtain offset/length;
instead it calculates them itself by parsing the KV buffer (see code below). SQM then skips
these cells, so there are no further get calls on them. So in effect there are 2 get calls on
rowLength and just one on each of the others. This makes it clear why there is no advantage.
In a real case where Cells are not skipped (and especially in trunk), these calls happen many
times, mainly on rowLength. When ExplicitColTracker is in use, there are also many calls to
qualifier offset/length. For the other component lengths/offsets, the keyLength is parsed
frequently. If you look at the table in the comments above, you can see how many times each
call happens on a single Cell. Those numbers are for cells that are written back to the client
side, so they pass through all layers. But in that test too I had only 1 CF and one HFile. When
there are more of these, comparison ops will happen in 2 KVHeaps and so the calls will be more.
(We no longer pass byte[], offset, length into Comparators but instead pass the Cell alone.)
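
To illustrate the trade-off being discussed, here is a minimal, self-contained sketch in plain
Java. It is not the code from the HBASE-13448 patch. It assumes the usual KeyValue
serialization layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte
family length, family, qualifier, 8-byte timestamp, 1-byte type, value). The names
CachedOffsetsSketch, ParsingCell, CachedCell, serialize and decodes are hypothetical and exist
only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CachedOffsetsSketch {
    static final int ROW_OFFSET = 8;          // 4-byte key length + 4-byte value length
    static final int TIMESTAMP_TYPE_SIZE = 9; // 8-byte timestamp + 1-byte type

    /** Hypothetical helper: writes one cell in the assumed KeyValue layout. */
    static byte[] serialize(String row, String family, String qualifier,
                            long ts, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLength = 2 + r.length + 1 + f.length + q.length + TIMESTAMP_TYPE_SIZE;
        ByteBuffer b = ByteBuffer.allocate(ROW_OFFSET + keyLength + v.length);
        b.putInt(keyLength).putInt(v.length);
        b.putShort((short) r.length).put(r);
        b.put((byte) f.length).put(f);
        b.put(q).putLong(ts).put(type).put(v);
        return b.array();
    }

    static int decodeRowLength(byte[] buf) {
        // Big-endian 2-byte row length stored right after the two length ints.
        return ((buf[ROW_OFFSET] & 0xff) << 8) | (buf[ROW_OFFSET + 1] & 0xff);
    }

    static int decodeQualifierOffset(byte[] buf) {
        int familyLengthPos = ROW_OFFSET + 2 + decodeRowLength(buf);
        return familyLengthPos + 1 + buf[familyLengthPos];
    }

    /** Decodes from the buffer on every getter call. */
    static class ParsingCell {
        final byte[] buf;
        int decodes = 0; // number of getter calls that had to decode the buffer
        ParsingCell(byte[] buf) { this.buf = buf; }
        int getRowLength() { decodes++; return decodeRowLength(buf); }
        int getQualifierOffset() { decodes++; return decodeQualifierOffset(buf); }
    }

    /** Decodes once at construction and serves later calls from fields. */
    static class CachedCell {
        final int rowLength;
        final int qualifierOffset;
        final int decodes = 2; // the two decodes done in the constructor
        CachedCell(byte[] buf) {
            this.rowLength = decodeRowLength(buf);
            this.qualifierOffset = decodeQualifierOffset(buf);
        }
        int getRowLength() { return rowLength; }
        int getQualifierOffset() { return qualifierOffset; }
    }

    public static void main(String[] args) {
        byte[] buf = serialize("row1", "cf", "q1", 1L, (byte) 4, "v");
        ParsingCell parsing = new ParsingCell(buf);
        CachedCell cached = new CachedCell(buf);
        for (int i = 0; i < 1000; i++) {
            parsing.getRowLength();
            parsing.getQualifierOffset();
            cached.getRowLength();
            cached.getQualifierOffset();
        }
        System.out.println("parsing decodes = " + parsing.decodes); // 2000
        System.out.println("cached decodes  = " + cached.decodes);  // 2
    }
}
```

In this sketch the cached variant pays for two extra int fields per cell, so it only helps
when the getters are called repeatedly on the same cell. With one or two calls per cell, as in
the single-file test described above, there is little to gain.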

So on trunk we should see an advantage. If you can give us your test, I will run it on
trunk.

    byte [] bytes = kv.getBuffer();
    int offset = kv.getOffset();

    int keyLength = Bytes.toInt(bytes, offset, Bytes.SIZEOF_INT);
    offset += KeyValue.ROW_OFFSET;

    int initialOffset = offset;

    short rowLength = Bytes.toShort(bytes, offset, Bytes.SIZEOF_SHORT);
    offset += Bytes.SIZEOF_SHORT;

    int ret = this.rowComparator.compareRows(row, this.rowOffset, this.rowLength,
        bytes, offset, rowLength);

    //Passing rowLength
    offset += rowLength;

    //Skipping family
    byte familyLength = bytes [offset];
    offset += familyLength + 1;

    int qualLength = keyLength -
      (offset - initialOffset) - KeyValue.TIMESTAMP_TYPE_SIZE;

    long timestamp = Bytes.toLong(bytes, initialOffset + keyLength - KeyValue.TIMESTAMP_TYPE_SIZE);
    byte type = bytes[initialOffset + keyLength - 1];
    MatchCode colChecker = columns.checkColumn(bytes, offset, qualLength, type);
    if (colChecker == MatchCode.INCLUDE) {
      ReturnCode filterResponse = ReturnCode.SKIP;
      // STEP 2: Yes, the column is part of the requested columns. Check if filter is present
      if (filter != null) {
        // STEP 3: Filter the key value and return if it filters out
        filterResponse = filter.filterKeyValue(kv);
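
The offset arithmetic in the excerpt above can be tried outside HBase. The following is a
hedged, standalone sketch that walks a buffer with the same steps, using only java.nio in
place of the Bytes and KeyValue helpers. The constants are redefined locally with the values
the assumed KeyValue layout implies, and KeyWalkSketch, Parsed, walk and build are hypothetical
names used only here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyWalkSketch {
    static final int ROW_OFFSET = 8;          // key length int + value length int
    static final int TIMESTAMP_TYPE_SIZE = 9; // 8-byte timestamp + 1-byte type

    /** Result holder; field names are this sketch's own. */
    static final class Parsed {
        int rowOffset, rowLength;
        int familyOffset, familyLength;
        int qualifierOffset, qualifierLength;
        long timestamp;
        byte type;
    }

    /** Walks one serialized cell starting at offset, mirroring the excerpt's steps. */
    static Parsed walk(byte[] bytes, int offset) {
        ByteBuffer b = ByteBuffer.wrap(bytes); // big-endian by default
        Parsed p = new Parsed();

        int keyLength = b.getInt(offset);
        offset += ROW_OFFSET;
        int initialOffset = offset;            // start of the key

        p.rowLength = b.getShort(offset);
        offset += Short.BYTES;
        p.rowOffset = offset;
        offset += p.rowLength;                 // pass the row

        p.familyLength = bytes[offset];
        p.familyOffset = offset + 1;
        offset += p.familyLength + 1;          // skip the family

        p.qualifierOffset = offset;
        p.qualifierLength = keyLength - (offset - initialOffset) - TIMESTAMP_TYPE_SIZE;

        p.timestamp = b.getLong(initialOffset + keyLength - TIMESTAMP_TYPE_SIZE);
        p.type = bytes[initialOffset + keyLength - 1];
        return p;
    }

    /** Hypothetical helper: builds a cell in the assumed layout for testing. */
    static byte[] build(String row, String family, String qualifier,
                        long ts, byte type, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int keyLength = 2 + r.length + 1 + f.length + q.length + TIMESTAMP_TYPE_SIZE;
        ByteBuffer b = ByteBuffer.allocate(ROW_OFFSET + keyLength + v.length);
        b.putInt(keyLength).putInt(v.length);
        b.putShort((short) r.length).put(r);
        b.put((byte) f.length).put(f);
        b.put(q).putLong(ts).put(type).put(v);
        return b.array();
    }

    public static void main(String[] args) {
        Parsed p = walk(build("r1", "cf", "qual", 42L, (byte) 4, "v"), 0);
        System.out.println("rowLength=" + p.rowLength
            + " qualifierOffset=" + p.qualifierOffset
            + " qualifierLength=" + p.qualifierLength
            + " timestamp=" + p.timestamp);
    }
}
```

Every value here is recomputed on each walk, which is the work a cell with cached component
offsets/lengths would do only once.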


> New Cell implementation with cached component offsets/lengths
> -------------------------------------------------------------
>                 Key: HBASE-13448
>                 URL: https://issues.apache.org/jira/browse/HBASE-13448
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>         Attachments: 13448-0.98.txt, HBASE-13448.patch, HBASE-13448_V2.patch, HBASE-13448_V3.patch, gc.png, hits.png
> This can be an extension to KeyValue and can be instantiated and used in the read path.

