hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9751) Excessive readpoints checks in MemStoreScanner and StoreFileScanner
Date Sat, 12 Oct 2013 17:34:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793425#comment-13793425 ]

Vladimir Rodionov commented on HBASE-9751:
------------------------------------------


* HFiles are immutable, but they can contain KVs that a current scanner should not see.
* I have a patch that verifies the readpoint in HFiles only when needed.
* Factoring this out of the memstore turns out to be harder. Memstore.next() can be called
from any thread at any time, so I don't see how that can be easily fixed.
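A minimal sketch of checking the readpoint "only when needed", assuming each store file records the largest memstoreTS it contains. `Cell`, `StoreFileInfo`, `maxMemstoreTS` and `ReadPointFilteringScanner` are hypothetical names, not HBase API. Because the file is immutable, the scanner can compare the per-file maximum against its readpoint once, in the constructor, and only filter individual KVs when the file may hold something newer than that readpoint.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a KeyValue: only the fields this sketch needs.
final class Cell {
    final String key;
    final long memstoreTS;
    Cell(String key, long memstoreTS) { this.key = key; this.memstoreTS = memstoreTS; }
}

// Hypothetical stand-in for an immutable HFile plus file-level metadata.
final class StoreFileInfo {
    final List<Cell> cells;   // never modified after the file is written
    final long maxMemstoreTS; // assumption: computed once when the file is written
    StoreFileInfo(List<Cell> cells) {
        this.cells = cells;
        long max = 0;
        for (Cell c : cells) max = Math.max(max, c.memstoreTS);
        this.maxMemstoreTS = max;
    }
}

final class ReadPointFilteringScanner implements Iterator<Cell> {
    private final Iterator<Cell> it;
    private final long readPoint;
    private final boolean needsCheck; // decided once per file, not per next()
    private Cell pending;

    ReadPointFilteringScanner(StoreFileInfo file, long readPoint) {
        this.it = file.cells.iterator();
        this.readPoint = readPoint;
        // Immutable file: if nothing in it is newer than our readpoint now, nothing ever will be.
        this.needsCheck = file.maxMemstoreTS > readPoint;
        advance();
    }

    // Moves to the next cell visible at readPoint (every cell if no check is needed).
    private void advance() {
        pending = null;
        while (it.hasNext()) {
            Cell c = it.next();
            if (!needsCheck || c.memstoreTS <= readPoint) {
                pending = c;
                return;
            }
        }
    }

    boolean readPointCheckEnabled() { return needsCheck; }

    @Override public boolean hasNext() { return pending != null; }

    @Override public Cell next() {
        if (pending == null) throw new NoSuchElementException();
        Cell c = pending;
        advance();
        return c;
    }
}
```

A file flushed after the scanner was opened would typically have a `maxMemstoreTS` above the scanner's readpoint, so it keeps the per-KV check; that is the case the first bullet is about. Older files skip it entirely.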

1. Lars, I just do not understand the first point. Yes, it is possible to have long-lived scanners
that are older than a particular HFile, but in that case, I think, the file should not get into
the scanner at all?

2. There is another expensive call (in ScannerV2 and similar) that can be optimized when
memstoreTS is always 0.
I commented this section out and explicitly set currMemstoreTS = 0 (and currMemstoreTSLen = 1).
This is an opportunity for another optimization.
This one:
{code}
    private final void readKeyValueLen() {
      blockBuffer.mark();
      currKeyLen = blockBuffer.getInt();
      currValueLen = blockBuffer.getInt();
      blockBuffer.reset();
//      if (this.reader.shouldIncludeMemstoreTS()) {
//        try {
//          int memstoreTSOffset = blockBuffer.arrayOffset()
//              + blockBuffer.position() + KEY_VALUE_LEN_SIZE + currKeyLen
//              + currValueLen;
//          currMemstoreTS = Bytes.readVLong(blockBuffer.array(),
//              memstoreTSOffset);
//          currMemstoreTSLen = WritableUtils.getVIntSize(currMemstoreTS);
//         
//        } catch (Exception e) {
//          throw new RuntimeException("Error reading memstore timestamp", e);
//        }
//      }
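      // Experiment: assume every memstoreTS in the file is 0, which is
      // encoded as a single-byte vlong, so the decode above is skipped.
      currMemstoreTS = 0;
      currMemstoreTSLen = 1;
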
      if (currKeyLen < 0 || currValueLen < 0
          || currKeyLen > blockBuffer.limit()
          || currValueLen > blockBuffer.limit()) {
        throw new IllegalStateException("Invalid currKeyLen " + currKeyLen
            + " or currValueLen " + currValueLen + ". Block offset: "
            + block.getOffset() + ", block length: " + blockBuffer.limit()
            + ", position: " + blockBuffer.position() + " (without header).");
      }
    }

{code}
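One way to keep the decode correct while still skipping it in the common case is a per-file fast path. This is a sketch under the assumption that the reader can know up front (for example from file metadata) that every memstoreTS in the file is 0; `MemstoreTSDecoder` and its `allZero` flag are hypothetical. The slow path re-implements Hadoop's zero-compressed vlong format, the one `Bytes.readVLong` and `WritableUtils.getVIntSize` work with, so the sketch stays self-contained.

```java
// Hypothetical helper for the memstoreTS vlong that trails each KV in a block.
final class MemstoreTSDecoder {
    private MemstoreTSDecoder() {}

    /**
     * Returns {memstoreTS, encodedLength} for the vlong starting at buf[offset].
     * allZero is an assumed file-level flag meaning "every memstoreTS here is 0".
     */
    static long[] read(byte[] buf, int offset, boolean allZero) {
        if (allZero) {
            // 0 is always encoded as one byte, so nothing needs to be decoded.
            return new long[] {0L, 1L};
        }
        byte first = buf[offset];
        int len = decodeVIntSize(first);
        if (len == 1) {
            return new long[] {first, 1L};
        }
        // Remaining len - 1 bytes are the magnitude, big-endian.
        long value = 0;
        for (int i = 1; i < len; i++) {
            value = (value << 8) | (buf[offset + i] & 0xFF);
        }
        return new long[] {isNegativeVInt(first) ? ~value : value, len};
    }

    // Same rules as Hadoop's WritableUtils.decodeVIntSize.
    private static int decodeVIntSize(byte b) {
        if (b >= -112) return 1;
        if (b < -120) return -119 - b;
        return -111 - b;
    }

    // Same rules as Hadoop's WritableUtils.isNegativeVInt.
    private static boolean isNegativeVInt(byte b) {
        return b < -120 || (b >= -112 && b < 0);
    }
}
```

With `allZero == true` this returns exactly what the commented-out experiment hard-wires (`currMemstoreTS = 0`, `currMemstoreTSLen = 1`); the difference is that the shortcut is only taken when the file-level flag says it is safe.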

3. I am going to test MemStoreScanner with readpoint checks on and off today. I am sure they are expensive.
I think we need to instantiate MemStoreScanner with the readpoint already set, since scanners are not
shared among threads (and can't be). Just a single constructor argument: long readpoint.
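A minimal sketch of point 3, assuming a memstore modelled as a `ConcurrentSkipListMap` from key to the memstoreTS of the write that produced it. `SketchMemStore` and `SketchMemStoreScanner` are hypothetical, not the real MemStore/MemStoreScanner. The readpoint is passed once to the constructor, so `next()` compares against a final field instead of doing a `ThreadLocal.get()` on every call; this relies on a scanner being used by one thread at a time.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical memstore: key -> memstoreTS of the write that produced it.
final class SketchMemStore {
    final ConcurrentSkipListMap<String, Long> kvs = new ConcurrentSkipListMap<>();
}

final class SketchMemStoreScanner {
    private final Iterator<Map.Entry<String, Long>> it;
    private final long readPoint; // fixed for the lifetime of this scanner

    SketchMemStoreScanner(SketchMemStore store, long readPoint) {
        this.it = store.kvs.entrySet().iterator();
        this.readPoint = readPoint;
    }

    /** Returns the next key visible at readPoint, or null when exhausted. */
    String next() {
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            // Plain field read on the hot path; no ThreadLocal lookup.
            if (e.getValue() <= readPoint) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

Unlike an HFile, the memstore keeps changing: the skip-list iterator is weakly consistent, so KVs inserted after the scanner was created can still show up. The per-KV comparison therefore stays; only the ThreadLocal lookup goes away.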



> Excessive  readpoints checks in MemStoreScanner and StoreFileScanner
> --------------------------------------------------------------------
>
>                 Key: HBASE-9751
>                 URL: https://issues.apache.org/jira/browse/HBASE-9751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.94.12, 0.96.0
>            Reporter: Vladimir Rodionov
>            Assignee: Lars Hofhansl
>         Attachments: 9751-0.94.txt, 9751-trunk.txt
>
>
> It seems that usage of skipKVsNewerThanReadpoint in StoreFileScanner can be greatly reduced
> or even eliminated altogether (HFiles are immutable and no new KVs can be inserted after the
> scanner instance is created). The same is true for MemStoreScanner, which checks the readpoint
> on every next() and seek(). Each readpoint check is a ThreadLocal.get() call, which is quite expensive.
 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
