hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-12311) Version stats in HFiles?
Date Sat, 28 Feb 2015 06:01:06 GMT

     [ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-12311:
    Attachment: 12311-indexed-0.98.txt

Here's a patch that illustrates the idea for 0.98. In StoreScanner, when the SQM (ScanQueryMatcher)
indicates that we should seek, we check the nextIndexedKey (if available), and if we would seek
to a key before that, we simply SKIP and let the SQM try again.
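For illustration only, a minimal sketch of that seek-or-skip decision, assuming keys are plain byte[] compared unsigned-lexicographically (a simplification of HBase's real Cell ordering). The class SeekOrSkip, its Action enum and decide() are hypothetical names, not HBase APIs and not the code in the attached patch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the "skip instead of seek" check.
 * Keys are modeled as byte[] compared lexicographically as unsigned bytes,
 * a simplification of HBase's real KeyValue/Cell comparator.
 */
public final class SeekOrSkip {

    /** What the scanner should do next (hypothetical, not HBase's MatchCode). */
    enum Action { SEEK, SKIP }

    private SeekOrSkip() {}

    /** Unsigned lexicographic comparison of two keys. */
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /**
     * Decide whether to really seek when the matcher asks for one.
     *
     * @param seekKey        key the matcher wants to seek to
     * @param nextIndexedKey first key of the next HFile block, or null if unknown
     */
    static Action decide(byte[] seekKey, byte[] nextIndexedKey) {
        if (nextIndexedKey != null && compareKeys(seekKey, nextIndexedKey) < 0) {
            // Target sorts before the next block's first key, so it is still in
            // the current block: skipping forward is cheaper than a seek.
            return Action.SKIP;
        }
        // No index information, or the target lies in a later block: seek.
        return Action.SEEK;
    }
}
```

Falling back to SEEK when no nextIndexedKey is available keeps the sketch conservative: it only avoids a seek when the target is known to lie in the block already loaded.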

The only annoying part is that we only have an indexed *key* (i.e. row, family, column), which
we are trying to get rid of. HFileReaderV2.AbstractScannerV2.reseekTo performs the same check
to decide whether to seek or to retry within the same block; this patch just pulls the check up.
We can probably remove that optimization from AbstractScannerV2 now (and save a few more

> Version stats in HFiles?
> ------------------------
>                 Key: HBASE-12311
>                 URL: https://issues.apache.org/jira/browse/HBASE-12311
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>         Attachments: 12311-indexed-0.98.txt, 12311-v2.txt, 12311-v3.txt, 12311.txt, CellStatTracker.java
> In HBASE-9778 I basically punted to the user the decision on whether to issue repeated
> scanner.next() calls instead of (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats of the maximum number of versions we've
> seen for any row/col combination and to store these in the HFile's metadata (just like the
> timerange, oldest Put, etc.).
> Then we could estimate fairly accurately whether to expect lots of versions (i.e. seeking
> between columns is better) or not (in which case we'd issue repeated next() calls).
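A minimal sketch of the proposed stat, assuming cells reach the writer in HFile sort order (row, then column, then newest timestamp first) and treating "column" as family plus qualifier. MaxVersionsTracker, track(), getMaxVersions(), preferSeek() and the threshold are hypothetical names for illustration; this is not the attached CellStatTracker.java and not an HBase API.

```java
import java.util.Arrays;

/**
 * Hypothetical tracker for the maximum number of versions seen for any
 * row/column in one HFile. Cells must be fed in sort order, so all versions
 * of a row/column arrive consecutively.
 */
public final class MaxVersionsTracker {
    private byte[] lastRow;
    private byte[] lastColumn; // family + qualifier in this sketch
    private int currentRun;
    private int maxVersions;

    /** Observe one cell's coordinates, in HFile sort order. */
    public void track(byte[] row, byte[] column) {
        if (lastRow != null && Arrays.equals(row, lastRow)
                && Arrays.equals(column, lastColumn)) {
            currentRun++; // another version of the same row/column
        } else {
            currentRun = 1; // first version of a new row/column
            lastRow = row.clone();
            lastColumn = column.clone();
        }
        maxVersions = Math.max(maxVersions, currentRun);
    }

    /** The value that would be stored in the HFile's metadata. */
    public int getMaxVersions() {
        return maxVersions;
    }

    /**
     * Reader-side heuristic (hypothetical threshold): seek between columns only
     * when a file may hold many versions per column; otherwise repeated next()
     * calls are expected to be cheaper.
     */
    public static boolean preferSeek(int maxVersionsInFile, int threshold) {
        return maxVersionsInFile > threshold;
    }
}
```

With several HFiles in a store, a scanner would presumably have to use the largest value across the files it reads, since any one of them could contain many versions.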

This message was sent by Atlassian JIRA
