hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12311) Version stats in HFiles?
Date Thu, 06 Nov 2014 06:03:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199850#comment-14199850 ]

Lars Hofhansl commented on HBASE-12311:

The logic would be something like this:
* replace SEEK_COL with SKIP when the mean size of all columns is <= 1/2 HFileBlock, the
standard deviation is < 1 HFileBlock, and the max size of any column is < 4 HFileBlocks.

Same for rows:
* replace SEEK_ROW with SKIP when the mean size of all rows is <= 1/2 HFileBlock, the standard
deviation is < 1 HFileBlock, and the max size of any row is < 4 HFileBlocks.
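
A minimal Java sketch of the threshold check above, assuming the mean, standard deviation and maximum size (in bytes) of columns or rows are already known for the file. The class name `SkipHeuristic` and its parameters are hypothetical and not part of HBase; the `useStdDev` flag reflects that the standard deviation check may be dropped. The same method serves both the column rule and the row rule.

```java
// Hypothetical sketch of the "replace SEEK with SKIP" rule; not an HBase API.
// All sizes are in bytes; blockSize is the HFile block size (64 KB by default in HBase).
final class SkipHeuristic {

    private SkipHeuristic() {}

    /**
     * Returns true when columns (or rows) are small enough that a SEEK would most
     * likely land inside the current block, so repeated SKIPs should be cheaper.
     *
     * @param meanSize  mean size of a column (or row)
     * @param stdDev    standard deviation of those sizes
     * @param maxSize   largest column (or row) seen
     * @param blockSize HFile block size
     * @param useStdDev whether to apply the (possibly unnecessary) std-dev check
     */
    static boolean shouldSkip(double meanSize, double stdDev, long maxSize,
                              long blockSize, boolean useStdDev) {
        boolean meanOk = meanSize <= blockSize / 2.0;          // mean <= 1/2 block
        boolean stdDevOk = !useStdDev || stdDev < blockSize;   // std dev < 1 block
        boolean maxOk = maxSize < 4L * blockSize;              // max < 4 blocks
        return meanOk && stdDevOk && maxOk;
    }
}
```

For example, with 64 KB blocks, `shouldSkip(2_000, 500, 10_000, 65_536, true)` favors SKIP, while a single 300,000-byte column (more than 4 blocks) makes it return false and the scanner keeps seeking.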

(I might not do the standard deviation part, not sure it's really needed)

So we'll avoid using SKIPs when a SEEK will, with some probability, land outside of the current
block. In that case we'll use SEEK_COL and SEEK_ROW as before. SEEK_WITH_HINT would always
be executed as SEEK_WITH_HINT.
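
The translation step itself could look like the following sketch. The `MatchCode` enum here is a hypothetical stand-in that only mirrors the names used in this comment (it is not HBase's actual match-code enum), and `SeekOptimizer` is likewise an illustrative name; the two booleans are assumed to come from the per-file column and row checks.

```java
// Hypothetical stand-in for the scanner match codes named in the comment;
// NOT HBase's real enum.
enum MatchCode { SKIP, SEEK_COL, SEEK_ROW, SEEK_WITH_HINT }

// Hypothetical helper: rewrites seek codes into SKIP when the file's stats
// suggest the next column/row is probably still in the current block.
final class SeekOptimizer {
    private final boolean skipColumns; // outcome of the column-size check
    private final boolean skipRows;    // outcome of the row-size check

    SeekOptimizer(boolean skipColumns, boolean skipRows) {
        this.skipColumns = skipColumns;
        this.skipRows = skipRows;
    }

    MatchCode translate(MatchCode code) {
        switch (code) {
            case SEEK_COL:
                return skipColumns ? MatchCode.SKIP : MatchCode.SEEK_COL;
            case SEEK_ROW:
                return skipRows ? MatchCode.SKIP : MatchCode.SEEK_ROW;
            default:
                // SEEK_WITH_HINT (and anything else) is passed through unchanged.
                return code;
        }
    }
}
```

Keeping the decision per file (two booleans computed once from the metadata) means the per-cell cost is a single switch, which matters on the scanner's hot path.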

> Version stats in HFiles?
> ------------------------
>                 Key: HBASE-12311
>                 URL: https://issues.apache.org/jira/browse/HBASE-12311
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>         Attachments: 12311.txt, CellStatTracker.java
> In HBASE-9778 I basically punted to the user the decision on whether to issue repeated scanner.next() calls instead of (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats on the maximum number of versions we've seen for any row/col combination and store these in the HFile's metadata (just like the timerange, oldest Put, etc.).
> Then we can estimate fairly accurately whether to expect lots of versions (i.e. seeking between columns is better) or not (in which case we'd issue repeated next()'s).
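
The stats described above could plausibly be gathered in one pass while cells are appended in sort order (row, column, newest version first). The sketch below is illustrative only and is not the attached CellStatTracker.java: `VersionStatsSketch`, its methods, and the metadata key names are all hypothetical. It tracks the maximum number of versions per row/column and, using Welford's online algorithm, the mean and standard deviation of column sizes that the size-based rules in the comment rely on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-pass stats tracker; keys and names are illustrative only.
final class VersionStatsSketch {
    private byte[] lastRow;
    private byte[] lastColumn;
    private long versionsInColumn; // versions seen for the current row/col
    private long bytesInColumn;    // bytes seen for the current row/col

    private long maxVersions;
    private long maxColumnSize;
    // Welford's online algorithm for the mean/variance of column sizes.
    private long columns;
    private double mean;
    private double m2;

    /** Call once per cell, in HFile sort order. */
    void add(byte[] row, byte[] column, int cellSize) {
        boolean sameColumn = lastRow != null
                && Arrays.equals(row, lastRow) && Arrays.equals(column, lastColumn);
        if (!sameColumn) {
            finishColumn();
            lastRow = row.clone();       // copy in case the caller reuses buffers
            lastColumn = column.clone();
        }
        versionsInColumn++;
        bytesInColumn += cellSize;
    }

    private void finishColumn() {
        if (versionsInColumn == 0) {
            return;
        }
        maxVersions = Math.max(maxVersions, versionsInColumn);
        maxColumnSize = Math.max(maxColumnSize, bytesInColumn);
        columns++;
        double delta = bytesInColumn - mean;
        mean += delta / columns;
        m2 += delta * (bytesInColumn - mean);
        versionsInColumn = 0;
        bytesInColumn = 0;
    }

    /** Flushes the last column and returns values one might store as file metadata. */
    Map<String, Double> finish() {
        finishColumn();
        Map<String, Double> meta = new LinkedHashMap<>();
        meta.put("MAX_VERSIONS", (double) maxVersions);
        meta.put("MEAN_COLUMN_SIZE", mean);
        meta.put("STDDEV_COLUMN_SIZE", columns > 0 ? Math.sqrt(m2 / columns) : 0.0);
        meta.put("MAX_COLUMN_SIZE", (double) maxColumnSize);
        return meta;
    }
}
```

Welford's method is used here because it needs only constant state per file and stays numerically stable, so the tracker could run inside a writer without buffering column sizes.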

This message was sent by Atlassian JIRA
