hbase-issues mailing list archives

From "John Leach (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14509) Configurable sparse indexes?
Date Tue, 06 Oct 2015 02:59:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944433#comment-14944433 ]

John Leach commented on HBASE-14509:

buddy index?  I must have been sick that day in school.

Hmm, how about Histograms, Frequent Items, and cardinality?  They sure help an optimizer know
which end is up.  

> Configurable sparse indexes?
> ----------------------------
>                 Key: HBASE-14509
>                 URL: https://issues.apache.org/jira/browse/HBASE-14509
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
> This idea just popped up today and I wanted to record it for discussion:
> What if we kept sparse column indexes per region or HFile or per configurable range?
> I.e., for any given CQ we record the lowest and highest value for a particular range (HFile, Region, or a custom range like the Phoenix guide post).
> By tweaking the size of these ranges we can control the size of the index vs. its selectivity.
> For example, if we kept it by HFile we can almost instantly decide whether we need to scan a particular HFile at all to find a particular value in a Cell.
> We can also collect min/max values for each n MB of data, for example when we scan the region the first time. Assuming ranges are large enough, we can always keep the index in memory together with the region.
> Kind of a sparse local index. Might be much easier than the buddy region stuff we've been
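
The min/max idea in the quoted description can be sketched roughly as follows. This is only an illustration in Python, not HBase code: the names RangeStats, record, may_contain and ranges_to_scan are hypothetical, each "range" stands in for an HFile (or a region, or a guide-post-sized chunk), and values are compared as raw bytes in lexicographic order. A lookup can skip any range whose [min, max] interval for the column qualifier excludes the value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RangeStats:
    """Hypothetical per-range summary: min/max value seen per column qualifier."""
    name: str
    bounds: Dict[bytes, Tuple[bytes, bytes]] = field(default_factory=dict)

    def record(self, qualifier: bytes, value: bytes) -> None:
        # Widen the [min, max] interval for this qualifier to include value.
        lo, hi = self.bounds.get(qualifier, (value, value))
        self.bounds[qualifier] = (min(lo, value), max(hi, value))

    def may_contain(self, qualifier: bytes, value: bytes) -> bool:
        # False means the range can be skipped; True only means "must scan".
        if qualifier not in self.bounds:
            return False
        lo, hi = self.bounds[qualifier]
        return lo <= value <= hi


def ranges_to_scan(index: List[RangeStats], qualifier: bytes, value: bytes) -> List[str]:
    """Names of the ranges that cannot be ruled out for this value."""
    return [r.name for r in index if r.may_contain(qualifier, value)]


# Made-up data: two "HFiles" with values written under the qualifier b"fruit".
hfile_a = RangeStats("hfile-a")
for v in (b"apple", b"cherry"):
    hfile_a.record(b"fruit", v)

hfile_b = RangeStats("hfile-b")
for v in (b"mango", b"peach"):
    hfile_b.record(b"fruit", v)

index = [hfile_a, hfile_b]
print(ranges_to_scan(index, b"fruit", b"banana"))  # only hfile-a overlaps
print(ranges_to_scan(index, b"fruit", b"kiwi"))    # falls in the gap, nothing to scan
```

The check can give false positives (b"banana" lies between the min and max of hfile-a without being stored there), so it only rules ranges out. Smaller ranges make the intervals tighter at the cost of a larger index, which is the size vs. selectivity trade-off mentioned above.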

This message was sent by Atlassian JIRA
