hbase-dev mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14509) Configurable sparse indexes?
Date Tue, 29 Sep 2015 07:15:04 GMT
Lars Hofhansl created HBASE-14509:

             Summary: Configurable sparse indexes?
                 Key: HBASE-14509
                 URL: https://issues.apache.org/jira/browse/HBASE-14509
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Lars Hofhansl

This idea just popped up today and I wanted to record it for discussion:
What if we kept sparse column indexes per region or HFile or per configurable range?

I.e., for any given CQ (column qualifier) we record the lowest and highest value for a particular
range (HFile, Region, or a custom range like the Phoenix guideposts).

By tweaking the size of these ranges we can trade the size of the index against its selectivity.

For example, if we kept it per HFile, we could almost instantly decide whether we need to scan a
particular HFile at all to find a particular value in a Cell.
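
To make the per-HFile case concrete, here is a rough sketch in Python (not HBase code). The names `HFileSketch` and `files_to_scan` are made up for illustration, and it assumes the index records a min/max pair for every qualifier that occurs in the file, so a missing entry means the column is absent. Values are compared as raw bytes, lexicographically.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class HFileSketch:
    """Hypothetical stand-in for an HFile plus its sparse index."""
    name: str
    # qualifier -> (lowest value, highest value) seen in this file
    min_max: Dict[bytes, Tuple[bytes, bytes]]


def files_to_scan(files: List[HFileSketch], qualifier: bytes, value: bytes) -> List[str]:
    """Return the names of files that might contain `value` under `qualifier`."""
    candidates = []
    for f in files:
        bounds = f.min_max.get(qualifier)
        if bounds is None:
            # Assumption: no entry means the qualifier never occurs in this file.
            continue
        lo, hi = bounds
        # The value can only be present if it falls inside the recorded range.
        if lo <= value <= hi:
            candidates.append(f.name)
    return candidates
```

A file that survives the check may still not contain the value; the index only rules files out, it never confirms a hit.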

We can also collect min/max values for each n MB of data, for example when we scan the region
the first time. Assuming the ranges are large enough, we can always keep the index in memory
together with the region.
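
Collecting min/max for each n MB during a scan could look roughly like the following Python sketch (again not HBase code). The function name `build_sparse_index`, the `(row, qualifier, value)` cell tuple, and the way cell size is estimated are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical cell shape: (row key, column qualifier, value)
Cell = Tuple[bytes, bytes, bytes]
# One index entry: (first row, last row, min value, max value)
Entry = Tuple[bytes, bytes, Optional[bytes], Optional[bytes]]


def build_sparse_index(cells: Iterable[Cell], qualifier: bytes, range_bytes: int) -> List[Entry]:
    """Scan cells in row order and emit one min/max entry per ~range_bytes of data."""
    index: List[Entry] = []
    first_row = last_row = lo = hi = None
    size = 0
    for row, cq, value in cells:
        if first_row is None:
            first_row = row
        last_row = row
        # Crude size estimate (assumption): key, qualifier and value bytes only.
        size += len(row) + len(cq) + len(value)
        if cq == qualifier:
            lo = value if lo is None or value < lo else lo
            hi = value if hi is None or value > hi else hi
        if size >= range_bytes:
            # Close the current range and start a new one.
            index.append((first_row, last_row, lo, hi))
            first_row = last_row = lo = hi = None
            size = 0
    if first_row is not None:
        # Flush the final, partially filled range.
        index.append((first_row, last_row, lo, hi))
    return index
```

A larger `range_bytes` gives fewer entries (a smaller index) with wider min/max bounds, so fewer ranges can be skipped. A range in which the qualifier never occurs is recorded with `None` bounds.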

Kind of a sparse local index. Might be much easier than the buddy region stuff we've been discussing.

