hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-3745) Add the ability to restrict major-compactible files by timestamp
Date Wed, 06 Apr 2011 20:58:06 GMT
Add the ability to restrict major-compactible files by timestamp

                 Key: HBASE-3745
                 URL: https://issues.apache.org/jira/browse/HBASE-3745
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.92.0
            Reporter: Todd Lipcon

In some applications, a common access pattern is to frequently scan tables with a time range
predicate restricted to a fairly recent time window. For example, you may want to do an incremental
aggregation or indexing step only on rows that have changed in the last hour. HBase serves such scans
efficiently by tracking the min and max timestamp of each HFile, so that HFiles containing only
older data don't have to be read.
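
To illustrate the file-level pruning described above, here is a minimal sketch in Python. It is not HBase code: the `StoreFile` class and the `files_to_scan` function are hypothetical names used only for illustration, and timestamps are assumed to be milliseconds.

```python
from dataclasses import dataclass

HOUR_MS = 3_600_000


@dataclass(frozen=True)
class StoreFile:
    # Hypothetical stand-in for an HFile and its tracked timestamp range (ms).
    name: str
    min_ts: int
    max_ts: int


def files_to_scan(files, start_ts, end_ts):
    """Keep only files whose [min_ts, max_ts] overlaps the scan range [start_ts, end_ts)."""
    return [f for f in files if f.max_ts >= start_ts and f.min_ts < end_ts]


files = [
    StoreFile("old", 0, 10 * HOUR_MS),
    StoreFile("recent", 23 * HOUR_MS, 24 * HOUR_MS),
]
now = 24 * HOUR_MS

# Scan only the last hour: the "old" file is skipped without being read.
selected = files_to_scan(files, now - HOUR_MS, now + 1)
print([f.name for f in selected])
```

Only the file whose timestamp range overlaps the last hour is selected; the older file is never opened.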

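After a major compaction, however, all of a store's HFiles are merged into a single file whose
timestamp range covers the entire dataset. A recent-window scan can then no longer skip anything
and has to read the entire dataset, which can hurt performance of this access pattern.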

We should add a column family attribute that can specify a policy like: When major compacting,
never include an HFile that contains data with a timestamp in the last 4 hours. This way, recently
flushed HFiles will always be left out of the major-compacted file and provide the good scan
performance required for these applications.
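
One way the proposed policy could look, as a rough sketch rather than an actual HBase implementation: the function name `select_for_major_compaction`, the `exclude_window_ms` parameter, and the `StoreFile` class are all assumptions made up for this example. It also assumes no cell carries a timestamp in the future, so a file contains recent data exactly when its max timestamp falls inside the window.

```python
from dataclasses import dataclass

HOUR_MS = 3_600_000


@dataclass(frozen=True)
class StoreFile:
    # Hypothetical stand-in for an HFile and its tracked timestamp range (ms).
    name: str
    min_ts: int
    max_ts: int


def select_for_major_compaction(files, now_ms, exclude_window_ms):
    """Split files into (compactible, excluded) for a major compaction.

    A file is excluded if it holds any data newer than now_ms - exclude_window_ms,
    which is detected from its max timestamp alone.
    """
    cutoff = now_ms - exclude_window_ms
    compactible = [f for f in files if f.max_ts < cutoff]
    excluded = [f for f in files if f.max_ts >= cutoff]
    return compactible, excluded


now = 24 * HOUR_MS
files = [
    StoreFile("a", 0, 5 * HOUR_MS),
    StoreFile("b", 5 * HOUR_MS, 19 * HOUR_MS),
    StoreFile("c", 19 * HOUR_MS, 21 * HOUR_MS),  # straddles the 4-hour cutoff
    StoreFile("d", 23 * HOUR_MS, 24 * HOUR_MS),
]

compactible, excluded = select_for_major_compaction(files, now, 4 * HOUR_MS)
print([f.name for f in compactible], [f.name for f in excluded])
```

With a 4-hour window, files "a" and "b" are eligible for the major compaction, while "c" and "d" stay as separate files, so a scan over the last hour still only needs to read "d".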
