hbase-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction
Date Tue, 01 Nov 2011 22:19:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141681#comment-13141681 ]

Todd Lipcon commented on HBASE-4717:

bq. It would probably be simple to add a check during compaction time of the time range of each file and if the max is expired, just to wipe out that file.

That's one optimization, but it only saves reading the now-expired file. We still have to read and rewrite all of the remaining data periodically to do the age-off.

The new idea above is to introduce something closer to a "filtration" than a "compaction": you would only rewrite files that contain a significant amount of data to be aged off.
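To make the "filtration" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not HBase code: `FileRange`, `estimated_expired_fraction`, `select_for_filtration`, `filter_cells` and the 0.5 threshold are all hypothetical. Because the sketch only knows each file's oldest and newest timestamps, it assumes cell timestamps are spread uniformly across that range in order to estimate how much of the file has expired. A cell counts as expired when its timestamp is older than `now - ttl`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FileRange:
    """Hypothetical summary of one store file: its name plus the oldest and
    newest cell timestamps it contains (milliseconds)."""
    name: str
    min_ts: int
    max_ts: int


def estimated_expired_fraction(f: FileRange, cutoff: int) -> float:
    """Estimate the share of cells older than `cutoff`, ASSUMING timestamps
    are spread uniformly over [min_ts, max_ts]."""
    if f.max_ts < cutoff:
        return 1.0  # every cell is expired
    if f.min_ts >= cutoff:
        return 0.0  # no cell is expired
    # Here min_ts < cutoff <= max_ts, so the span is strictly positive.
    return (cutoff - f.min_ts) / (f.max_ts - f.min_ts)


def select_for_filtration(files: List[FileRange], now_ms: int, ttl_ms: int,
                          threshold: float = 0.5) -> List[FileRange]:
    """Pick partially expired files that are worth rewriting on their own."""
    cutoff = now_ms - ttl_ms
    picked = []
    for f in files:
        frac = estimated_expired_fraction(f, cutoff)
        # Fully expired files (frac == 1.0) can simply be deleted instead.
        if threshold <= frac < 1.0:
            picked.append(f)
    return picked


def filter_cells(cells: List[Tuple[str, int]], cutoff: int) -> List[Tuple[str, int]]:
    """Rewrite step for one selected file: keep only the (row, timestamp)
    cells that have not yet expired."""
    return [c for c in cells if c[1] >= cutoff]
```

Only partially expired files above the threshold get rewritten, each one in isolation, so files with little or no expired data are never read. A real implementation would likely want a better estimate than linear interpolation between the file's min and max timestamps.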
> More efficient age-off of old data during major compaction
> ----------------------------------------------------------
>                 Key: HBASE-4717
>                 URL: https://issues.apache.org/jira/browse/HBASE-4717
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Todd Lipcon
> Many applications need to implement efficient age-off of old data. We currently only perform age-off during major compaction by scanning through all of the KVs. Instead, we could implement the following:
> - Set hbase.hstore.compaction.max.size reasonably small, so that older store files each cover only a small, bounded range of time.
> - Periodically run an "age-off compaction". This compaction would scan the current list of store files. Any store file that falls entirely outside the TTL time range would be dropped. Store files entirely within the time range would be left unaltered. Those crossing the time-range boundary could either be left alone or compacted using the existing compaction code.
> I don't have a design in mind for exactly how this would be implemented, but I hope to generate some discussion.
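The "age-off compaction" step described above can be sketched as a three-way split of the store file list. This is a minimal Python illustration under assumptions, not a proposed patch: `StoreFileInfo` and `plan_age_off` are hypothetical names, each file is assumed to expose the minimum and maximum timestamps of its cells, and a cell is treated as expired when its timestamp is older than `now - ttl`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StoreFileInfo:
    """Hypothetical stand-in for a store file's metadata: its name plus the
    minimum and maximum cell timestamps it contains (milliseconds)."""
    name: str
    min_ts: int
    max_ts: int


def plan_age_off(files: List[StoreFileInfo], now_ms: int, ttl_ms: int
                 ) -> Tuple[List[StoreFileInfo], List[StoreFileInfo], List[StoreFileInfo]]:
    """Split store files into (drop, keep, straddle) relative to the TTL cutoff.

    ASSUMPTION: a cell is expired when its timestamp < now_ms - ttl_ms.
    """
    cutoff = now_ms - ttl_ms
    drop, keep, straddle = [], [], []
    for f in files:
        if f.max_ts < cutoff:
            drop.append(f)      # every cell expired: delete the whole file
        elif f.min_ts >= cutoff:
            keep.append(f)      # nothing expired: leave the file untouched
        else:
            straddle.append(f)  # crosses the boundary: leave alone or compact normally
    return drop, keep, straddle
```

For example, with `now_ms=1000` and `ttl_ms=800` the cutoff is 200, so a file spanning timestamps 0 to 100 lands in `drop`, one spanning 150 to 300 lands in `straddle`, and one spanning 500 to 900 lands in `keep`. What to do with `straddle` is left open, as in the proposal.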

This message is automatically generated by JIRA.

