hbase-issues mailing list archives

From "Nicolas Spiegelberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6371) Level based compaction
Date Tue, 21 Aug 2012 20:48:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439025#comment-13439025 ]

Nicolas Spiegelberg commented on HBASE-6371:
--------------------------------------------

@Lars: I think we want to put level-based & tiered compactions in the core rather than
implement them as coprocessors, because these are generic strategies, not app-specific logic.

@Akashnil: the algorithm you describe is technically referred to as a "tiered compaction".
DataStax has a nice writeup on tiered compactions versus level-based:
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
                
> Level based compaction
> ----------------------
>
>                 Key: HBASE-6371
>                 URL: https://issues.apache.org/jira/browse/HBASE-6371
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Akashnil
>            Assignee: Akashnil
>
> Currently, compaction selection is not very flexible and is not sensitive to the
> hotness of the data. Very old data is likely to be accessed less often, and very recent
> data is likely to be in the block cache. Both considerations make it inefficient to
> compact these files as aggressively as other files. In some use cases the access pattern
> is particularly obvious, yet there is currently no way to control the compaction algorithm.
> In the new compaction selection algorithm, we plan to divide the candidate files into
> levels according to the age of the data in those files. Each level may have different
> parameters, such as the compaction ratio and the minimum number of store files per
> compaction. The number of levels, the time ranges, and the per-level parameters will be
> configurable online on a per-column-family basis.
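
To make the proposal in the description concrete, here is a rough sketch of age-based selection with per-level parameters. It is not the HBASE-6371 patch and does not use any HBase API: the class name, the `StoreFile` and `Level` types, their fields, and the simplified ratio rule are all assumptions made up for illustration. Files are bucketed by the age of their newest data, then each bucket is checked against its own ratio and minimum file count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from HBase.
class LevelSelectionSketch {
    static final class StoreFile {
        final String name; final long sizeBytes; final long newestTimestampMs;
        StoreFile(String name, long sizeBytes, long newestTimestampMs) {
            this.name = name; this.sizeBytes = sizeBytes; this.newestTimestampMs = newestTimestampMs;
        }
    }

    // One level: files no older than maxAgeMs, with its own ratio and minimum file count.
    static final class Level {
        final long maxAgeMs; final double ratio; final int minFiles;
        Level(long maxAgeMs, double ratio, int minFiles) {
            this.maxAgeMs = maxAgeMs; this.ratio = ratio; this.minFiles = minFiles;
        }
    }

    // Put each file into the first level whose age bound covers it.
    // Levels must be ordered youngest first; files older than every bound are left out.
    static List<List<StoreFile>> groupByLevel(List<StoreFile> files, List<Level> levels, long nowMs) {
        List<List<StoreFile>> buckets = new ArrayList<>();
        for (int i = 0; i < levels.size(); i++) buckets.add(new ArrayList<>());
        for (StoreFile f : files) {
            long age = nowMs - f.newestTimestampMs;
            for (int i = 0; i < levels.size(); i++) {
                if (age <= levels.get(i).maxAgeMs) { buckets.get(i).add(f); break; }
            }
        }
        return buckets;
    }

    // Simplified ratio rule (an assumption): walking oldest to newest, skip a file while it is
    // larger than ratio * (total size of the newer files); compact the remainder if there are
    // at least minFiles of them, otherwise select nothing.
    static List<StoreFile> select(List<StoreFile> bucket, Level level) {
        List<StoreFile> sorted = new ArrayList<>(bucket);
        sorted.sort(Comparator.comparingLong((StoreFile f) -> f.newestTimestampMs));
        long newerSum = 0;
        for (StoreFile f : sorted) newerSum += f.sizeBytes;
        int start = 0;
        while (start < sorted.size()) {
            newerSum -= sorted.get(start).sizeBytes; // now the total size of files newer than 'start'
            if (sorted.get(start).sizeBytes <= level.ratio * newerSum) break;
            start++;
        }
        List<StoreFile> picked = new ArrayList<>(sorted.subList(start, sorted.size()));
        return picked.size() >= level.minFiles ? picked : new ArrayList<>();
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        List<Level> levels = List.of(new Level(1_000L, 1.2, 2), new Level(Long.MAX_VALUE, 1.2, 3));
        List<StoreFile> files = List.of(
            new StoreFile("a", 100, now - 10), new StoreFile("b", 100, now - 20),
            new StoreFile("c", 1000, now - 5000), new StoreFile("d", 50, now - 4000));
        List<List<StoreFile>> buckets = groupByLevel(files, levels, now);
        for (int i = 0; i < levels.size(); i++) {
            System.out.println("level " + i + ": " + select(buckets.get(i), levels.get(i)).size() + " file(s) selected");
        }
    }
}
```

Giving the oldest level a higher `minFiles` or a lower `ratio` would make it compact less aggressively, which is the kind of per-level tuning the description asks for. A real implementation would also need the online, per-column-family configuration mentioned above, which this sketch leaves out.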

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
