hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12324) Improve compaction speed and process for immutable short lived datasets
Date Fri, 24 Oct 2014 17:52:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183131#comment-14183131 ]

Vladimir Rodionov commented on HBASE-12324:


You can effectively disable compaction by setting the following config:
    // Never split regions on size
    conf.setLong("hbase.hregion.max.filesize", Long.MAX_VALUE);
    // Flush the memstore at FLUSH_SIZE bytes (see note below on sizing)
    conf.setLong("hbase.hregion.memstore.flush.size", FLUSH_SIZE);
    // Push every compaction trigger out of reach
    conf.setInt("hbase.hstore.compactionThreshold", Integer.MAX_VALUE);
    // Never block writes on the store file count
    conf.setInt("hbase.hstore.blockingStoreFiles", Integer.MAX_VALUE);
    conf.setInt("hbase.hstore.compaction.min", Integer.MAX_VALUE);
    conf.setInt("hbase.hstore.compaction.max", Integer.MAX_VALUE);

* If you do not need compaction, you can have only a few (even one) regions per server
* Make sure to pre-split your table
* Periodically run a utility that purges/archives the oldest HFiles
* FLUSH_SIZE should be large, but not extreme. Because you can afford to host very few
regions per RS, your flush size can be quite large.
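The purge utility mentioned above is not spelled out in the comment; a minimal sketch of what it might look like follows, assuming file age can be judged by last-modified time and using hypothetical names (HFilePurger, archiveExpired) -- a real tool would read the max cell timestamp out of HFile metadata instead:

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.FileTime;
    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical stand-in for the periodic purge utility: moves files
     *  older than a TTL from a store directory into an archive directory. */
    public class HFilePurger {

        /** Returns the list of files that were archived. */
        public static List<Path> archiveExpired(Path storeDir, Path archiveDir,
                                                Duration ttl) throws IOException {
            Files.createDirectories(archiveDir);
            Instant cutoff = Instant.now().minus(ttl);
            List<Path> archived = new ArrayList<>();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(storeDir)) {
                for (Path f : files) {
                    if (!Files.isRegularFile(f)) continue;
                    FileTime mtime = Files.getLastModifiedTime(f);
                    // A file that is entirely past the TTL can be dropped wholesale:
                    // no compaction, just a rename into the archive directory.
                    if (mtime.toInstant().isBefore(cutoff)) {
                        Path target = archiveDir.resolve(f.getFileName());
                        Files.move(f, target, StandardCopyOption.REPLACE_EXISTING);
                        archived.add(target);
                    }
                }
            }
            return archived;
        }
    }

Run on a schedule (cron, or a chore thread) with the TTL matching the table's retention period.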


> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>                 Key: HBASE-12324
>                 URL: https://issues.apache.org/jira/browse/HBASE-12324
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Sheetal Dolas
> We have seen multiple cases where HBase is used to store immutable data and the data
> lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and slow down ingestion.
> In all such use cases (immutable data, a high write rate, moderate read rates, and a short
> TTL), avoiding compactions and simply deleting old data brings a lot of performance benefits.
> We should have a compaction policy that can only delete/archive files older than the TTL
> and not compact any files.
> Also attaching a patch that can do so.
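The selection logic the issue describes (drop fully expired files, never rewrite anything) can be sketched without the HBase internals; StoreFileInfo below is a hypothetical stand-in for HBase's real store-file metadata, not the actual policy API from the attached patch:

    import java.util.ArrayList;
    import java.util.List;

    /** Sketch of a TTL-only policy: instead of merging store files, select
     *  for removal only those whose newest cell is already past the TTL. */
    public class TtlOnlyPolicy {

        public static class StoreFileInfo {
            final String name;
            final long maxTimestampMs;   // newest cell timestamp in the file
            public StoreFileInfo(String name, long maxTimestampMs) {
                this.name = name;
                this.maxTimestampMs = maxTimestampMs;
            }
        }

        /** Files whose newest data is older than ttlMs can be deleted
         *  outright; everything else is left untouched -- no rewrite. */
        public static List<StoreFileInfo> selectExpired(List<StoreFileInfo> files,
                                                        long ttlMs, long nowMs) {
            List<StoreFileInfo> expired = new ArrayList<>();
            for (StoreFileInfo f : files) {
                if (f.maxTimestampMs < nowMs - ttlMs) {
                    expired.add(f);
                }
            }
            return expired;
        }
    }

Because immutable data arrives roughly in time order, whole files expire together, so this policy never has to read or rewrite a single cell.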

This message was sent by Atlassian JIRA
