hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14651) Default minimum compaction size is too high
Date Mon, 02 Nov 2015 22:33:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986179#comment-14986179 ]

Vladimir Rodionov commented on HBASE-14651:

With large random values, the memstore flush size is dominated by value sizes (which are
non-compressible in the YCSB case). Therefore, if your flush size is close to the minimum
compaction size, every compaction selection (at least 3 files) will be larger than the
minimum compaction size, and the patch won't change anything.

A good scenario for the patch is any Phoenix application (large keys, small values, highly
compressible) or something similar.
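
To make the arithmetic above concrete, here is a minimal sketch in Python. It is only a simplified model of the rule as described in this issue (a selection whose total size is below the minimum compaction size bypasses the ratio check), not HBase's actual code. The function names, the 8:1 compression ratio for the Phoenix-like case, the 10-file upper bound, and the assumption that an incompressible flush produces a file about as large as the flush size are all illustrative assumptions.

```python
MB = 1024 * 1024


def skips_ratio_check(selection_size, min_compaction_size):
    """Simplified rule: a selection smaller than the minimum compaction
    size is eligible without any compaction ratio check."""
    return selection_size < min_compaction_size


def affected_file_counts(file_size, old_min, new_min, min_files=3, max_files=10):
    """Return the selection sizes (in number of equally sized files) whose
    treatment differs between the old and the new minimum compaction size.
    min_files=3 comes from the comment above; max_files=10 is an assumption."""
    return [
        n
        for n in range(min_files, max_files + 1)
        if skips_ratio_check(n * file_size, old_min)
        != skips_ratio_check(n * file_size, new_min)
    ]


flush_size = 128 * MB
old_min = 128 * MB  # current default, equal to the flush size
new_min = 64 * MB   # default proposed in this issue

# YCSB-like case: incompressible values, flushed file ~ flush size (assumption).
ycsb_file = flush_size
# Phoenix-like case: hypothetical 8:1 compression of the flushed data.
phoenix_file = flush_size // 8

print("YCSB-like:   ", affected_file_counts(ycsb_file, old_min, new_min))
print("Phoenix-like:", affected_file_counts(phoenix_file, old_min, new_min))
```

Under these assumptions the YCSB-like case yields an empty list, because even the smallest selection (3 x 128MB) is already above both thresholds, while the Phoenix-like case yields selections of 4 to 7 files, which bypass the ratio check with a 128MB minimum but not with a 64MB minimum.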

> Default minimum compaction size is too high
> -------------------------------------------
>                 Key: HBASE-14651
>                 URL: https://issues.apache.org/jira/browse/HBASE-14651
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>         Attachments: HBASE-14651-v1.patch, HBASE-14651-v2.patch, bytes.png, files.png
> *hbase.hstore.compaction.min.size* defines the selection size below which a file selection is
> always eligible for minor compaction (no compaction ratio check is performed on such file
> selections). The default is equal to the memstore flush size (128MB). First of all, even this
> value is too high for some (many) deployments, especially write-intensive ones, because their
> memstore flushes are small. And if users increase the memstore flush size (they usually set it
> to at least 256MB), they have no idea how it will affect overall compaction efficiency. With
> a 256MB minimum size to compact, the compactor skips the necessary file ratio checks most of
> the time. This increases read/write IO during compactions, because of unbalanced selections
> in which relatively large files are mixed with newly created small store files.
> I think we should set this default minimum to 64MB and not link it to the memstore flush size
> at all.
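
The effect described above can be sketched as follows. This is a simplified Python model written for illustration, not HBase's compaction policy code: the function names are hypothetical, the ratio check shown (each file must be no larger than the ratio times the combined size of the other files) is one common formulation, and the ratio of 1.2 is assumed here.

```python
MB = 1024 * 1024


def files_in_ratio(sizes, ratio=1.2):
    """Assumed ratio check: every file must be no larger than `ratio`
    times the combined size of the other files in the selection."""
    if len(sizes) < 2:
        return True
    total = sum(sizes)
    return all(size <= (total - size) * ratio for size in sizes)


def selection_eligible(sizes, min_compaction_size, ratio=1.2):
    """A selection below the minimum compaction size is always eligible;
    larger selections must also pass the ratio check."""
    if sum(sizes) < min_compaction_size:
        return True  # ratio check skipped entirely
    return files_in_ratio(sizes, ratio)


# An unbalanced selection: one relatively large file plus two small new flushes.
unbalanced = [100 * MB, 10 * MB, 10 * MB]
# A balanced selection of the same total size.
balanced = [40 * MB, 40 * MB, 40 * MB]

for min_size in (128 * MB, 64 * MB):
    print(
        f"min={min_size // MB}MB",
        "unbalanced eligible:", selection_eligible(unbalanced, min_size),
        "balanced eligible:", selection_eligible(balanced, min_size),
    )
```

In this model the unbalanced 120MB selection is accepted when the minimum is 128MB, because the ratio check never runs, and rejected when the minimum is 64MB. The balanced selection is accepted in both cases.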

