hbase-issues mailing list archives

From "Nicolas Spiegelberg (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
Date Tue, 26 Oct 2010 00:10:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924792#action_12924792 ]

Nicolas Spiegelberg commented on HBASE-2462:


1. The FS default blocksize is the default for hlog.blocksize when no custom value is set, but
the two are not necessarily one-to-one.  The idea is that newly created HFiles should always be
<= hlog.blocksize, so we unconditionally compact HFiles that have not already been compacted
at least once.
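
As a rough illustration of point 1, here is a minimal sketch that uses file size as a proxy for "never compacted": anything at or below hlog.blocksize is treated as a fresh flush and always selected. The `StoreFileInfo` class and `unconditional_candidates` function are hypothetical names for this example, not HBase APIs, and the 64MB blocksize is only an assumed value.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024


@dataclass
class StoreFileInfo:
    """Hypothetical stand-in for a StoreFile; only the size matters here."""
    name: str
    size_bytes: int


def unconditional_candidates(files: List[StoreFileInfo],
                             hlog_blocksize: int) -> List[StoreFileInfo]:
    """Return files small enough to be assumed fresh flushes.

    Assumption: a newly flushed HFile is never larger than hlog_blocksize,
    so size <= hlog_blocksize approximates "not compacted yet".
    """
    return [f for f in files if f.size_bytes <= hlog_blocksize]


files = [
    StoreFileInfo("old-compacted", 200 * MB),
    StoreFileInfo("flush-1", 40 * MB),
    StoreFileInfo("flush-2", 64 * MB),
]
picked = unconditional_candidates(files, hlog_blocksize=64 * MB)
picked_names = [f.name for f in picked]
```

With these made-up sizes, only the two flush-sized files are selected; the larger, already-compacted file is left for the ratio-based part of the heuristic to decide.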

2.  The idea behind step #4 is that compaction becomes extremely useful when you can use it
to dedupe.  We should definitely use the compactionThreshold metric here instead of a hard-coded
3.  However, I don't think this should be an absolute number of StoreFiles, but rather the
number of relatively small StoreFiles.  If you have huge region sizes (e.g. a large object store),
then you don't mind having 6 StoreFiles and really just want to compact when it will save
a decent amount of space.
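
To make point 2 concrete, the sketch below counts only "relatively small" files against compactionThreshold instead of counting every StoreFile. The definition of "relatively small" (at most a quarter of the largest file) and the names `should_compact` and `small_fraction` are assumptions made for this example; the comment above does not specify them.

```python
from typing import List


def should_compact(sizes_mb: List[int],
                   compaction_threshold: int = 3,
                   small_fraction: float = 0.25) -> bool:
    """Trigger on the count of relatively small files, not on all files.

    Assumption: a file is "relatively small" when its size is at most
    small_fraction of the largest file in the store.
    """
    if not sizes_mb:
        return False
    largest = max(sizes_mb)
    small = [s for s in sizes_mb if s <= small_fraction * largest]
    return len(small) >= compaction_threshold


# Six large, similar-sized files (e.g. a large object store): no compaction.
large_store = should_compact([900, 950, 1000, 980, 970, 990])

# One large file plus three small flushes: compaction is worthwhile.
mixed_store = should_compact([1000, 10, 12, 15])
```

Under this assumed definition, a store with six large files of similar size is left alone, while a store with three small files next to one large file is compacted.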

3. This algorithm will perform roughly the same for compacting small/new files; however, it
will be more aggressive about including older files in the compaction because it can more
quickly detect when it's advantageous to compact.  Because of the 4x (vs. 2x) multiplier,
it's 2x more scalable and should result in half the number of large StoreFiles for large regions.
For DEFAULT_MAX_FILE_SIZE == 256MB, you should never have more than 5 StoreFiles before triggering
a split.
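
One possible way to read the arithmetic in point 3 is as a geometric ladder of file sizes, where each surviving file is a fixed multiple of the next-newer one. This is only an illustrative interpretation, not necessarily the derivation intended above: the 1MB smallest file and the `size_ladder` helper are assumptions for this sketch.

```python
from typing import List


def size_ladder(max_size_mb: int, ratio: int, smallest_mb: int = 1) -> List[int]:
    """List file sizes smallest_mb, smallest_mb*ratio, ... up to max_size_mb.

    Assumption: files escape compaction only when each one is `ratio` times
    the size of the next-newer file, starting from a 1MB file.
    """
    sizes = []
    size = smallest_mb
    while size <= max_size_mb:
        sizes.append(size)
        size *= ratio
    return sizes


ladder_4x = size_ladder(256, ratio=4)  # 1, 4, 16, 64, 256
ladder_2x = size_ladder(256, ratio=2)  # 1, 2, 4, ..., 256
```

Under these assumptions the 4x ladder holds 5 files at or below 256MB and the 2x ladder holds 9, which is roughly the halving described above.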

> Review compaction heuristic and move compaction code out so standalone and independently testable
> -------------------------------------------------------------------------------------------------
>                 Key: HBASE-2462
>                 URL: https://issues.apache.org/jira/browse/HBASE-2462
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jonathan Gray
>            Priority: Critical
> Anything that improves our i/o profile makes hbase run smoother.  Over in HBASE-2457,
> good work has been done already describing the tension between minimizing compactions versus
> minimizing count of store files.  This issue is about following on from what has been done
> in 2457 but also breaking the hard-to-read compaction code out of Store.java into a standalone
> class that can be more easily tested (and easily analyzed for its performance characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge sort implementations.

