hbase-issues mailing list archives

From "Nicolas Spiegelberg (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
Date Mon, 25 Oct 2010 18:22:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924664#action_12924664
] 

Nicolas Spiegelberg commented on HBASE-2462:
--------------------------------------------

So, we've been talking about a new compaction algorithm internally and wanted to get external
feedback as well...

The existing store file selection algorithm does not seem to use enough context.  We start
at the oldest file and compact everything else once a file is no longer 2x the size of the
next oldest.  It seems like we want to approach from the opposite direction:

1. Start at the newest file
2. Unconditionally compact as long as the StoreFiles are less than a certain
size (thinking "hbase.regionserver.hlog.blocksize").
3. After that size threshold has been passed, if the next oldest file < sum(all newer files) * R,
we include it in the compaction.  R = 2.
4. If files-to-compact < max(HColumnDescriptor.maxVersions(),3), skip the compaction
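
The four steps above can be sketched roughly as follows. This is only an illustration of the proposal, not HBase code: the function name, the parameter names, and the use of plain integer sizes are all hypothetical. It also assumes two things the steps leave open: that the newest file always seeds the selection, and that selection stops at the first file that fails the ratio test.

```python
def select_for_compaction(sizes_newest_first, min_size, ratio=2.0, max_versions=3):
    """Return the file sizes chosen for compaction (newest first), or [] to skip.

    sizes_newest_first: file sizes ordered from newest to oldest (hypothetical input).
    min_size: size below which files are included unconditionally (step 2).
    ratio: the R of step 3.
    max_versions: stands in for the column family's max versions (step 4).
    """
    selected = []
    newer_total = 0          # sum of all files newer than the current one
    unconditional = True     # still in the step 2 phase

    for index, size in enumerate(sizes_newest_first):
        if index == 0:
            pass             # step 1: assumed to always seed the selection
        elif unconditional and size < min_size:
            pass             # step 2: small files are taken without a ratio test
        else:
            unconditional = False
            # Step 3: keep the file only if it is smaller than R * sum(newer files).
            # Assumption: stop at the first file that fails the test.
            if size >= newer_total * ratio:
                break
        selected.append(size)
        newer_total += size

    # Step 4: too few files to be worth a compaction.
    if len(selected) < max(max_versions, 3):
        return []
    return selected
```

For example, with sizes in MB and a 64 MB threshold, `select_for_compaction([10, 20, 30, 100, 500, 5000], 64)` takes the three small files unconditionally, accepts the 100 MB file because 100 < 2 * 60, and stops at the 500 MB file because 500 >= 2 * 160.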

This algorithm can serve a very generic workload.  Axiom: It's worth compacting if sum(files)
>= 150% * max(files).  Maybe make this adjustable.  The main point is that the ratio between
file[i], file[i+1] is less useful than sum(files), max(files).
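
The axiom can be written as a one-line check. The function name and the `threshold` parameter are hypothetical; the 1.5 default is the 150% figure above, exposed as a parameter since the comment suggests making it adjustable.

```python
def worth_compacting(sizes, threshold=1.5):
    """True if sum(files) >= threshold * max(files); False for an empty list."""
    return bool(sizes) and sum(sizes) >= threshold * max(sizes)
```

With this check, a 100 MB file plus a 60 MB file qualifies (160 >= 150), while a 100 MB file plus a 40 MB file does not (140 < 150).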

A. With files[i] < files[i+1] * 2, our worst case ends up with a decreasing triangle of
2x.
B. With files[i] < sum(files[0..i-1]) * 2, we are dealing with the derivative.  Our worst
case ends up with decreasing triangle of 4x

With a 4x ratio & 64 MB hlog blocksize, we could support up to a 21.4GB Store while using
less than 8 files.  3 minimal threshold files + 5 worst case files that would be roughly:
64MB, 256MB, 1GB, 4GB, 16GB == 21.3GB.  Assuming that the average user has a 1-2 GB store,
the number of HFiles should never get above 6.
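
The arithmetic for the five worst-case files can be checked with a short sketch. It takes the 4x ratio and the 64 MB starting size from the text as given; the function name and parameters are hypothetical, and 1 GB is assumed to mean 1024 MB.

```python
def worst_case_sizes_mb(base_mb=64, factor=4, count=5):
    """Sizes in MB of a worst-case series where each file is `factor` times the previous one."""
    return [base_mb * factor ** k for k in range(count)]


sizes = worst_case_sizes_mb()          # 64 MB, 256 MB, 1 GB, 4 GB, 16 GB expressed in MB
total_gb = sum(sizes) / 1024           # convert the MB total to GB
```

Here `sizes` is `[64, 256, 1024, 4096, 16384]` and `total_gb` is 21.3125, which matches the roughly 21.3GB quoted for the five worst-case files.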


> Review compaction heuristic and move compaction code out so standalone and independently
testable
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2462
>                 URL: https://issues.apache.org/jira/browse/HBASE-2462
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jonathan Gray
>            Priority: Critical
>
> Anything that improves our i/o profile makes hbase run smoother.  Over in HBASE-2457,
good work has been done already describing the tension between minimizing compactions versus
minimizing count of store files.  This issue is about following on from what has been done
in 2457 but also, breaking the hard-to-read compaction code out of Store.java out to a standalone
class that can be the easier tested (and easily analyzed for its performance characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge sort implementations.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

