hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8283) Backport HBASE-7842 Add compaction policy that explores more storefile groups to 0.94
Date Tue, 09 Apr 2013 00:30:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626021#comment-13626021 ]

Elliott Clark commented on HBASE-8283:

There are really three goals.

h2.Goal One
The first goal is to fix the handling of bulk loaded files.  Right now their ordering gets
messed up after a compaction happens.  This leads to some weird compactions where the smallest
files are compacted with the largest.  This is possible because the current compaction policy
approves the candidate list as soon as one file is less than or equal to the files after it.
The bulk loaded files are always on the left.  The new large file created from compaction does
not have the bulk load flag (that's lost) and it will have a seqId of 0.

h2.Goal Two
The second goal is to only compact files that are all inside of a ratio.  Currently all candidate
files are selected if there is one file to the left that satisfies the ratio
SizeFile(j) <= SumFileSize(j-1, 0).  Workloads with large fluctuations can select weird groups
of files.

Suppose there's a write workload that's heavily sinusoidal.
[ 1 1  50 150 180 150 50 1 1 1 1 ]

Currently we'd pick 1 1 50 as the files to compact.
1 1 1 are the most like each other.
150 180 150 are also more similar and would logically be better matches than the ones currently
picked.
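
To make the "all inside of a ratio" idea concrete, here is a minimal sketch in Python.  It is
not the code from the patch.  It assumes that "inside the ratio" means every file in the group
is no larger than ratio times the combined size of the other files in that group, and the
function name and the 1.2 ratio are picked only for illustration.

```python
def all_within_ratio(sizes, ratio=1.2):
    """Return True if every file is no larger than ratio * (sum of the other files).

    Assumption: this is one reading of "all inside of a ratio", not the patch itself.
    """
    if len(sizes) < 2:
        return True
    total = sum(sizes)
    return all(size <= ratio * (total - size) for size in sizes)


# The three groups discussed for the sinusoidal example above.
for group in ([1, 1, 50], [1, 1, 1], [150, 180, 150]):
    print(group, all_within_ratio(group))
```

Under that assumption 1 1 50 fails the check because 50 dwarfs the two small files, while
1 1 1 and 150 180 150 both pass.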

h2.Goal Three
Just because files are picked doesn't mean they are the best choice.  Right now our compaction
algorithm is pretty naive.  This is a cut at choosing files based on more than one heuristic
(ratio, number of files removed, and IO required).
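
A rough sketch of what exploring more storefile groups could look like, in Python rather than
the Java of the patch.  Everything here is an assumption made for illustration: the function
names, the min_files and max_files parameters, the 1.2 ratio, and the ranking rule (prefer the
group that removes the most files, break ties by the least IO, meaning the smallest total size).

```python
def all_within_ratio(sizes, ratio):
    # Assumed ratio test: each file <= ratio * (sum of the other files in the group).
    total = sum(sizes)
    return all(size <= ratio * (total - size) for size in sizes)


def explore(sizes, ratio=1.2, min_files=3, max_files=10):
    """Scan every contiguous window of files and return the best one.

    Returns a half-open (start, end) index pair, or None if no window qualifies.
    """
    best, best_key = None, None
    n = len(sizes)
    for start in range(n):
        for end in range(start + min_files, min(start + max_files, n) + 1):
            window = sizes[start:end]
            if not all_within_ratio(window, ratio):
                continue
            # Hypothetical ranking: more files removed first, then less IO.
            key = (-len(window), sum(window))
            if best_key is None or key < best_key:
                best, best_key = (start, end), key
    return best


files = [1, 1, 50, 150, 180, 150, 50, 1, 1, 1, 1]
start, end = explore(files, max_files=3)
print(files[start:end])
```

With groups capped at three files, this picks a run of 1 1 1 rather than 1 1 50.  Raising
max_files lets wider windows qualify, and under this ranking the widest qualifying window wins,
so how the heuristics are weighted against each other matters a lot.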

> Backport HBASE-7842 Add compaction policy that explores more storefile groups to 0.94
> -------------------------------------------------------------------------------------
>                 Key: HBASE-8283
>                 URL: https://issues.apache.org/jira/browse/HBASE-8283
>             Project: HBase
>          Issue Type: Task
>          Components: Compaction
>    Affects Versions: 0.94.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HBASE-8283-0.patch
> HBASE-7842 Add compaction policy that explores more storefile groups
> Added a new compaction policy that greatly improves selecting files if there are bulk loaded files.
