cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1083) Improvement to CompactionManger's submitMinorIfNeeded
Date Tue, 07 Dec 2010 22:19:02 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969043#action_12969043 ]

Tyler Hobbs commented on CASSANDRA-1083:
----------------------------------------

One nice thing about this strategy is that in steady state, you're compacting about 1/target
of your total SSTable data by size.  This gives you a much smoother (and tunable) impact from
compaction.  Recompaction of recently compacted data shouldn't be any more frequent than with
the current strategy; this is especially true since there would no longer be cascading compactions.

Minor nitpick -- compactions happen after every min_compaction_threshold - 1 flushes, so
a default of 5 instead of 4 might be a good idea.
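
The arithmetic behind this nitpick can be checked with a small simulation. This is only a sketch under stated assumptions: every flush adds one unit-size SSTable, each compaction merges exactly min_threshold of the smallest tables (the proposal itself says "up to" the maximum), and the name flushes_between_compactions is hypothetical, not part of Cassandra or the attached patch.

```python
def flushes_between_compactions(min_threshold, target_count, total_flushes=200):
    """Return the number of flushes that preceded each compaction.

    Assumptions (not from the ticket): unit-size flushes, and every
    compaction merges exactly min_threshold of the smallest SSTables.
    """
    sizes = []        # sizes of the live SSTables
    gaps = []         # flushes counted between successive compactions
    since_last = 0
    for _ in range(total_flushes):
        sizes.append(1)               # one flush adds one new SSTable
        since_last += 1
        # Trigger condition taken from the proposed pseudocode.
        if len(sizes) + min_threshold - 1 > target_count:
            sizes.sort()              # smallest first
            merged = sum(sizes[:min_threshold])
            sizes = sizes[min_threshold:] + [merged]
            gaps.append(since_last)
            since_last = 0
    return gaps


gaps_min4 = flushes_between_compactions(min_threshold=4, target_count=10)
gaps_min5 = flushes_between_compactions(min_threshold=5, target_count=10)
```

Under these assumptions, once the initial fill is over, the gap settles at min_threshold - 1 flushes: 3 for a threshold of 4 and 4 for a threshold of 5.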

I think this should be easy to code up.  Jonathan, do you want me to go ahead with this?

> Improvement to CompactionManger's submitMinorIfNeeded
> -----------------------------------------------------
>
>                 Key: CASSANDRA-1083
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ryan King
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: 1083-configurable-compaction-thresholds.patch, compaction_simulation.rb
>
>
> We've discovered that we are unable to tune compaction the way we want for our production
> cluster. I think the current algorithm doesn't do this as well as it could, since it doesn't
> sort the sstables by size before doing the bucketing, which means the tuning parameters have
> unpredictable results.
> I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative proposal:
> config options:
>  minimumCompactionThreshold
>  maximumCompactionThreshold
>  targetSSTableCount
> The first two would mean what they currently mean: the bounds on how many sstables to
> compact in one compaction operation. The 3rd is a target for how many SSTables you'd like
> to have.
> Pseudo code algorithm for determining whether or not to do a minor compaction:
> {noformat} 
> if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
>   sort sstables from smallest to largest
>   compact up to maximumCompactionThreshold of the smallest sstables
> {noformat} 
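
A minimal Python sketch of the pseudocode above, assuming each SSTable is represented only by its size. The function name select_minor_compaction and the snake_case parameter names are hypothetical; they mirror the proposed config options and are not taken from the Cassandra code base or the attached patch.

```python
def select_minor_compaction(sstable_sizes, min_threshold, max_threshold, target_count):
    """Return the sizes of the SSTables to compact, or [] if none is due.

    Follows the proposed pseudocode literally: the minimum threshold only
    feeds the trigger condition, and the maximum caps the selection.
    """
    if len(sstable_sizes) + min_threshold - 1 > target_count:
        smallest_first = sorted(sstable_sizes)   # smallest to largest
        return smallest_first[:max_threshold]    # up to max_threshold tables
    return []


sizes = [50, 10, 40, 20, 30]
not_due = select_minor_compaction(sizes, min_threshold=4, max_threshold=3, target_count=8)
due = select_minor_compaction(sizes, min_threshold=4, max_threshold=3, target_count=7)
```

With five tables and a minimum threshold of 4, the trigger condition 5 + 4 - 1 > target holds for a target of 7 but not for 8, so only the second call selects the three smallest tables.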

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
