cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-1881) support concurrent "tiered" compaction
Date Tue, 15 May 2012 18:19:21 GMT


Jonathan Ellis resolved CASSANDRA-1881.

    Resolution: Won't Fix

Concurrent compaction was added in CASSANDRA-2191.  I see little benefit (and a lot of complexity)
to be gained by rewriting it as basically a pool of async compaction threads.
> support concurrent "tiered" compaction
> --------------------------------------
>                 Key: CASSANDRA-1881
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Priority: Minor
> (This has been discussed on the mailing lists before; I am filing it now so that there is a ticket
to refer to on the wiki.)
> CASSANDRA-1876 is open to allow parallel compaction for the purpose of throughput. However,
that only addresses one aspect of why parallel compaction is useful; the other half is ensuring
that compaction proceeds in a timely fashion at each "size tier" (for lack of a better term).
> Essentially, CASSANDRA-1876 is about CPU concurrency while this is about functional concurrency.
I propose that compaction be a process which performs some amount of compaction work per second
(I'm thinking ahead to future rate limiting; that's another ticket to be filed). That work
has to be spread out over multiple compaction tiers in a way that is not coupled with CPU
concurrency.
> The suggested solution is to have N concurrent compaction threads running at any
given moment (CASSANDRA-1876), but to have those threads perform work for a variable
number of compaction jobs. Compactions would be triggered by similarly sized sstables,
as before, but each such compaction would be a compaction "job" that is independent of any
actual compaction thread.
> Compaction threads move between compaction jobs at a coarse granularity so that synchronization
overhead is negligible (for example, a thread might look for other work every memtable_throughput_in_mb
megabytes). Smaller compaction jobs take priority over larger jobs. This is intended to keep
sstable counts down, and to always leave the larger jobs as the ones that have to wait, given that
they are not latency sensitive anyway due to their size.
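
The scheduling idea described above can be sketched roughly as follows. This is only an illustration of the proposal, not Cassandra code: the names CompactionJob, TieredScheduler, chunk_bytes, step and run are all hypothetical, a "chunk" stands in for the coarse unit of work a thread performs before looking for other work, and the compaction itself is simulated by advancing a byte counter. For simplicity the sketch also assumes that at most one thread works on a given job at a time.

```python
import heapq
import threading
from dataclasses import dataclass, field


@dataclass(order=True)
class CompactionJob:
    """Hypothetical job record; ordering uses total_bytes only, so smaller jobs sort first."""
    total_bytes: int
    name: str = field(compare=False)
    done_bytes: int = field(default=0, compare=False)

    @property
    def finished(self) -> bool:
        return self.done_bytes >= self.total_bytes


class TieredScheduler:
    """Threads repeatedly pick the smallest pending job and do one chunk of work on it."""

    def __init__(self, chunk_bytes: int):
        self.chunk_bytes = chunk_bytes   # coarse granularity between job switches
        self._heap = []                  # min-heap of pending jobs, keyed by size
        self._lock = threading.Lock()
        self.completed = []              # job names in completion order

    def submit(self, job: CompactionJob) -> None:
        with self._lock:
            heapq.heappush(self._heap, job)

    def step(self) -> bool:
        """Do one chunk of work on the smallest pending job; return False if none is pending."""
        with self._lock:
            if not self._heap:
                return False
            job = heapq.heappop(self._heap)
        # Simulated compaction work, done outside the lock.
        job.done_bytes = min(job.total_bytes, job.done_bytes + self.chunk_bytes)
        with self._lock:
            if job.finished:
                self.completed.append(job.name)
            else:
                heapq.heappush(self._heap, job)  # re-queue; a smaller job may now go first
        return True

    def run(self, num_threads: int) -> None:
        """Run num_threads workers until no job is pending."""
        def worker():
            while self.step():
                pass
        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


if __name__ == "__main__":
    sched = TieredScheduler(chunk_bytes=64)
    sched.submit(CompactionJob(300, "big"))
    sched.step()                              # the big job has made some progress
    sched.submit(CompactionJob(50, "small"))  # a small job arrives later
    sched.run(num_threads=1)
    print(sched.completed)                    # ['small', 'big']
```

With a single worker the small job overtakes the partially completed large one at the next chunk boundary, which is the behaviour the ticket asks for. With several workers the completion order depends on thread timing.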
> The primary downside is that disk usage spikes would much more easily reach "double cf
size" levels when many compactions are running. This is probably something that can be mitigated
by CASSANDRA-1608 with its talk of limited sstable sizes.
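
To make the disk usage concern concrete, here is a small worked example. It rests on two simplifying assumptions that are not stated in the ticket: a compaction's input sstables are deleted only once the job finishes, and in the worst case the output is as large as the inputs (nothing is overwritten or purged). The function name peak_disk_bytes and the sstable sizes are made up for illustration.

```python
def peak_disk_bytes(live_sstable_bytes, running_job_input_bytes):
    """Worst-case peak usage: all live data plus a full-size output per running job.

    running_job_input_bytes holds, for each running job, the combined size of
    its input sstables (which are themselves part of live_sstable_bytes).
    """
    return sum(live_sstable_bytes) + sum(running_job_input_bytes)


# Hypothetical column family: one large sstable and two tiers of smaller ones.
sstables = [400, 100, 100, 100, 100, 25, 25, 25, 25]   # 900 in total
total = sum(sstables)

# One compaction at a time, merging the four 25-unit sstables.
serial_peak = peak_disk_bytes(sstables, [4 * 25])

# Both smaller tiers compacting concurrently.
concurrent_peak = peak_disk_bytes(sstables, [4 * 25, 4 * 100])

print(f"{serial_peak / total:.2f}x")       # 1.11x
print(f"{concurrent_peak / total:.2f}x")   # 1.56x
```

The more tiers that are compacting at once, the closer the sum of job inputs gets to the total data size, and the closer the peak gets to twice the column family size.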
