cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2559) Distinguish long and short running compactions
Date Tue, 26 Apr 2011 16:49:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025316#comment-13025316 ]

Peter Schuller commented on CASSANDRA-2559:
-------------------------------------------

This may intersect with the solution to avoiding AES on small CF:s having to wait for huge
long-running AES jobs on large CF:s. I didn't file that because I was going to figure out
whether the concurrent compaction work already addressed it. I take it that it doesn't, but
this would help.

So, that's another potential motivation.

> Distinguish long and short running compactions
> ----------------------------------------------
>
>                 Key: CASSANDRA-2559
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2559
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: compaction
>
> Unless you have SSDs, multi-threaded compaction is mainly here to avoid accumulating lots
of newly flushed sstables while a long-lasting compaction is running. But too many concurrent
compactions are bad for random IO. CASSANDRA-2558 will allow limiting the number of such concurrent
compactions, but choosing the right number there is not easy. If you pick too low a number,
you risk accumulating "young" sstables if 2 or 3 fairly long compactions run at the same time.
On the other hand, compacting multiple "small" sstables concurrently is likely to be less efficient (on
a spinning disk) than compacting them serially.
> It seems to me we could have the best of both worlds by distinguishing long and short
compactions. We could have 2 pools of threads, one for long compactions (whatever the exact
definition is) and one for short ones. With this, even with one thread in each pool you would
avoid most of the 'new sstable accumulation' problem while making sure you never run too many
concurrent compactions (note that in theory we could stratify further than "short" and "long",
but I'm not sure the benefits would outweigh the added complexity).
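
A minimal sketch of the two-pool idea described above, for illustration only. The class name, the byte-size threshold used to decide what counts as "long", and the one-thread-per-pool sizing are all assumptions; the ticket leaves the exact definition of "long" open, and this is not Cassandra's actual compaction code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not part of Cassandra's CompactionManager.
public class TwoPoolCompactionExecutor {
    // Assumed cutoff: compactions estimated at or above this size count as "long".
    private final long longThresholdBytes;
    // One thread per pool, matching the "even with one thread in each pool" case.
    private final ExecutorService shortPool = Executors.newSingleThreadExecutor();
    private final ExecutorService longPool = Executors.newSingleThreadExecutor();

    public TwoPoolCompactionExecutor(long longThresholdBytes) {
        this.longThresholdBytes = longThresholdBytes;
    }

    // Classify a compaction by its estimated input size (an assumed heuristic).
    public boolean isLong(long estimatedBytes) {
        return estimatedBytes >= longThresholdBytes;
    }

    // Route the task to the matching pool, so a short compaction never
    // queues behind a long-running one, and at most two run concurrently.
    public Future<?> submit(long estimatedBytes, Runnable compaction) {
        ExecutorService pool = isLong(estimatedBytes) ? longPool : shortPool;
        return pool.submit(compaction);
    }

    public void shutdown() {
        shortPool.shutdown();
        longPool.shutdown();
    }
}
```

With this routing, a compaction of freshly flushed sstables submitted while a large compaction is in progress runs immediately on the short pool, and total concurrency stays bounded by the sum of the two pool sizes.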

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
