cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stu Hood (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2191) Multithread across compaction buckets
Date Thu, 31 Mar 2011 05:32:05 GMT


Stu Hood commented on CASSANDRA-2191:

1. Added {{if (max < min || max < 0) return null;}} check
2. Switched to LinkedHashSet to preserve order
3. Fixed, thanks
4. The idea is that the throttling from CASSANDRA-2156 is a sufficient preventative measure
for this, but it will be important to enable it by default, hopefully using the metrics from
CASSANDRA-2171. I'd prefer not to have two tunables, but it's worth discussing.
5. CompactionManagerMBean exposes getPendingTasks and getCompletedTasks with entirely different
meanings, so I'm not sure we'd gain anything by this
6. (see next comment)
7. Yes, it probably would, but I think that is an issue for a separate ticket. Usually compacting
the smallest bucket first (since the files are likely to be hot in cache) is the biggest win
(which we do): it will be very rare for higher buckets to be more important
8. This would probably be a good idea, for example, if you have more than {{max}} sstables
in the minimum bucket and not enough active threads to parallelize the bucket. Opened CASSANDRA-2407
9. I can't think of an easy way to do this, but if you can, I'm willing

> Multithread across compaction buckets
> -------------------------------------
>                 Key: CASSANDRA-2191
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>         Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt,
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and reasoning
are different enough to open a separate issue.
> The problem with compactions currently is that they compact the set of sstables that
existed the moment the compaction started. This means that for longer running compactions
(even when running as fast as possible on the hardware), a very large number of new sstables
might be created in the meantime. We have observed this proliferation of sstables killing
performance during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing larger files)
when compactions in lower buckets become possible. While this would likely solve the problem
with read performance, it does not actually help us perform compaction any faster, which is
a reasonable requirement for other situations.
> Instead, we need to be able to perform any compactions that are currently required in
parallel, independent of what bucket they might be in.

This message is automatically generated by JIRA.
For more information on JIRA, see:

View raw message