incubator-cassandra-dev mailing list archives

From Germán Kondolf <>
Subject Re: Parallel Compaction
Date Fri, 17 Dec 2010 15:30:37 GMT
On Fri, Dec 17, 2010 at 11:15 AM, Jonathan Ellis <> wrote:
> On Fri, Dec 17, 2010 at 8:01 AM, Germán Kondolf <> wrote:
>> Thanks Jonathan for the feedback.
>> By flush/schema migration you mean the SSTables replace lock? I've put
>> that lock just to be sure, if it's fine by you I'll remove it.
>> I'll clean up the code according to the code-style article, add the
>> parameter to the configuration using a default of "1" and I'll send it
>> again.
>> Why do you think it is only worth it on SSDs?
> Because even a single compaction causes a ton of i/o contention.  99% of the
> time your concern is how to make compaction use _less_ resources, not more.
> :)

We think that, depending on the scenario, there is room for different
strategies to use fewer resources.

With short-lived keys, a fast parallel compaction combined with
CASSANDRA-1074 could mean that a node is compacting for only a very
short period of time. While this is happening, the other nodes could
handle the load, provided the compaction takes just seconds.

In the other scenario, with long-lived keys, we're thinking that if the
minor compaction merged only the Bloom filters (BF) and indexes, leaving
the SSTables' data the way it was, we would save the I/O bandwidth
currently spent in the write phase and write only the BF and indexes.

The proposed structure of SSTables would change and look like this:

The LogicSSTable contains the Idx & BF of the given compacted SSTables.

Reading a column would imply using the BF, then reading the index,
which would indicate not only an offset but also a file, and then
reading the corresponding file.

In this way, the minor compaction is just a read process and not a
write-intensive one.
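
A minimal sketch of the idea described above, written in Python rather
than Cassandra's Java to keep it short. Everything here is an assumption
made for illustration: the DataFile class, the minor_compact function,
the use of a list position as the "offset", and a plain set standing in
for a real Bloom filter (which would also give false positives). It is
not Cassandra's actual on-disk format or API.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Stand-in for an untouched SSTable data file (hypothetical)."""
    name: str
    rows: list  # list of (key, value); the list position acts as the offset


@dataclass
class LogicSSTable:
    """Merged index and filter over several unchanged data files."""
    files: dict = field(default_factory=dict)  # file name -> DataFile
    index: dict = field(default_factory=dict)  # key -> (file name, offset)
    bloom: set = field(default_factory=set)    # a set stands in for the BF

    def get(self, key):
        # 1. Ask the filter; a miss means no file has to be touched.
        if key not in self.bloom:
            return None
        # 2. The index yields a file as well as an offset.
        entry = self.index.get(key)
        if entry is None:  # would be a false positive with a real BF
            return None
        name, offset = entry
        # 3. Read from the corresponding, never rewritten, data file.
        return self.files[name].rows[offset][1]


def minor_compact(data_files):
    """Merge only index and filter; data files are read, never rewritten.

    data_files must be ordered oldest to newest so newer entries win.
    """
    logical = LogicSSTable()
    for f in data_files:
        logical.files[f.name] = f
        for offset, (key, _value) in enumerate(f.rows):
            logical.index[key] = (f.name, offset)
            logical.bloom.add(key)
    return logical


older = DataFile("data-1", [("k1", "v1"), ("k2", "v2")])
newer = DataFile("data-2", [("k2", "v2-new"), ("k3", "v3")])
logical = minor_compact([older, newer])
```

In this sketch, logical.get("k2") resolves through the merged index to
the newer file, while the superseded copy of k2 stays on disk in the
older file until a major compaction rewrites the data.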

Of course, it depends on the behaviour of the dataset. With short-lived
keys, this latter strategy just makes the major compaction harder. On
the other hand, with the current strategy and long-lived columns, after
a while every column has been read and written many times just to be
left in its original state.
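
To put a rough number on "many times", here is a simplified model in
Python. It assumes, for illustration only, that a merge happens whenever
four tables of the same tier exist and produces one table of the next
tier. The function name and the threshold parameter are hypothetical,
and real compaction scheduling is more involved.

```python
def rewrites_of_oldest_column(flushes, threshold=4):
    """Count how often the first column ever flushed gets rewritten.

    Each flush adds one tier-0 table. Whenever `threshold` tables share
    a tier they are merged into one table of the next tier, and every
    column they contain is read and written again.
    """
    tiers = {}  # tier -> list of flags: does the table hold the oldest column?
    rewrites = 0
    for i in range(flushes):
        tiers.setdefault(0, []).append(i == 0)
        tier = 0
        # A merge can fill up the next tier, so keep cascading upwards.
        while len(tiers.get(tier, [])) >= threshold:
            merged = tiers.pop(tier)
            contains_oldest = any(merged)
            if contains_oldest:
                rewrites += 1
            tier += 1
            tiers.setdefault(tier, []).append(contains_oldest)
    return rewrites
```

Under this model a column that never changes is rewritten once after 4
flushes, twice after 16 and three times after 64, so the count grows
with the logarithm of the number of flushes.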

We know that this isn't an easy change, but we will eventually try it
at home, so your criticism, warnings and advice are welcome.
