incubator-cassandra-dev mailing list archives

From Germán Kondolf <german.kond...@gmail.com>
Subject Re: Parallel Compaction
Date Sun, 19 Dec 2010 01:54:51 GMT
I've created the patch ticket:
https://issues.apache.org/jira/browse/CASSANDRA-1876
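
In the quoted thread, the patch makes the number of concurrent compactions configurable, with a default of "1". The sketch below is not the CASSANDRA-1876 code. It only illustrates the idea in plain Java, assuming that compaction work can be split into independent tasks. `compactAll`, `DEFAULT_CONCURRENT_COMPACTORS`, and the use of each `Callable<Long>` as a stand-in for compacting one group of SSTables are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCompactionSketch {

    // Hypothetical setting; 1 means a single compaction thread.
    static final int DEFAULT_CONCURRENT_COMPACTORS = 1;

    /**
     * Runs independent compaction tasks (each returning, say, bytes written)
     * on a fixed-size pool and returns the sum of their results.
     */
    static long compactAll(List<Callable<Long>> tasks, int concurrentCompactors)
            throws Exception {
        if (concurrentCompactors < 1) {
            throw new IllegalArgumentException("concurrentCompactors must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(concurrentCompactors);
        try {
            long total = 0;
            // invokeAll blocks until every task has completed (or failed).
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get(); // rethrows a task failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final long size = i * 100L;
            // Hypothetical stand-in for compacting one group of SSTables.
            tasks.add(() -> size);
        }
        System.out.println(compactAll(tasks, DEFAULT_CONCURRENT_COMPACTORS)); // 1000
        System.out.println(compactAll(tasks, 2));                             // 1000
    }
}
```

With the default of 1 the pool has a single thread and the tasks run one after another, presumably matching the existing serial behaviour. As Jonathan notes in the thread, a single compaction already causes a lot of I/O contention, so higher values mainly make sense where the disks have headroom, such as SSDs.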

On Fri, Dec 17, 2010 at 12:30 PM, Germán Kondolf
<german.kondolf@gmail.com> wrote:
> On Fri, Dec 17, 2010 at 11:15 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> On Fri, Dec 17, 2010 at 8:01 AM, Germán Kondolf <german.kondolf@gmail.com> wrote:
>>
>>> Thanks Jonathan for the feedback.
>>>
>>> By flush/schema migration, do you mean the SSTables replace lock? I
>>> added that lock just to be safe; if it's fine with you, I'll remove it.
>>> I'll clean up the code according to the code-style article, add the
>>> parameter to the configuration with a default of "1", and send it
>>> again.
>>>
>>> Why do you think it's only worth it on SSDs?
>>>
>>
>> Because even a single compaction causes a ton of I/O contention.  99% of the
>> time your concern is how to make compaction use _less_ resources, not more.
>> :)
>
> We think that, depending on the scenario, there is room for different
> strategies to use fewer resources.
>
> With short-lived keys, a fast parallel compaction combined with
> CASSANDRA-1074 may mean the node spends only a very short period
> compacting; while this happens, the other nodes could handle the load,
> provided the compaction takes just seconds.
>
> In another scenario, with long-lived keys, we're thinking that if
> minor compaction merged only the Bloom filters (BF) and indexes,
> leaving the SSTable data files as they were, we would save the I/O
> bandwidth we currently spend in the write phase and write only the BF
> and indexes.
>
> The SSTable structure would change to look like this:
> LogicSSTable
>     Index
>     BloomFilter
>     Collection<SSTableOnDisk>
>
> The LogicSSTable contains the index and BF of the given compacted
> SSTables.
>
> Reading a column would then involve checking the BF, reading the index
> (which would indicate not only an offset but also a file), and reading
> the corresponding file.
>
> In this way, minor compaction is mostly a reading process rather than
> a write-intensive one.
>
> Of course, it depends on how the dataset behaves. With short-lived
> keys, this latter strategy just makes major compaction harder. On the
> other hand, with the current strategy and long-lived columns, every
> column is eventually read and rewritten many times only to be left in
> its original state.
>
> We know this isn't an easy change, but we will eventually try it at
> home, so your criticism, warnings, and advice are welcome.
>
> Regards.
> --
> //GK
> german.kondolf@gmail.com
> // sites
> http://twitter.com/germanklf
> http://www.facebook.com/germanklf
> http://ar.linkedin.com/in/germankondolf
>
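
To make the proposed LogicSSTable layout concrete, here is a minimal in-memory Java sketch under several assumptions. Data files are modelled as lists of rows, with the row position standing in for the byte offset. `SSTableOnDisk`, `Pointer`, the two-hash `BloomFilter`, and `read` are hypothetical names, not Cassandra classes. The constructor plays the role of the index-only minor compaction: it merges the per-file key lists into one index and one Bloom filter without rewriting any data file. For simplicity a newer file's entry replaces an older one for the same key; real Cassandra reconciles columns by timestamp, so a row spread over several files would need one pointer per file.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

public class LogicSSTable {

    /** Hypothetical stand-in for one immutable data file: row i lives at "offset" i. */
    static final class SSTableOnDisk {
        final String name;
        final List<String> keys = new ArrayList<>();
        final List<String> rows = new ArrayList<>();
        SSTableOnDisk(String name) { this.name = name; }
        SSTableOnDisk put(String key, String row) { keys.add(key); rows.add(row); return this; }
        String readAt(int offset) { return rows.get(offset); }
    }

    /** Index entry naming the file as well as the offset inside it. */
    static final class Pointer {
        final SSTableOnDisk file;
        final int offset;
        Pointer(SSTableOnDisk file, int offset) { this.file = file; this.offset = offset; }
    }

    /** Tiny two-hash Bloom filter, for illustration only. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int size;
        BloomFilter(int size) { this.size = size; this.bits = new BitSet(size); }
        private int h1(String k) { return Math.floorMod(k.hashCode(), size); }
        private int h2(String k) { return Math.floorMod(Integer.rotateLeft(k.hashCode(), 16) * 31 + 7, size); }
        void add(String k) { bits.set(h1(k)); bits.set(h2(k)); }
        boolean mightContain(String k) { return bits.get(h1(k)) && bits.get(h2(k)); }
    }

    final TreeMap<String, Pointer> index = new TreeMap<>();
    final BloomFilter bloomFilter = new BloomFilter(1024);
    final List<SSTableOnDisk> files;

    /**
     * Index-only "minor compaction": files are given oldest first, and only
     * their key lists are scanned; no row data is read or rewritten.
     */
    LogicSSTable(List<SSTableOnDisk> oldestFirst) {
        this.files = new ArrayList<>(oldestFirst);
        for (SSTableOnDisk file : files) {
            for (int off = 0; off < file.keys.size(); off++) {
                String key = file.keys.get(off);
                index.put(key, new Pointer(file, off)); // newer file overwrites older entry
                bloomFilter.add(key);
            }
        }
    }

    /** Read path: Bloom filter, then index (file + offset), then that one file. */
    String read(String key) {
        if (!bloomFilter.mightContain(key)) return null; // definitely absent
        Pointer p = index.get(key);                      // null on a BF false positive
        return p == null ? null : p.file.readAt(p.offset);
    }

    public static void main(String[] args) {
        SSTableOnDisk a = new SSTableOnDisk("a-Data.db").put("k1", "v1-old").put("k2", "v2");
        SSTableOnDisk b = new SSTableOnDisk("b-Data.db").put("k1", "v1-new").put("k3", "v3");
        LogicSSTable logic = new LogicSSTable(List.of(a, b));
        System.out.println(logic.read("k1")); // v1-new
        System.out.println(logic.read("k2")); // v2
        System.out.println(logic.read("zz")); // null
    }
}
```

`read` follows the path described in the message: check the BF, look up the file and offset in the index, then read only that file. The trade-off mentioned for short-lived keys shows up here too: obsolete rows (such as `v1-old`) stay in their original files until a major compaction rewrites them.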



-- 
//GK
german.kondolf@gmail.com
// sites
http://twitter.com/germanklf
