incubator-cassandra-user mailing list archives

From David Boxenhorn <da...@taotown.com>
Subject Re: compaction strategy
Date Mon, 09 May 2011 11:52:38 GMT
If they each have their own copy of the data, then they are *not*
non-overlapping!
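To make the first point concrete: if each SSTable is summarized only by its (min_key, max_key) range, tables that each hold a copy of the same rows necessarily have overlapping ranges. The sketch below is a hypothetical model (the `KeyRange` alias and `is_non_overlapping` function are illustrative and are not Cassandra's API). It checks a set of ranges for overlap, treating ranges as inclusive.

```python
from typing import List, Tuple

# Hypothetical model: an SSTable is summarized only by its inclusive
# (min_key, max_key) range. This is not Cassandra's real metadata API.
KeyRange = Tuple[str, str]


def is_non_overlapping(ranges: List[KeyRange]) -> bool:
    """Return True if no two inclusive key ranges share any key."""
    ordered = sorted(ranges)  # sort by min_key, then max_key
    for (_, prev_max), (next_min, _) in zip(ordered, ordered[1:]):
        # Ranges are inclusive, so even touching at one key is an overlap.
        if next_min <= prev_max:
            return False
    return True


# Two tables that both hold copies of keys "c".."f" overlap:
print(is_non_overlapping([("a", "f"), ("c", "k")]))  # False
print(is_non_overlapping([("a", "f"), ("g", "k")]))  # True
```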

If you have non-overlapping SSTables (and you know the min/max keys), it's
like having one big SSTable because you know exactly where each row is, and
it becomes easy to merge a new SSTable in small batches, rather than in one
huge batch.
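A minimal sketch of why known min/max keys help, assuming a hypothetical in-memory `SSTable` class (not Cassandra's real SSTable format) whose instances are kept sorted by `min_key` and do not overlap. A row lookup becomes a binary search over the tables' minimum keys, and new rows can be merged one existing table at a time (small batches) instead of rewriting everything at once. The helper names `find_table` and `merge_in_batches` are illustrative only.

```python
import bisect
from typing import Dict, List, Optional


class SSTable:
    """Hypothetical stand-in for an SSTable: key-sorted rows plus min/max keys."""

    def __init__(self, rows: Dict[str, str]):
        if not rows:
            raise ValueError("an SSTable needs at least one row")
        self.rows = dict(sorted(rows.items()))
        self.min_key = min(self.rows)
        self.max_key = max(self.rows)


def find_table(tables: List[SSTable], key: str) -> Optional[SSTable]:
    """Return the only table that could hold `key`, or None.

    Assumes `tables` is sorted by min_key and the ranges do not overlap.
    """
    mins = [t.min_key for t in tables]
    i = bisect.bisect_right(mins, key) - 1  # last table with min_key <= key
    if i >= 0 and key <= tables[i].max_key:
        return tables[i]
    return None


def merge_in_batches(tables: List[SSTable], new_rows: Dict[str, str]) -> List[SSTable]:
    """Merge `new_rows` into the tables one table (one small batch) at a time.

    Each new key goes to the last table whose min_key <= key, or to the first
    table if it sorts before all of them, so the result stays non-overlapping.
    Values from `new_rows` win over existing values for the same key.
    """
    if not tables:
        return [SSTable(new_rows)] if new_rows else []
    mins = [t.min_key for t in tables]
    batches: List[Dict[str, str]] = [{} for _ in tables]
    for key, value in new_rows.items():
        i = max(bisect.bisect_right(mins, key) - 1, 0)
        batches[i][key] = value
    # Tables that receive no new rows are reused untouched.
    return [SSTable({**t.rows, **b}) if b else t for t, b in zip(tables, batches)]


tables = [SSTable({"a": "1", "c": "2"}), SSTable({"m": "3", "p": "4"})]
print(find_table(tables, "c").min_key)  # a
print(find_table(tables, "f"))          # None
merged = merge_in_batches(tables, {"c": "new", "f": "5", "z": "6"})
print([(t.min_key, t.max_key) for t in merged])  # [('a', 'f'), ('m', 'z')]
```

Routing each new key to the table just below it keeps every batch confined to one existing table, which is the "small batches" property; a real system would also need to bound how large a single table may grow.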

The only step that you have to add to the current merge process is: when you
are going to write a new SSTable, if it's too big, write N (non-overlapping!)
pieces instead.
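The proposed change to the write step might look like the following sketch. It assumes the merge output arrives as `(key, value)` pairs sorted by key with no duplicate keys, and it uses a hypothetical `max_rows` row count as the "too big" threshold (a real implementation would more likely measure bytes). Because each piece is a contiguous run of sorted keys, the pieces cannot overlap.

```python
from typing import Dict, Iterable, List, Tuple


def write_in_pieces(sorted_rows: Iterable[Tuple[str, str]], max_rows: int) -> List[Dict[str, str]]:
    """Split a key-sorted merge output into pieces of at most `max_rows` rows.

    `write_in_pieces` and `max_rows` are hypothetical names. Input keys are
    assumed sorted and unique, so consecutive pieces have disjoint key ranges.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    pieces: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for key, value in sorted_rows:
        current[key] = value
        if len(current) == max_rows:
            # This piece is full: close it and start the next one.
            pieces.append(current)
            current = {}
    if current:
        pieces.append(current)
    return pieces


rows = [(k, k.upper()) for k in "abcdefg"]
pieces = write_in_pieces(rows, 3)
print([(min(p), max(p)) for p in pieces])  # [('a', 'c'), ('d', 'f'), ('g', 'g')]
```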


On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen <tmarthinussen@gmail.com> wrote:

> Yes, agreed.
>
> I actually think Cassandra has to.
>
> And if you do not go down to that single file, how do you avoid getting
> into a situation where you can very realistically end up with 4-5 big
> SSTables, each having its own copy of the same data, massively increasing
> disk requirements?
>
> Terje
>
> On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn <david@taotown.com> wrote:
>
>> "I'm also not too much in favor of triggering major compactions, because
>> it mostly has a nasty effect (create one huge sstable)."
>>
>> If that is the case, why can't major compactions create many,
>> non-overlapping SSTables?
>>
>> In general, it seems to me that non-overlapping SSTables have all the
>> advantages of big SSTables (i.e. you know exactly where the data is) without
>> the disadvantages that come with being big. Why doesn't Cassandra take
>> advantage of that in a major way?
>>
>
>
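Terje's concern about 4-5 big SSTables each holding its own copy of the same data can be expressed as a space-amplification ratio: rows stored on disk divided by distinct keys. The sketch below uses a deliberately simplified model (tables as plain dicts, ignoring tombstones and row sizes; `space_amplification` is a hypothetical helper) to compare overlapping copies with the non-overlapping split David proposes.

```python
from typing import Dict, List


def space_amplification(tables: List[Dict[str, str]]) -> float:
    """Rows stored across all tables divided by distinct keys.

    1.0 means every key is stored exactly once, as with non-overlapping
    tables; higher values mean the same keys are duplicated across tables.
    Simplified model: ignores tombstones and differing row sizes.
    """
    stored = sum(len(t) for t in tables)
    distinct = len(set().union(*tables)) if tables else 0
    return stored / distinct if distinct else 0.0


# Four overlapping tables that have each absorbed updates to the same 100 keys.
keys = [f"row{i:03d}" for i in range(100)]
overlapping = [{k: f"v{gen}" for k in keys} for gen in range(4)]
print(space_amplification(overlapping))  # 4.0

# The same 100 keys split into four non-overlapping pieces of 25 keys each.
non_overlapping = [{k: "v3" for k in keys[i:i + 25]} for i in range(0, 100, 25)]
print(space_amplification(non_overlapping))  # 1.0
```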
