incubator-cassandra-user mailing list archives

From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: compaction strategy
Date Mon, 09 May 2011 12:22:24 GMT
Sorry, I was referring to the claim that "one big file" was a problem, not
the non-overlapping part.

If you never compact to a single file, you never get rid of all
generations/duplicates.
With non-overlapping files covering small enough token ranges, compacting
down to one file is not a big issue.

Terje

On Mon, May 9, 2011 at 8:52 PM, David Boxenhorn <david@taotown.com> wrote:

> If they each have their own copy of the data, then they are *not*
> non-overlapping!
>
> If you have non-overlapping SSTables (and you know the min/max keys), it's
> like having one big SSTable because you know exactly where each row is, and
> it becomes easy to merge a new SSTable in small batches, rather than in one
> huge batch.
>
> The only step that you have to add to the current merge process is that,
> when you are about to write a new SSTable and it is too big, you write N
> (non-overlapping!) pieces instead.
>
>
> On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen <
> tmarthinussen@gmail.com> wrote:
>
>> Yes, agreed.
>>
>> I actually think Cassandra has to.
>>
>> And if you do not go down to that single file, how do you avoid getting
>> into a situation where you can very realistically end up with 4-5 big
>> sstables, each having its own copy of the same data, massively increasing
>> disk requirements?
>>
>> Terje
>>
>> On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn <david@taotown.com> wrote:
>>
>>> "I'm also not too much in favor of triggering major compactions, because
>>> it mostly have a nasty effect (create one huge sstable)."
>>>
>>> If that is the case, why can't major compactions create many,
>>> non-overlapping SSTables?
>>>
>>> In general, it seems to me that non-overlapping SSTables have all the
>>> advantages of big SSTables (i.e. you know exactly where the data is) without
>>> the disadvantages that come with being big. Why doesn't Cassandra take
>>> advantage of that in a major way?
>>>
>>
>>
>
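
The idea discussed in this thread (write N non-overlapping pieces instead of
one huge SSTable, then use the min/max keys to find a row and to merge new
data in small batches) can be sketched roughly as below. This is only an
illustration under simplifying assumptions, not Cassandra code: a "piece" is
a plain in-memory list of (key, value) rows sorted by key, keys are compared
as strings rather than tokens, the newest value simply wins, and the names
split_into_pieces, find_piece and merge_in are made up for this example.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Row = Tuple[str, str]   # (key, value)
Piece = List[Row]       # rows sorted by key; hypothetical stand-in for one SSTable


def split_into_pieces(sorted_rows: List[Row], max_rows: int) -> List[Piece]:
    """Cut one sorted run into consecutive chunks, so key ranges cannot overlap."""
    return [sorted_rows[i:i + max_rows] for i in range(0, len(sorted_rows), max_rows)]


def find_piece(pieces: List[Piece], key: str) -> Optional[int]:
    """Binary-search the max keys to find the only piece that could hold `key`."""
    max_keys = [p[-1][0] for p in pieces]
    i = bisect_left(max_keys, key)
    if i < len(pieces) and pieces[i][0][0] <= key:
        return i
    return None


def merge_in(pieces: List[Piece], new_rows: List[Row], max_rows: int) -> List[Piece]:
    """Merge a new sorted run, rewriting only the pieces whose range it touches."""
    if not new_rows:
        return pieces
    lo, hi = new_rows[0][0], new_rows[-1][0]
    before = [p for p in pieces if p[-1][0] < lo]    # entirely below the new run
    after = [p for p in pieces if p[0][0] > hi]      # entirely above the new run
    touched = [p for p in pieces if p[0][0] <= hi and p[-1][0] >= lo]
    merged = {}
    for p in touched:
        merged.update(p)         # older generation first
    merged.update(new_rows)      # assumption: newer rows replace older ones
    rewritten = split_into_pieces(sorted(merged.items()), max_rows)
    return before + rewritten + after


rows = [("a", "1"), ("b", "1"), ("c", "1"), ("d", "1"), ("e", "1"), ("f", "1")]
pieces = split_into_pieces(rows, max_rows=2)
updated = merge_in(pieces, [("c", "2"), ("cc", "2")], max_rows=2)
```

In this sketch only the piece covering "c".."d" is rewritten; the pieces for
"a".."b" and "e".."f" are reused untouched, which is the "small batches"
property described above. Because each key range is held by exactly one
piece, no older generation of a row survives the merge.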
