cassandra-user mailing list archives

From Henrik Schröder <skro...@gmail.com>
Subject Re: Question regarding major compaction.
Date Wed, 02 May 2012 19:17:56 GMT
On Tue, May 1, 2012 at 6:07 PM, Rob Coli <rcoli@palominodb.com> wrote:

>
> The primary differences, as I understand it, are that the index
> performance and bloom filter false positive rate for your One Big File
> are worse. First, you are more likely to get a bloom filter false
> positive due to the intrinsic degradation of bloom filter performance
> as number of keys increases. Next, after traversing the SStable index
> to get to the closest indexed key, you will be forced to scan past
> more keys which are not your key in order to get to the key which is
> your key.
>
>
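To put a rough number on the bloom filter point quoted above, here is a minimal sketch using the textbook approximation p = (1 - e^(-k*n/m))^k for a filter with m bits, k hash functions and n keys. It assumes the filter is not resized as keys are added; the sizes below are made up for illustration and are not Cassandra defaults.

```python
import math


def bloom_false_positive_rate(num_keys, num_bits, num_hashes):
    """Textbook approximation: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


# Hypothetical filter: 10 million bits and 7 hash functions, never resized.
NUM_BITS = 10_000_000
NUM_HASHES = 7

key_counts = (1_000_000, 2_000_000, 4_000_000)
rates = [bloom_false_positive_rate(n, NUM_BITS, NUM_HASHES) for n in key_counts]

for n, p in zip(key_counts, rates):
    print(f"{n:>9} keys -> false positive rate ~ {p:.4f}")
```

Under that assumption the rate climbs quickly once the number of keys outgrows the filter. A filter sized in proportion to the key count would behave differently.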
Fair enough, but if you have a continually growing dataset, automatic
minor compactions will eventually produce SSTables as large as the One Big
File you created through a major compaction; it just takes a lot longer to
get there. So time will "undo" a major compaction, and it's definitely not
the case that you're forever stuck in some sort of screwed state where you
have to manually compact all the time.
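A toy model of what I mean, as a sketch only. It assumes every flush writes an SSTable of size 1, that a minor compaction fires as soon as 4 SSTables of exactly the same size exist, and that merged size is the plain sum (no overwrites or tombstones). The real size-tiered strategy buckets SSTables of similar rather than identical size, so treat `MIN_THRESHOLD` and `flush_and_compact` as illustrative names, not Cassandra's implementation.

```python
from collections import Counter

MIN_THRESHOLD = 4  # assumed: merge once 4 same-size SSTables exist


def flush_and_compact(sstables, flushes):
    """Simulate `flushes` memtable flushes on a list of SSTable sizes.

    Sizes are in units of one flush. Returns the resulting list of sizes.
    """
    sstables = list(sstables)
    for _flush in range(flushes):
        sstables.append(1)  # each flush adds one small SSTable
        while True:
            counts = Counter(sstables)
            # Smallest size tier that has enough files to compact, if any.
            size = next(
                (s for s, c in sorted(counts.items()) if c >= MIN_THRESHOLD),
                None,
            )
            if size is None:
                break
            for _merged in range(MIN_THRESHOLD):
                sstables.remove(size)
            sstables.append(size * MIN_THRESHOLD)  # merged output
    return sstables


# Start right after a major compaction left One Big File of size 64.
after_64 = sorted(flush_and_compact([64], 64))
after_192 = sorted(flush_and_compact([64], 192))
print(after_64)
print(after_192)
```

In this model the big file sits alone until 64 more flushes have been written, at which point minor compactions have built a second SSTable of size 64. After 192 flushes there are four of that size and they merge into one file of size 256, so the big file is back in the normal compaction cycle.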

I'm also guessing that the wording is just too strong in that part of the
documentation, and it would be nice to have a more nuanced piece of advice
depending on your traffic pattern.


/Henrik
