incubator-cassandra-user mailing list archives

From Rob Coli <rc...@palominodb.com>
Subject Re: Question regarding major compaction.
Date Tue, 01 May 2012 16:07:51 GMT
On Tue, May 1, 2012 at 4:31 AM, Henrik Schröder <skrolle@gmail.com> wrote:
> But what's the difference between doing an extra read from that One Big
> File and doing an extra read from whatever SSTable happens to be largest in
> the course of automatic minor compaction?

The primary differences, as I understand it, are that index
performance and the bloom filter false positive rate are worse for
your One Big File. First, you are more likely to get a bloom filter
false positive, because bloom filter performance intrinsically degrades
as the number of keys increases. Second, after traversing the SSTable
index to reach the closest indexed key, you will be forced to scan
past more keys that are not your key before you reach the one that
is.
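
To make those two effects concrete, here is a rough sketch in Python. It is
not Cassandra code. The first function is the textbook approximation for the
false positive rate of a bloom filter with a fixed number of bits and hash
functions; whether a given Cassandra version keeps the filter size fixed or
scales it with the key count is an assumption you would need to check. The
second part models a generic sparse (sampled) index over sorted keys: find
the closest indexed key, then scan forward. All names and parameters here
(bloom_false_positive_rate, build_sparse_index, lookup, interval) are
hypothetical and chosen only for illustration.

```python
import bisect
import math


def bloom_false_positive_rate(num_keys, num_bits, num_hashes):
    """Textbook approximation: p ~= (1 - e^(-k*n/m))^k.

    Assumes a filter of fixed size (num_bits) and fixed num_hashes.
    """
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def build_sparse_index(sorted_keys, interval):
    """Sample every `interval`-th key together with its position."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), interval)]


def lookup(sorted_keys, sparse_index, target):
    """Return (found, keys_scanned) for a lookup through the sparse index."""
    sampled = [key for key, _ in sparse_index]
    # Closest indexed key that is <= target.
    slot = bisect.bisect_right(sampled, target) - 1
    if slot < 0:
        return False, 0
    start = sparse_index[slot][1]
    scanned = 0
    # Scan forward past keys that are not the target.
    for key in sorted_keys[start:]:
        scanned += 1
        if key == target:
            return True, scanned
        if key > target:
            break
    return False, scanned


if __name__ == "__main__":
    # Same filter size, ten times the keys: the false positive rate rises.
    print(bloom_false_positive_rate(1_000, 10_000, 7))
    print(bloom_false_positive_rate(10_000, 10_000, 7))

    keys = list(range(100))
    dense = build_sparse_index(keys, 10)
    sparse = build_sparse_index(keys, 50)
    print(lookup(keys, dense, 37))
    print(lookup(keys, sparse, 37))
```

In this model the number of keys scanned depends on how far apart the
indexed keys are, so the scan cost grows whenever more keys sit between two
index entries.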

> So I'm still confused. I don't see a significant difference between doing
> the occasional major compaction or leaving it to do automatic minor
> compactions. What am I missing? Reads will "continually degrade" with
> automatic minor compactions as well, won't they?

I still don't really understand what precisely "continually degrade"
means here either, FWIW, or which two operating paradigms are being
compared, and under what sort of workloads. As a simple example, I
don't believe performance will "continually" do anything if your
workload does not issue logical UPDATEs or DELETEs to rows. The
documentation statement seems confusingly vague yet strongly phrased,
even if true.

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
