On Tue, May 1, 2012 at 6:07 PM, Rob Coli <firstname.lastname@example.org> wrote:
> The primary differences, as I understand it, are that the index
> performance and bloom filter false positive rate for your One Big File
> are worse. First, you are more likely to get a bloom filter false
> positive due to the intrinsic degradation of bloom filter performance
> as the number of keys increases. Next, after traversing the SSTable index
> to get to the closest indexed key, you will be forced to scan past
> more keys which are not your key in order to get to the key which is
Fair enough, but if you have a continually growing dataset, automatic minor compactions will eventually produce SSTables as large as the One Big File you created through a major compaction; it just takes a lot longer to get there. So time will "undo" a major compaction, and it's definitely not the case that you're stuck forever in some sort of screwed state where you have to compact manually all the time.
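To make the "time will undo it" argument concrete, here is a rough sketch of a toy size-tiered model. It is not Cassandra's actual compaction code: the function name `flushes_until_size`, the unit-sized flushes, and the exact-size bucketing are all assumptions for illustration, and the threshold of 4 is my understanding of the default minimum compaction threshold.

```python
def flushes_until_size(target_size, min_threshold=4):
    """Count memtable flushes until minor compactions alone produce
    an SSTable of at least target_size (toy model, hypothetical).

    Assumptions: every flush writes an SSTable of size 1, no data is
    overwritten or deleted, and min_threshold SSTables of identical
    size are merged into one as soon as they exist.
    """
    sstables = []  # sizes of the live SSTables
    flushes = 0
    while not any(size >= target_size for size in sstables):
        sstables.append(1)
        flushes += 1
        # Cascade merges until no size bucket holds min_threshold tables.
        merged = True
        while merged:
            merged = False
            for size in set(sstables):
                if sstables.count(size) >= min_threshold:
                    for _ in range(min_threshold):
                        sstables.remove(size)
                    sstables.append(size * min_threshold)
                    merged = True
                    break
    return flushes


if __name__ == "__main__":
    # How many flushes before minor compaction alone yields a table
    # as large as a "One Big File" of 64 flush-sized units?
    print(flushes_until_size(64))
```

In this model the large SSTable shows up only after as many flushes as it contains units of data, which is the "it just takes a lot longer" part; a real cluster with overwrites, deletes, and fuzzier size buckets will behave less neatly.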
I'm also guessing that the wording in that part of the documentation is just too strong, and it would be nice to have more nuanced advice that depends on your traffic pattern.
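On the bloom filter point in the quoted text, the standard approximation for a filter with m bits, k hash functions, and n keys is p ≈ (1 - e^(-kn/m))^k. The sketch below just evaluates that formula; the function name and the example numbers are mine, and it assumes the filter size stays fixed while the key count grows. Cassandra may size its filters differently per SSTable, so treat this as an illustration of the general effect rather than a statement about Cassandra's behavior.

```python
import math


def false_positive_rate(num_keys, num_bits, num_hashes):
    """Approximate bloom filter false positive probability.

    Uses the textbook estimate p = (1 - exp(-k * n / m)) ** k, where
    n = num_keys, m = num_bits, k = num_hashes.
    """
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


if __name__ == "__main__":
    # Hypothetical numbers: same filter size, growing key count.
    for n in (1_000, 2_000, 5_000, 10_000):
        p = false_positive_rate(n, num_bits=10_000, num_hashes=7)
        print(f"{n:>6} keys: p ~ {p:.4f}")
```

With the filter size held constant, the rate climbs steadily as keys are added, which is the degradation the quote describes.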