cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Question regarding major compaction.
Date Mon, 30 Apr 2012 03:35:07 GMT
That depends on your definition of "significantly". There are a few things to consider.

* Reading from SSTables for a request is a serial operation. Reading from 2 SSTables will
take twice as long as 1. 

* If the data in the One Big File™ has been overwritten, reading it is a waste of time.
And it will continue to be read until the row is compacted away.

* You will need min_compaction_threshold (a CF setting) SSTables of that size before automatic
compaction will pick up the big file.
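
To illustrate that last point, here is a rough Python sketch of size-tiered bucketing. It is a simplification, not Cassandra's actual code: the function names are made up for this example, and the 0.5x/1.5x similarity window and the threshold of 4 are assumed defaults that you should check against your own version and CF settings.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size.

    A size joins a bucket when it falls within [low, high] times the
    bucket's current average (assumed window, see note above).
    """
    buckets = []  # each bucket is a list of SSTable sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * low <= size <= avg * high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compactable_buckets(sizes, min_threshold=4):
    """Return only the buckets holding at least min_threshold SSTables."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# Sizes in MB: one huge SSTable left behind by a major compaction,
# plus four small SSTables flushed since then.
sstables = [100_000, 50, 55, 60, 48]
print(compactable_buckets(sstables))
```

In this sketch the four small SSTables form a bucket that qualifies for compaction, while the big file sits alone in its own bucket and is left untouched until enough similar-sized SSTables exist.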

On the other side: Some people do report getting value from nightly major compactions. They
also manage their cluster to reduce the impact of performing the compactions.

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/04/2012, at 9:37 PM, Fredrik wrote:

> Exactly, but why would reads be significantly slower over time when including just one more, although sometimes large, SSTable in the read?
> 
> Ji Cheng skrev 2012-04-26 11:11:
>> 
>> I'm also quite interested in this question. Here's my understanding on this problem.
>> 
>> 1. If your workload is append-only, doing a major compaction shouldn't affect the read performance too much, because each row appears in one sstable anyway.
>> 
>> 2. If your workload is mostly updating existing rows, then more and more columns will be obsoleted in that big sstable created by major compaction. And that super big sstable won't be compacted until you either have another 3 similar-sized sstables or start another major compaction. But I am not very sure whether this will be a major problem, because you only end up with reading one more sstable. Using size-tiered compaction against mostly-update workload itself may result in reading multiple sstables for a single row key.
>> 
>> Please correct me if I am wrong.
>> 
>> Cheng
>> 
>> 
>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik <fredrik.l.stigback@sitevision.se> wrote:
>> In the tuning documentation regarding Cassandra, it's recommended not to run major compactions.
>> I understand what a major compaction is all about but I'd like an in-depth explanation as to why reads "will continually degrade until the next major compaction is manually invoked".
>> 
>> From the doc:
>> "So while read performance will be good immediately following a major compaction, it will continually degrade until the next major compaction is manually invoked. For this reason, major compaction is NOT recommended by DataStax."
>> 
>> Regards
>> /Fredrik
>> 
> 

