Subject: Re: Question regarding major compaction.
From: Jason Rutherglen
To: user@cassandra.apache.org
Date: Tue, 1 May 2012 07:15:52 -0700

I wonder if TieredMergePolicy [1] could be used in Cassandra for compaction?

1. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

On Tue, May 1, 2012 at 6:38 AM, Edward Capriolo wrote:
> Henrik,
>
> There are use cases where major compaction works well, like yours and
> mine: essentially cases with a high amount of churn, updates and
> deletes, where we get a lot of benefit from forced tombstone removal
> in the form of less physical data.
>
> However, we end up with really big sstables that naturally will never
> get compacted away, since they are so much bigger than the other
> tables. So we get stuck major compacting forever.
>
> Cassandra needs an "uncompact" for people like us, so we can turn one
> big sstable into multiple smaller ones. Or a major compaction that
> takes in multiple sstables and produces multiple output tables, nicely
> organized for bloom filter hits and free of tombstones.
>
> Edward
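
The stranded-big-sstable problem Edward describes falls straight out of
size-tiered bucketing. A minimal sketch of the idea (illustrative only, not
Cassandra's actual SizeTieredCompactionStrategy; the names are made up and
the usual defaults of 0.5x-1.5x bucket boundaries and a minimum threshold of
4 are assumed): sstables are grouped into buckets of roughly similar size,
and only buckets that reach the threshold are compacted, so one huge table
left by a major compaction never finds enough similar-sized peers and is
never picked again.

import java.util.*;

public class SizeTieredSketch {
    // Group sstable sizes (in MB) into buckets whose members stay within
    // [bucketLow, bucketHigh] of the bucket's running average size.
    static List<List<Long>> buckets(List<Long> sizes, double bucketLow, double bucketHigh) {
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        for (long size : sorted) {
            boolean placed = false;
            for (List<Long> bucket : buckets) {
                double avg = bucket.stream().mapToLong(Long::longValue).average().orElse(0);
                if (size >= avg * bucketLow && size <= avg * bucketHigh) {
                    bucket.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed) buckets.add(new ArrayList<>(List.of(size)));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Four freshly flushed ~5 MB sstables plus one 200 GB table from a major compaction.
        List<Long> sizes = List.of(5L, 6L, 5L, 7L, 200_000L);
        int minThreshold = 4;
        for (List<Long> bucket : buckets(sizes, 0.5, 1.5)) {
            System.out.println(bucket + (bucket.size() >= minThreshold
                    ? "  -> compacted together"
                    : "  -> skipped (too few similar-sized peers)"));
        }
        // The 200 GB table always lands in a bucket of one, which is why an
        // "uncompact" that splits it back into smaller sstables would help.
    }
}
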
> On Tue, May 1, 2012 at 7:31 AM, Henrik Schröder wrote:
>> But what's the difference between doing an extra read from that One
>> Big File and doing an extra read from whatever SSTable happens to be
>> largest in the course of automatic minor compaction?
>>
>> We have a pretty update-heavy application, and doing a major
>> compaction can remove up to 30% of the used disk space. That directly
>> translates into fewer reads and fewer SSTables that rows appear in.
>> Everything that's unchanged since the last major compaction is
>> obviously faster to access, and everything that's changed since the
>> last major compaction is about the same as if we hadn't done it?
>>
>> So I'm still confused. I don't see a significant difference between
>> doing the occasional major compaction and leaving it to do automatic
>> minor compactions. What am I missing? Reads will "continually degrade"
>> with automatic minor compactions as well, won't they?
>>
>> I can sort of see that if you have a moving active data set, then that
>> will most probably only exist in the smallest SSTables and frequently
>> be the object of minor compactions, and doing a major compaction will
>> move all of it into the biggest SSTables?
>>
>> /Henrik
>>
>> On Mon, Apr 30, 2012 at 05:35, aaron morton wrote:
>>>
>>> Depends on your definition of "significantly"; there are a few things
>>> to consider.
>>>
>>> * Reading from SSTables for a request is a serial operation. Reading
>>> from 2 SSTables will take twice as long as 1.
>>>
>>> * If the data in the "One Big File" has been overwritten, reading it
>>> is a waste of time, and it will continue to be read until the row is
>>> compacted away.
>>>
>>> * You will need min_compaction_threshold (a CF setting) SSTables that
>>> big before automatic compaction will pick up the big file.
>>>
>>> On the other side: some people do report getting value from nightly
>>> major compactions. They also manage their cluster to reduce the
>>> impact of performing the compactions.
>>>
>>> Hope that helps.
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 26/04/2012, at 9:37 PM, Fredrik wrote:
>>>
>>> Exactly, but why would reads be significantly slower over time when
>>> including just one more, although sometimes large, SSTable in the
>>> read?
>>>
>>> Ji Cheng wrote 2012-04-26 11:11:
>>>
>>> I'm also quite interested in this question. Here's my understanding
>>> of this problem.
>>>
>>> 1. If your workload is append-only, doing a major compaction
>>> shouldn't affect the read performance too much, because each row
>>> appears in one sstable anyway.
>>>
>>> 2. If your workload is mostly updating existing rows, then more and
>>> more columns will be obsoleted in that big sstable created by major
>>> compaction. And that super big sstable won't be compacted until you
>>> either have another 3 similar-sized sstables or start another major
>>> compaction. But I am not very sure whether this will be a major
>>> problem, because you only end up reading one more sstable. Using
>>> size-tiered compaction against a mostly-update workload may itself
>>> result in reading multiple sstables for a single row key.
>>>
>>> Please correct me if I am wrong.
>>>
>>> Cheng
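
To make Aaron's "reading from 2 SSTables will take twice as long as 1"
concrete, here is a minimal sketch (illustrative only, not Cassandra's
actual read path; the types and names are made up for the example): the row
is assembled by probing candidate sstables one after another and keeping
the newest timestamp per column, so every extra sstable that holds the key
adds another serial probe, and overwritten cells pulled from the One Big
File are read only to be thrown away.

import java.util.*;

public class ReadPathSketch {
    record Cell(long timestamp, String value) {}

    // Probe each candidate sstable in turn (newest first) and keep the newest
    // version of every column. Each extra sstable is one more serial probe;
    // versions read from the big file that were later overwritten are discarded.
    static Map<String, Cell> readRow(String key,
                                     List<Map<String, Map<String, Cell>>> sstablesNewestFirst) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Map<String, Cell>> sstable : sstablesNewestFirst) {
            for (Map.Entry<String, Cell> e : sstable.getOrDefault(key, Map.of()).entrySet()) {
                Cell seen = merged.get(e.getKey());
                if (seen == null || e.getValue().timestamp() > seen.timestamp()) {
                    merged.put(e.getKey(), e.getValue());   // newer write wins
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // A recent small sstable overwrote one column that also sits in the One Big File.
        Map<String, Map<String, Cell>> recent =
                Map.of("row1", Map.of("name", new Cell(2, "new name")));
        Map<String, Map<String, Cell>> oneBigFile =
                Map.of("row1", Map.of("name", new Cell(1, "old name"), "city", new Cell(1, "Oslo")));
        System.out.println(readRow("row1", List.of(recent, oneBigFile)));
        // Two sstables hold row1, so the read does roughly twice the probing of one;
        // the "name" cell fetched from the big file was wasted work.
    }
}
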
>>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik wrote:
>>>>
>>>> In the tuning documentation regarding Cassandra, it's recommended
>>>> not to run major compactions. I understand what a major compaction
>>>> is all about, but I'd like an in-depth explanation as to why reads
>>>> "will continually degrade until the next major compaction is
>>>> manually invoked".
>>>>
>>>> From the doc:
>>>> "So while read performance will be good immediately following a
>>>> major compaction, it will continually degrade until the next major
>>>> compaction is manually invoked. For this reason, major compaction is
>>>> NOT recommended by DataStax."
>>>>
>>>> Regards
>>>> /Fredrik
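
The "continually degrade" wording in that doc quote can be made concrete
with a toy model (the assumptions here are mine, not from the
documentation): one hot row updated before every flush, size-tiered minors
that merge four similar-sized sstables, and the one huge sstable from the
last major compaction never qualifying for a minor. Immediately after the
major compaction a read touches one sstable; afterwards the count climbs
and never returns to one until the next major compaction.

public class ReadAmplificationSim {
    public static void main(String[] args) {
        int minThreshold = 4;       // sstables needed before a minor compaction runs
        int smallTablesWithRow = 0; // small sstables currently holding a copy of the hot row
        int bigTablesWithRow = 1;   // the copy sealed into the One Big File by the major compaction

        System.out.println("right after major compaction: read touches 1 sstable");
        for (int flush = 1; flush <= 12; flush++) {
            smallTablesWithRow++;                    // each flush writes a fresh copy of the row
            if (smallTablesWithRow >= minThreshold) {
                smallTablesWithRow = 1;              // minors merge the small tables among themselves...
            }                                        // ...but never reach the much larger big file
            int perRead = smallTablesWithRow + bigTablesWithRow;
            System.out.println("after flush " + flush + ": read touches " + perRead + " sstables");
        }
        // Output oscillates between 2 and minThreshold sstables per read and never
        // returns to 1, which is the degradation the documentation describes.
    }
}
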