Date: Mon, 10 Jan 2011 16:00:14 +0200
Subject: Re: Reclaim deleted rows space
From: shimi
To: user@cassandra.apache.org

I modified the code to limit the size of the SSTables. I will be glad if someone can take a look at it:

https://github.com/Shimi/cassandra/tree/cassandra-0.6

Shimi

On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook <jshook@gmail.com> wrote:
I believe the following condition within submitMinorIfNeeded(...)
determines whether to continue, so it's not a hard loop.

// if (sstables.size() >= minThreshold) ...
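
To make the guard quoted above concrete, here is a minimal sketch in Java. It is a toy model, not Cassandra's code: the class name, the method names, and the "within 50% to 150% of the bucket average" grouping rule are all assumptions made for illustration. The only part taken from the thread is the check `size() >= minThreshold`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of the threshold guard; hypothetical names, not Cassandra's actual code. */
final class ThresholdGuardSketch {

    /**
     * Groups sstable sizes into buckets of "similar" size.
     * Assumed rule: a size joins a bucket if it lies within 50%..150% of that bucket's average.
     */
    static List<List<Long>> bucketBySimilarSize(List<Long> sizes) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (List<Long> bucket : buckets) {
                double avg = bucket.stream().mapToLong(Long::longValue).average().orElse(0);
                if (size >= avg * 0.5 && size <= avg * 1.5) {
                    bucket.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<Long> fresh = new ArrayList<>();
                fresh.add(size);
                buckets.add(fresh);
            }
        }
        return buckets;
    }

    /** Counts the buckets that pass the guard from the thread: size() >= minThreshold. */
    static int eligibleBuckets(List<Long> sizes, int minThreshold) {
        int eligible = 0;
        for (List<Long> bucket : bucketBySimilarSize(sizes)) {
            if (bucket.size() >= minThreshold) {
                eligible++;
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Four small sstables of similar size plus one large outlier.
        List<Long> sizes = List.of(10L, 11L, 12L, 13L, 500L);
        System.out.println(eligibleBuckets(sizes, 4)); // prints 1: only the four small files qualify
        System.out.println(eligibleBuckets(sizes, 1)); // prints 2: the lone large file qualifies too
    }
}
```

With a threshold of 4 the lone large sstable is left alone; with a threshold of 1 every bucket, including a single-file one, passes the guard.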



On Thu, Jan 6, 2011 at 2:51 AM, shimi <shimi.k@gmail.com> wrote:
> According to the code, it makes sense.
> submitMinorIfNeeded() calls doCompaction(), which
> calls submitMinorIfNeeded().
> With minimumCompactionThreshold = 1, submitMinorIfNeeded() will always run
> compaction.
>
> Shimi
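
The call chain described in this message can be modelled with a small simulation. This is only a sketch of the reasoning in the thread, under stated assumptions: all sstables sit in one bucket, each compaction merges up to maxThreshold files into one, and the loop is capped at maxRounds so the example terminates. The class and method names are hypothetical, and nothing here confirms how Cassandra 0.6 actually behaves.

```java
/** Toy simulation of the submit -> compact -> resubmit cycle; not Cassandra's actual code. */
final class ResubmitLoopSketch {

    /**
     * Returns how many compaction rounds run before the guard
     * (count >= minThreshold) stops the cycle, capped at maxRounds.
     */
    static int roundsUntilIdle(int sstableCount, int minThreshold, int maxThreshold, int maxRounds) {
        int rounds = 0;
        int count = sstableCount;
        while (rounds < maxRounds && count > 0 && count >= minThreshold) {
            // One compaction: merge up to maxThreshold sstables into a single new one.
            int merged = Math.min(count, maxThreshold);
            count = count - merged + 1;
            rounds++;
            // In the thread's description, the end of a compaction resubmits the check.
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(roundsUntilIdle(5, 4, 32, 100)); // prints 1: one sstable remains, guard fails
        System.out.println(roundsUntilIdle(5, 1, 32, 100)); // prints 100: guard never fails, cap is hit
    }
}
```

In this model a threshold above 1 lets the cycle settle, because a single merged sstable no longer meets the guard, while a threshold of 1 keeps recompacting the same file until the cap is reached.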
> On Thu, Jan 6, 2011 at 10:26 AM, shimi <shimi.k@gmail.com> wrote:
>>
>>
>> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>
>>> Pretty sure there's logic in there that says "don't bother compacting
>>> a single sstable."
>>
>> No. You can do it.
>> Based on the log, I have a feeling that it triggers an infinite compaction
>> loop.
>>
>>>
>>> On Wed, Jan 5, 2011 at 2:26 PM, shimi <shimi.k@gmail.com> wrote:
>>> > How is minor compaction triggered? Is it triggered only when a new
>>> > SSTable is added?
>>> >
>>> > I was wondering if triggering a compaction with
>>> > minimumCompactionThreshold set to 1 would be useful. If this can
>>> > happen, I assume it will do compaction on files with similar size
>>> > and remove deleted rows on the rest.
>>> > Shimi
>>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
>>> >>
>>> >> > I don't have a problem with disk space. I have a problem with the
>>> >> > data size.
>>> >>
>>> >> [snip]
>>> >>
>>> >> > Bottom line is that I want to reduce the number of requests that
>>> >> > go to disk. Since there is enough data that is no longer valid, I
>>> >> > can do it by reclaiming the space. The only way to do it is by
>>> >> > running major compaction. I can wait and let Cassandra do it for
>>> >> > me, but then the data size will get even bigger and the response
>>> >> > time will be worse. I can do it manually, but I prefer it to happen
>>> >> > in the background with less impact on the system.
>>> >>
>>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
>>> >>
>>> >> So essentially, for workloads that are teetering on the edge of cache
>>> >> warmness and are subject to significant overwrites or removals, it may
>>> >> be beneficial to perform much more aggressive background compaction,
>>> >> even though it might waste lots of CPU, to keep the in-memory working
>>> >> set down.
>>> >>
>>> >> There was talk (I think in the compaction redesign ticket) about
>>> >> potentially improving the use of bloom filters such that obsolete data
>>> >> in sstables could be eliminated from the read set without
>>> >> necessitating actual compaction; that might help address cases like
>>> >> these too.
>>> >>
>>> >> I don't think there's a pre-existing silver bullet in a current
>>> >> release; you probably have to live with the need for
>>> >> greater-than-theoretically-optimal memory requirements to keep the
>>> >> working set in memory.
>>> >>
>>> >> --
>>> >> / Peter Schuller
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>
>
