Date: Fri, 4 Dec 2009 16:13:26 -0800
Subject: Re: Removes increasing
disk space usage in Cassandra?
From: Ramzi Rabah
To: cassandra-user@incubator.apache.org

Done
https://issues.apache.org/jira/browse/CASSANDRA-604

On Fri, Dec 4, 2009 at 4:01 PM, Jonathan Ellis wrote:
> Please do.
>
> On Fri, Dec 4, 2009 at 5:53 PM, Ramzi Rabah wrote:
>> Thanks Jonathan.
>> Should I open a bug for this?
>>
>> Ray
>>
>> On Fri, Dec 4, 2009 at 3:47 PM, Jonathan Ellis wrote:
>>> On Fri, Dec 4, 2009 at 5:32 PM, Ramzi Rabah wrote:
>>>> Starting with fresh directories with no data and trying to do simple
>>>> inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
>>>> decided to dig deeper into the code.
>>>>
>>>> I was looking at the code for compaction, and this is a very noob
>>>> concern, so please bear with me if I'm way off; this code is all new
>>>> to me. When we are doing compactions during the normal course of
>>>> Cassandra, we call:
>>>>
>>>>     for (List sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>>>>     {
>>>>         if (sstables.size() < minThreshold)
>>>>         {
>>>>             continue;
>>>>         }
>>>>         // otherwise, do compactions...
>>>>
>>>> where getCompactionBuckets puts into buckets very small files, or files
>>>> whose sizes are within 0.5-1.5x of each other. It will only compact a
>>>> bucket if it holds >= the minimum threshold of files, which is 4 by
>>>> default.
>>>
>>> Exactly right.
>>>
>>>> So far so good. Now how about this scenario: I have an old entry that
>>>> I inserted a long time ago and that was compacted into a 75MB file.
>>>> There are fewer than 4 75MB files. I do many deletes, and I end up with
>>>> 4 extra sstable files filled with tombstones, each about 300 MB.
>>>> These 4 files are compacted together and in the compaction code, if
>>>> the tombstone is there we don't copy it over to the new file. Now
>>>> since we did not compact the 75MB files, but we compacted the
>>>> tombstone files, doesn't that leave us with the tombstone gone, but
>>>> the data still intact in the 75MB file?
>>>
>>> Also right.  Glad you had a look! :)
>>>
>>> One relatively easy fix would be to only GC the tombstones if there
>>> are no SSTables left for that CF older than the ones being compacted.
>>> (So, a "major" compaction, which compacts all SSTables and is what
>>> nodeprobe invokes, would always GC eligible tombstones.)
>>>
>>> -Jonathan
>>>
>>
>
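
For anyone reading this thread later, here is a rough, self-contained sketch of the scenario and of the fix Jonathan suggests. It is not Cassandra's code: the class TombstoneGcSketch, the Table type, and the methods buckets and canGcTombstones are all hypothetical names made up for illustration, and the bucketing is a simplified guess at what getCompactionBuckets does based only on the description above. The "older than" check assumes each SSTable has a generation number where lower means older, and it compares against the newest table in the compaction set, which is one conservative reading of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class TombstoneGcSketch {
    static final int MIN_THRESHOLD = 4; // default minimum bucket size mentioned in the thread

    /** Hypothetical stand-in for an SSTable: a name, a generation (lower = older), a size. */
    static final class Table {
        final String name;
        final long generation;
        final long sizeBytes;

        Table(String name, long generation, long sizeBytes) {
            this.name = name;
            this.generation = generation;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Simplified bucketing: a table joins a bucket if its size is within
     * 0.5x-1.5x of the bucket's average, or if both are below smallLimit.
     */
    static List<List<Table>> buckets(List<Table> tables, long smallLimit) {
        List<List<Table>> result = new ArrayList<>();
        List<Long> averages = new ArrayList<>();
        for (Table t : tables) {
            boolean placed = false;
            for (int i = 0; i < result.size() && !placed; i++) {
                long avg = averages.get(i);
                boolean similar = t.sizeBytes > avg / 2 && t.sizeBytes < avg * 3 / 2;
                boolean bothSmall = t.sizeBytes < smallLimit && avg < smallLimit;
                if (similar || bothSmall) {
                    List<Table> bucket = result.get(i);
                    averages.set(i, (avg * bucket.size() + t.sizeBytes) / (bucket.size() + 1));
                    bucket.add(t);
                    placed = true;
                }
            }
            if (!placed) {
                List<Table> bucket = new ArrayList<>();
                bucket.add(t);
                result.add(bucket);
                averages.add(t.sizeBytes);
            }
        }
        return result;
    }

    /**
     * Proposed rule (as read here): tombstones may be dropped only if no table
     * outside the compaction set is older than the newest table being compacted.
     */
    static boolean canGcTombstones(List<Table> all, List<Table> compacting) {
        long newest = Long.MIN_VALUE;
        for (Table t : compacting) {
            newest = Math.max(newest, t.generation);
        }
        for (Table t : all) {
            if (!compacting.contains(t) && t.generation < newest) {
                return false; // an older table might still hold the deleted data
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Table> all = new ArrayList<>();
        all.add(new Table("old-1", 1, 75 * mb));
        all.add(new Table("old-2", 2, 75 * mb));
        for (int g = 3; g <= 6; g++) {
            all.add(new Table("tombstones-" + g, g, 300 * mb));
        }
        for (List<Table> bucket : buckets(all, 50 * mb)) {
            System.out.println(bucket.get(0).name + "... size=" + bucket.size()
                    + " compact=" + (bucket.size() >= MIN_THRESHOLD)
                    + " gcTombstones=" + canGcTombstones(all, bucket));
        }
        System.out.println("major gcTombstones=" + canGcTombstones(all, all));
    }
}
```

With the two 75MB files and four 300MB tombstone files from Ramzi's example, the tombstone bucket reaches the threshold and gets compacted, but the check refuses to drop its tombstones because the older 75MB files are not part of that compaction. A major compaction includes every table, so the check passes there.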