Subject: Re: Removes increasing disk space usage in Cassandra?
From: Ramzi Rabah <rrabah@playdom.com>
To: cassandra-user@incubator.apache.org
Date: Fri, 4 Dec 2009 11:55:37 -0800

Just to be clear, what I meant is that I ran the deletions and compaction
with GCGraceSeconds set to 1 hour, so there was enough time for the
tombstones to expire. Anyway, I will try to make a simpler test case to
hopefully reproduce this, and I will share the code if I can reproduce it.

Ray

On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah wrote:
> Hi Jonathan, I have changed that to 3600 (one hour) based on your
> recommendation before.
>
> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis wrote:
>> this is what I was referring to by "the period specified in your config file":
>>
>>   <GCGraceSeconds>864000</GCGraceSeconds>
>>
>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah wrote:
>>> I think there might be a bug in the deletion logic. I removed all the
>>> data on the cluster by running remove on every single key I entered,
>>> and I ran a major compaction
>>> (nodeprobe -host hostname compact) on a certain node, and after the
>>> compaction is over, I am left with one data file, one index file and
>>> the bloom filter file,
>>> and they are the same size of data as before I started doing the deletes.
>>>
>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis wrote:
>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>> suppress the older writes, and when compaction occurs the data and
>>>> tombstones get GC'd (after the period specified in your config file).
>>>>
>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah wrote:
>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>> so I am guessing these are tombstones being written? If that's the
>>>>> case, is the data being removed and replaced by tombstones? And will
>>>>> they all be deleted eventually when compaction runs?
>>>>>
>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I ran a test where I inserted about 1.2 gigabytes worth of data into
>>>>>> each node of a 4-node cluster.
>>>>>> I ran a script that first calls a get on each column inserted, followed
>>>>>> by a remove. Since I was basically removing every entry
>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>> nodes would go down and eventually become 0. The disk space
>>>>>> actually goes up when I do the bulk removes, to about 1.8 gigs per
>>>>>> node. Am I missing something here?
>>>>>>
>>>>>> Thanks a lot for your help
>>>>>> Ray
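
For reference, the 864000 Jonathan quotes above is the GCGraceSeconds element
in the storage-conf.xml of that era (the exact file name and placement may
differ between versions). A sketch of the one-hour value Ray describes, with
the rest of the file omitted:

    <!-- storage-conf.xml (fragment): how long tombstones are retained before a
         compaction may garbage-collect them. The shipped default is 864000
         seconds (10 days); 3600 is the one-hour setting discussed in this thread. -->
    <GCGraceSeconds>3600</GCGraceSeconds>

Note that a lower GCGraceSeconds only permits tombstone GC; the space is not
actually reclaimed until a compaction runs after that period has elapsed.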
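A minimal sketch of the kind of get-then-remove script Ray describes, written
against the Thrift-generated Python bindings of that era. The method signatures
are from the 0.4-era interface file and may differ in other versions, and the
host, port, keyspace, column family, column name, and keys here are all
hypothetical placeholders, not anything taken from this thread:

    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra                      # Thrift-generated module (assumed name)
    from cassandra.ttypes import ColumnPath, ConsistencyLevel

    # Assumed host/port; 9160 was the usual Thrift listen port.
    socket = TSocket.TSocket("localhost", 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    keyspace = "Keyspace1"                               # hypothetical keyspace
    path = ColumnPath(column_family="Standard1",         # hypothetical column family
                      super_column=None,
                      column="payload")                  # hypothetical column name
    keys = ["key%06d" % i for i in range(1000)]          # stand-in for the real key set

    # Read each column back, then delete it. The remove only writes a tombstone;
    # the original data stays on disk until a compaction runs after GCGraceSeconds,
    # which is why disk usage can grow during a bulk delete.
    for key in keys:
        client.get(keyspace, key, path, ConsistencyLevel.QUORUM)
        client.remove(keyspace, key, path, int(time.time() * 1e6), ConsistencyLevel.QUORUM)

    transport.close()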