From: Jonathan Ellis
Date: Fri, 4 Dec 2009 14:34:47 -0600
Subject: Re: Removes increasing disk space usage in Cassandra?
To: cassandra-user@incubator.apache.org

Are you testing trunk? If not, you should check that first to see if it's already fixed.

On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah wrote:
> Just to be clear, what I meant is that I ran the deletions and
> compaction with GCGraceSeconds set to 1 hour, so there was enough time
> for the tombstones to expire.
> Anyway, I will try to make a simpler test case to hopefully reproduce
> this, and I will share the code if I can reproduce it.
>
> Ray
>
> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah wrote:
>> Hi Jonathan, I have changed that to 3600 (one hour) based on your
>> recommendation before.
>>
>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis wrote:
>>> This is what I was referring to by "the period specified in your config file":
>>>
>>>   <GCGraceSeconds>864000</GCGraceSeconds>
>>>
>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah wrote:
>>>> I think there might be a bug in the deletion logic. I removed all the
>>>> data on the cluster by running remove on every single key I entered,
>>>> and I ran a major compaction
>>>> (nodeprobe -host hostname compact) on a certain node. After the
>>>> compaction is over, I am left with one data file, one index file and
>>>> the bloom filter file,
>>>> and they are the same size as before I started doing the deletes.
>>>>
>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis wrote:
>>>>> Cassandra never modifies data in place, so it writes tombstones to
>>>>> suppress the older writes, and when compaction occurs the data and
>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>
>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah wrote:
>>>>>> Looking at jconsole, I see a high number of writes when I do removes,
>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>> case, is the data being removed and replaced by tombstones, and will
>>>>>> they all be deleted eventually when compaction runs?
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I ran a test where I inserted about 1.2 gigabytes worth of data into
>>>>>>> each node of a 4-node cluster.
>>>>>>> I ran a script that first calls a get on each column inserted, followed
>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>> nodes would go down and eventually become 0. Instead, the disk space
>>>>>>> actually goes up to about 1.8 gigs per node when I do the bulk removes.
>>>>>>> Am I missing something here?
>>>>>>>
>>>>>>> Thanks a lot for your help
>>>>>>> Ray
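A toy model of the mechanism Jonathan describes in the thread above may make the disk-space behavior easier to follow: writes are append-only, a remove appends a tombstone instead of deleting anything, and compaction keeps only the newest version of each key, dropping a tombstone only once it is older than the grace period. This is a hypothetical Python sketch, not Cassandra's code. The names `Entry`, `ToyStore` and `demo`, the byte-size model, and the assumption that a major compaction discards shadowed data immediately but keeps tombstones until `gc_grace_seconds` has elapsed are all illustrative assumptions. It says nothing about whether the build Ray was testing had the bug he suspected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    key: str
    value: Optional[bytes]  # None marks a tombstone (hypothetical representation)
    timestamp: int          # write time in seconds on a toy clock

    def size(self) -> int:
        # Toy on-disk size: key + value bytes + a fixed 8-byte timestamp (assumption)
        value_len = len(self.value) if self.value is not None else 0
        return len(self.key) + value_len + 8


class ToyStore:
    """Hypothetical append-only store: nothing is ever modified in place."""

    def __init__(self, gc_grace_seconds: int) -> None:
        self.gc_grace_seconds = gc_grace_seconds
        self.memtable: List[Entry] = []
        self.sstables: List[List[Entry]] = []  # immutable "files" on disk

    def insert(self, key: str, value: bytes, now: int) -> None:
        self.memtable.append(Entry(key, value, now))

    def remove(self, key: str, now: int) -> None:
        # A remove is just another write: a tombstone that shadows older data.
        self.memtable.append(Entry(key, None, now))

    def flush(self) -> None:
        # Write the memtable out as a new immutable sstable.
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = []

    def disk_bytes(self) -> int:
        return sum(e.size() for table in self.sstables for e in table)

    def major_compact(self, now: int) -> None:
        # Merge every sstable, keeping only the newest entry per key
        # (in this toy model, a later entry wins a timestamp tie).
        latest: Dict[str, Entry] = {}
        for table in self.sstables:
            for e in table:
                current = latest.get(e.key)
                if current is None or e.timestamp >= current.timestamp:
                    latest[e.key] = e
        # Shadowed data is already gone; tombstones survive until older than the grace period.
        merged = [
            e for e in latest.values()
            if e.value is not None or now - e.timestamp < self.gc_grace_seconds
        ]
        self.sstables = [merged] if merged else []


def demo(gc_grace_seconds: int, compact_at: int) -> Tuple[int, int, int]:
    """Insert 1000 keys at t=0, remove them all at t=10, major-compact at compact_at."""
    store = ToyStore(gc_grace_seconds)
    for i in range(1000):
        store.insert(f"key{i}", b"x" * 100, now=0)
    store.flush()
    after_insert = store.disk_bytes()
    for i in range(1000):
        store.remove(f"key{i}", now=10)
    store.flush()
    after_remove = store.disk_bytes()
    store.major_compact(now=compact_at)
    return after_insert, after_remove, store.disk_bytes()


if __name__ == "__main__":
    # Grace period of 864000 s (ten days): the data is reclaimed, tombstones remain.
    print(demo(gc_grace_seconds=864000, compact_at=7200))  # (113890, 127780, 13890)
    # Grace period of 3600 s, compaction two hours later: everything is reclaimed.
    print(demo(gc_grace_seconds=3600, compact_at=7200))  # (113890, 127780, 0)
```

Under these assumptions the removes alone make the on-disk size grow, which matches the direction of what Ray observed (1.2 GB rising to about 1.8 GB per node), and only a compaction that runs after the grace period has expired reclaims the tombstones as well.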