Date: Fri, 4 Dec 2009 10:51:40 -0800
Subject: Re: Removes increasing disk space usage in Cassandra?
From: Ramzi Rabah
To: cassandra-user@incubator.apache.org

I think there might be a bug in the deletion logic. I removed all the
data on the cluster by running remove on every single key I entered,
and then ran a major compaction (nodeprobe -host hostname compact) on
one node. After the compaction finished, I was left with one data
file, one index file, and the bloom filter file, and they are the same
size as before I started doing the deletes.

On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis wrote:
> Cassandra never modifies data in-place, so it writes tombstones to
> suppress the older writes, and when compaction occurs the data and
> tombstones get GC'd (after the period specified in your config file).
>
> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah wrote:
>> Looking at jconsole I see a high number of writes when I do removes,
>> so I am guessing these are tombstones being written? If that's the
>> case, is the data being removed and replaced by tombstones? And will
>> they all be deleted eventually when compaction runs?
>>
>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah wrote:
>>> Hi all,
>>>
>>> I ran a test where I inserted about 1.2 gigabytes of data into
>>> each node of a 4-node cluster.
>>> I ran a script that first calls a get on each column inserted,
>>> followed by a remove. Since I was basically removing every entry
>>> I had inserted before, I expected that the disk space occupied by
>>> the nodes would go down and eventually become 0. The disk space
>>> actually goes up when I do the bulk removes, to about 1.8 gigs per
>>> node. Am I missing something here?
>>>
>>> Thanks a lot for your help
>>> Ray
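
The tombstone behaviour described in the quoted reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's code: the ToyStore class, its method names, the gc_grace_seconds parameter, and the size accounting (key bytes plus value bytes) are all hypothetical. It assumes a remove appends a tombstone, that compaction keeps only the newest entry per key, and that a tombstone is dropped only once it is older than the configured grace period. It does not attempt to reproduce the unchanged file sizes reported at the top of this message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    key: str
    timestamp: int           # write time in seconds (toy clock)
    value: Optional[bytes]   # None marks a tombstone

    def size(self) -> int:
        # Hypothetical on-disk cost: key bytes plus value bytes.
        value_len = len(self.value) if self.value is not None else 0
        return len(self.key) + value_len


class ToyStore:
    """Append-only store: inserts and removes only ever add entries."""

    def __init__(self, gc_grace_seconds: int) -> None:
        self.gc_grace_seconds = gc_grace_seconds
        self.entries: List[Entry] = []

    def insert(self, key: str, value: bytes, now: int) -> None:
        self.entries.append(Entry(key, now, value))

    def remove(self, key: str, now: int) -> None:
        # Nothing is modified in place; a tombstone is appended instead.
        self.entries.append(Entry(key, now, None))

    def disk_usage(self) -> int:
        return sum(e.size() for e in self.entries)

    def compact(self, now: int) -> None:
        # Keep only the newest entry per key (later writes win ties).
        latest: Dict[str, Entry] = {}
        for e in self.entries:
            current = latest.get(e.key)
            if current is None or e.timestamp >= current.timestamp:
                latest[e.key] = e
        # Live values survive; tombstones survive only within the grace period.
        self.entries = [
            e for e in latest.values()
            if e.value is not None or now - e.timestamp < self.gc_grace_seconds
        ]


store = ToyStore(gc_grace_seconds=100)
for i in range(3):
    store.insert(f"k{i}", b"x" * 10, now=0)
after_insert = store.disk_usage()

for i in range(3):
    store.remove(f"k{i}", now=10)
after_remove = store.disk_usage()      # larger than after_insert

store.compact(now=20)
after_early_compact = store.disk_usage()   # tombstones still present

store.compact(now=200)
after_late_compact = store.disk_usage()    # grace period has passed

print(after_insert, after_remove, after_early_compact, after_late_compact)
```

In this model, disk usage grows during the bulk removes because each remove is itself a write, and it only reaches zero when a compaction runs after the grace period has elapsed.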