Subject: Re: cleaning house
From: Benjamin Black
To: user@cassandra.apache.org
Date: Tue, 20 Apr 2010 10:50:05 -0700
In-Reply-To: <4BCDE57A.5030008@real.com>

Are you deleting data through the API or just doing a bunch of inserts and then running a compaction?
The latter will not result in anything to clean up, since data must be explicitly deleted.

b

On Tue, Apr 20, 2010 at 10:33 AM, B. Todd Burruss wrote:
> i'm trying to draw some correlation between the size of my data and the
> space used on disk. i have set 1 so there
> isn't any reason to keep data around.
>
> my approach is this:
>
> after only doing "puts" to cassandra for a while i stop my client and want
> to perform the proper "cleanup" and/or "compact" operations that will reduce
> the disk space used to a minimum. however i can't seem to figure it out.
> i've done "major compaction", "cleanup", etc. but it doesn't seem to get the
> job done
>
> so two questions
>
> - what procedure is suggested to get rid of all unnecessary data?
> - and what does the following "Compacted" file mean? seems like it is
> marking "88" as compacted, but there are no more compactions happening
> according to compaction mgr
>
> -rw-rw-r-- 1 bburruss bburruss          0 Apr 20 08:32 bucket-88-Compacted
> -rw-rw-r-- 1 bburruss bburruss 1445218042 Apr 19 21:39 bucket-88-Data.db
> -rw-rw-r-- 1 bburruss bburruss   12255925 Apr 19 21:39 bucket-88-Filter.db
> -rw-rw-r-- 1 bburruss bburruss  451806386 Apr 19 21:39 bucket-88-Index.db
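For correlating data size with disk usage, it may help to total the on-disk bytes per SSTable generation and note which generations carry a "-Compacted" marker file. The following is only a rough sketch in Python: the file-name pattern is inferred from the listing quoted above, and the names `NAME_RE` and `summarize` are hypothetical helpers, not part of Cassandra or its tools. It reports the marker's presence only and makes no claim about what Cassandra does with marked files.

```python
import re
from collections import defaultdict

# Pattern inferred from names such as "bucket-88-Data.db" in the listing
# above; this is an assumption, not a documented Cassandra naming rule.
NAME_RE = re.compile(
    r"^(?P<cf>.+)-(?P<gen>\d+)-(?P<part>Data\.db|Index\.db|Filter\.db|Compacted)$"
)

def summarize(entries):
    """Group (filename, size_in_bytes) pairs by (column family, generation).

    Returns a dict mapping (cf, generation) to the total bytes of the
    Data/Index/Filter files and a flag for the "-Compacted" marker.
    """
    groups = defaultdict(lambda: {"bytes": 0, "compacted_marker": False})
    for name, size in entries:
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip files that do not fit the assumed pattern
        key = (match.group("cf"), int(match.group("gen")))
        if match.group("part") == "Compacted":
            groups[key]["compacted_marker"] = True
        else:
            groups[key]["bytes"] += size
    return dict(groups)

# Sizes copied from the directory listing quoted in this thread.
listing = [
    ("bucket-88-Compacted", 0),
    ("bucket-88-Data.db", 1445218042),
    ("bucket-88-Filter.db", 12255925),
    ("bucket-88-Index.db", 451806386),
]
summary = summarize(listing)
for (cf, gen), info in sorted(summary.items()):
    print(cf, gen, info["bytes"], info["compacted_marker"])
```

To run this against a live data directory, the `listing` could instead be built from `os.scandir(path)`, using each entry's `name` and `stat().st_size`.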