From: Rob Coli
To: user@cassandra.apache.org
Date: Fri, 06 Aug 2010 14:49:36 -0700
Subject: Re: Cassandra disk space utilization WAY higher than I would expect

On 8/5/10 11:51 AM, Peter Schuller wrote:
> Also, the variation in disk space in your most recent post looks
> entirely as expected to me and nothing really extreme. The temporary
> disk space occupied during the compact/cleanup would easily be as high
> as your original disk space usage to begin with, and the fact that
> you're reaching the 5-7 GB per node level after a cleanup has
> completed fully and all obsolete sstables have been removed

Your post refers to "obsolete" SSTables, but isn't the only thing that makes them "obsolete" in this case the fact that they have been compacted?

As I understand Julie's case, she is:

a) initializing her cluster
b) inserting some number of unique keys with CL.ALL
c) noticing that much more disk space (6x?) than expected is used
d) but getting the expected usage if she does a major compaction

In other words, the problem isn't "temporary disk space occupied during the compact"; it's permanent disk space that stays occupied unless she compacts.

There is clearly overhead from having multiple SSTables, each with its own bloom filter and index. But as I understand it, that does not fully account for the difference in disk usage she is seeing. If usage is 6x across the whole cluster, it seems unlikely that the metadata is 5x the size of the actual data.

I haven't been following this thread very closely, but I don't think "obsolete" SSTables should be relevant, because she's not doing UPDATE or DELETE and she hasn't changed the cluster topology (the "cleanup" case).
Julie: when compaction occurs, it logs the number of bytes it started with and the number it ended with, as well as the number of keys involved in the compaction. What do these messages say?

Example line:

INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. 999999999/888888888 bytes for 12345678 keys. Time: 123456ms.

=Rob
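If there are many of these messages, a small script can pull out the before/after byte counts and key counts and report how much each compaction shrank the data. The following is a minimal Python sketch, assuming the log text matches the example line above exactly; the regex, the `CompactionStats` type and the `parse_compaction_line`/`summarize` helpers are hypothetical names for illustration, not part of Cassandra, and other Cassandra versions may word this message differently. A shrink ratio close to 1 means compaction discarded little, whereas totals near the 6x Julie reports would point toward the uncompacted SSTables holding mostly redundant data.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Hypothetical regex modeled only on the example CompactionManager line above.
# Assumption: the message text matches that example; other versions may differ.
COMPACTED_RE = re.compile(
    r"Compacted to (?P<path>\S+)\.\s+"
    r"(?P<before>\d+)/(?P<after>\d+) bytes for (?P<keys>\d+) keys\."
)


class CompactionStats(NamedTuple):
    path: str          # SSTable written by the compaction
    bytes_before: int  # total size of the input SSTables
    bytes_after: int   # size of the compacted output
    keys: int          # number of keys involved

    @property
    def shrink_ratio(self) -> float:
        """How many times smaller the output is than the input."""
        if self.bytes_after == 0:
            return float("inf")
        return self.bytes_before / self.bytes_after


def parse_compaction_line(line: str) -> Optional[CompactionStats]:
    """Return stats for a 'Compacted to ...' line, or None for any other line."""
    m = COMPACTED_RE.search(line)
    if m is None:
        return None
    return CompactionStats(m["path"], int(m["before"]), int(m["after"]), int(m["keys"]))


def summarize(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Sum bytes before, bytes after and keys over every compaction line."""
    before = after = keys = 0
    for line in lines:
        stats = parse_compaction_line(line)
        if stats is not None:
            before += stats.bytes_before
            after += stats.bytes_after
            keys += stats.keys
    return before, after, keys


# The placeholder line from the email (its numbers are illustrative, not real data).
EXAMPLE_LINE = (
    "INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java "
    "(line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. "
    "999999999/888888888 bytes for 12345678 keys. Time: 123456ms."
)

if __name__ == "__main__":
    stats = parse_compaction_line(EXAMPLE_LINE)
    print(stats)
    print(f"shrink ratio: {stats.shrink_ratio:.3f}")
```

Feeding every line of a node's log to `summarize()` gives totals across all compactions on that node; where that log lives depends on the installation.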