Date: Tue, 29 Mar 2011 12:29:18 +0200
Subject: Re: Compaction doubles disk space
From: Sylvain Lebresne
To: user@cassandra.apache.org
Cc: Sheng Chen

> BTW, given that compaction requires double disk spaces, does it mean that I
> should never reach half of my total disk space?
> e.g. if I have 505GB data on 1TB disk, I cannot even delete any data at all.

It is not so black and white. What is true is that, in practice, reaching half the disk should be a first alert, after which you should start monitoring things more carefully to avoid problems.

There are two kinds of compaction, major and minor. Major compactions are the ones that compact all the sstables for a given column family. Minor compactions are the ones that are triggered automatically and regularly. By definition they don't compact everything and thus don't need half your disk space. Note however that over time, even minor compactions will require a fair amount of disk space and could very well require as much as half the disk space, but in practice that won't happen all the time.

The other thing is that even a major compaction only has to be applied to one Column Family at a time. So unless you have only one CF, or 90% of your data in one CF (and for the record, there's nothing wrong with that, it's just not necessarily your case), you won't need exactly half your disk for a compaction.

All this to say that it is not as simple as: you've reached half your disk space, so you are necessarily doomed. Chances are you'll never hit any problem until you're, say, 70% full (or more). But there is no foolproof number here, so as I said earlier, hitting 50% should be a first sign that you may need a plan for the future.

--
Sylvain
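To make the per-column-family point concrete, here is a rough sketch of the arithmetic. It assumes the worst case for a major compaction, namely that the compacted output is as large as the input sstables and both exist on disk until the compaction finishes. The function name, the CF names and the sizes are all made up for illustration; this is not a Cassandra API.

```python
def major_compaction_check(cf_sizes_gb, disk_gb):
    """Report, per column family, whether a major compaction would fit.

    Assumption (worst case): compacting a CF temporarily needs free space
    equal to that CF's current on-disk size, because the old sstables are
    only removed after the new one has been fully written.
    """
    used_gb = sum(cf_sizes_gb.values())
    free_gb = disk_gb - used_gb
    # A CF "fits" if its worst-case temporary copy is no larger than the free space.
    fits = {name: size <= free_gb for name, size in cf_sizes_gb.items()}
    return free_gb, fits


# Hypothetical layout: 505 GB spread over three CFs on a 1000 GB disk.
spread_free, spread_fits = major_compaction_check(
    {"events": 300, "users": 150, "logs": 55}, disk_gb=1000
)

# Same 505 GB, but all of it in a single CF.
single_free, single_fits = major_compaction_check({"events": 505}, disk_gb=1000)

print(spread_free, spread_fits)
print(single_free, single_fits)
```

With the data spread over several CFs, each one can still be major-compacted on its own even past 50% usage. With everything in one CF, the same 505 GB no longer fits in the worst case. Real compactions often need less than this, since overwritten and deleted data is dropped along the way.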