From: Robert Wille
To: user@cassandra.apache.org
Date: Thu, 28 Nov 2013 20:21:00 -0700
Subject: Recommended amount of free disk space for compaction

I’m trying to estimate our disk space requirements and I’m wondering about disk space required for compaction.

My application mostly inserts new data and very rarely updates existing data, so compaction will remove very few bytes. It seems that if a major compaction occurs, performing it will require as much disk space as the table currently consumes.

So here’s my question. If Cassandra only compacts one table at a time, then I should be safe if I keep as much free space as there is data in the largest table. If Cassandra can compact multiple tables simultaneously, then it seems that I need as much free space as all the tables put together, which means no more than 50% utilization. So, how much free space do I need? Any rules of thumb anyone can offer?
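
The two cases in the question can be made concrete with a rough sketch of the arithmetic. It encodes only the assumptions stated above (a major compaction temporarily needs about as much free space as the table occupies, and compactions run either one table at a time or all at once); it does not describe how Cassandra actually schedules compactions. The table names and sizes are made up for illustration.

```python
# Hypothetical on-disk table sizes in GB; names and numbers are illustrative only.
table_sizes_gb = {"events": 400, "users": 50, "audit_log": 150}


def free_space_needed_gb(sizes, concurrent):
    """Free space needed, assuming (as in the question) that a major
    compaction temporarily requires as much space as the table occupies."""
    if concurrent:
        # Worst case: every table is compacted at the same time.
        return sum(sizes.values())
    # Only one table is compacted at a time, so the largest table dominates.
    return max(sizes.values())


def max_utilization(sizes, concurrent):
    """Largest fraction of the disk the data may occupy while still
    leaving the headroom computed above."""
    data = sum(sizes.values())
    return data / (data + free_space_needed_gb(sizes, concurrent))


for concurrent in (False, True):
    label = "all tables at once" if concurrent else "one table at a time"
    need = free_space_needed_gb(table_sizes_gb, concurrent)
    util = max_utilization(table_sizes_gb, concurrent)
    print(f"{label}: {need} GB free, at most {util:.0%} utilization")
```

With these made-up sizes, the one-table-at-a-time case needs 400 GB of headroom (60% utilization), while the all-at-once case needs 600 GB, which is the 50% ceiling mentioned above.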

Also, what happens if a node gets low on disk space and there isn’t enough available for compaction? If I add new nodes to reduce the amount of data on each node, I assume the space won’t be reclaimed until a compaction event occurs. Is there a way to salvage a node that gets into a state where it cannot compact its tables?

Thanks

Robert
