Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of rwille@fold3.com designates
 38.101.149.73 as permitted sender)
User-Agent: Microsoft-MacOutlook/14.3.6.130613
Date: Thu, 28 Nov 2013 20:21:00 -0700
Subject: Recommended amount of free disk space for compaction
From: Robert Wille <rwille@fold3.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Message-ID: <CEBD562C.BF4F2%rwille@fold3.com>
Thread-Topic: Recommended amount of free disk space for compaction
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="B_3468514865_6053530"

--B_3468514865_6053530
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

I'm trying to estimate our disk space requirements and I'm wondering
about disk space required for compaction.

My application mostly inserts new data and very rarely updates existing
data, so compaction will remove very few bytes. It seems that if a major
compaction occurs, performing it will require as much free disk space as
the table currently consumes.

So here's my question. If Cassandra only compacts one table at a time,
then I should be safe if I keep as much free space as there is data in
the largest table. If Cassandra can compact multiple tables
simultaneously, then it seems that I need as much free space as all the
tables put together, which means no more than 50% utilization. So, how
much free space do I need? Any rules of thumb anyone can offer?
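
To make the two cases concrete, here is a rough sketch of the arithmetic
in Python. The function names and the example table sizes are made up
for illustration, and it assumes compaction reclaims essentially
nothing, as described above. It is not a statement of how Cassandra
actually schedules compactions.

```python
def free_space_needed(table_sizes_gb, simultaneous):
    """Worst-case free space (GB) to reserve for major compaction.

    Assumption: compaction reclaims almost nothing, so rewriting a
    table needs about as much extra space as the table already uses.
    """
    if not table_sizes_gb:
        return 0
    if simultaneous:
        # Every table might be rewritten at the same time.
        return sum(table_sizes_gb)
    # Only one table is rewritten at a time; the largest dominates.
    return max(table_sizes_gb)


def max_utilization(table_sizes_gb, simultaneous):
    """Fraction of the disk the data may fill (sizes must be > 0)."""
    return sum(table_sizes_gb) / (
        sum(table_sizes_gb)
        + free_space_needed(table_sizes_gb, simultaneous)
    )


# Hypothetical node with three tables of 100, 50 and 50 GB.
print(free_space_needed([100, 50, 50], False))  # 100
print(free_space_needed([100, 50, 50], True))   # 200
print(max_utilization([100, 50, 50], True))     # 0.5
```

With one table compacted at a time, the same node could be filled to
about two thirds; with all tables compacted at once, the limit is the
50% mentioned above.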

Also, what happens if a node gets low on disk space and there isn't
enough available for compaction? If I add new nodes to reduce the amount
of data on each node, I assume the space won't be reclaimed until a
compaction event occurs. Is there a way to salvage a node that gets into
a state where it cannot compact its tables?

Thanks

Robert


--B_3468514865_6053530--