cassandra-user mailing list archives

From Ian Soboroff <>
Subject Capacity planning and Re: Handling disk-full scenarios
Date Wed, 02 Jun 2010 17:59:20 GMT
Reading some more (someone break in when I lose my clue ;-)

Reading the streams page in the wiki about anticompaction, I think the best
approach to take when a node's disks are overfull is to set the compaction
thresholds to 0 on all nodes, decommission the overfull node, wait for its
data to get redistributed, and then clean off the decommissioned node and
bootstrap it.  Since the disks are too full for an anticompaction, you
can't move the token on that node.

Given this, I wonder about the right approach to capacity planning.  If I
want to store, say, 500M rows, and I know from current cfstats that the
mean compacted row size is 27k, how much overhead is there on top of the
13.5 TB of raw data?
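The raw figure is just rows times mean row size; a first cut at total disk then multiplies by the replication factor and divides by how full each disk may get. A minimal sketch, assuming decimal units (1 TB = 10**12 bytes), the replication factor of 3 mentioned later in this thread, and reading the proposed 45% per-directory quota as a hypothetical `max_fill` of 0.45. Index, bloom filter and commit log overhead are ignored, so this is a lower bound rather than an answer.

```python
TB = 10**12


def raw_bytes(rows: int, mean_row_bytes: int) -> int:
    """Unreplicated data size: rows times mean compacted row size."""
    return rows * mean_row_bytes


def cluster_bytes(rows, mean_row_bytes, replication_factor=3, max_fill=0.45):
    """Disk needed so that no data directory is filled past max_fill."""
    replicated = raw_bytes(rows, mean_row_bytes) * replication_factor
    return replicated / max_fill


print(raw_bytes(500_000_000, 27_000) / TB)      # 13.5
print(cluster_bytes(500_000_000, 27_000) / TB)  # roughly 90
```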

Trying to compute this from what I have: cfstats gives a total "Space used
(total)" of around 1.6TB (this is only a subset of the data loaded so far),
but when I count the data directories using du(1) I get around 23TB already.
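One way to reproduce the comparison is to sum the data directories yourself and divide by the cfstats figure. A minimal sketch: `tree_bytes` and `overhead_ratio` are hypothetical helpers, and the inputs are the 23TB and 1.6TB figures quoted above, taken as decimal units. The script measures the gap; it does not explain it.

```python
import os

GB = 10**9


def tree_bytes(path: str, apparent: bool = True) -> int:
    """Sum file sizes under path (symlinks are not followed).

    apparent=True adds st_size (similar to `du --apparent-size`);
    apparent=False adds allocated blocks, st_blocks * 512, which is
    closer to what plain du(1) reports.  st_blocks is POSIX-only.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            total += st.st_size if apparent else st.st_blocks * 512
    return total


def overhead_ratio(on_disk_bytes: float, reported_bytes: float) -> float:
    """Bytes found on disk per byte that cfstats reports."""
    return on_disk_bytes / reported_bytes


print(overhead_ratio(23_000 * GB, 1_600 * GB))  # 14.375
```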

On Wed, Jun 2, 2010 at 11:29 AM, Ian Soboroff <> wrote:

> Ok, answered part of this myself.  You can stop a node and move files
> around on the data disks; as long as they stay in the right keyspace
> directories, all is fine.
> Now, I have a single Data.db file which is 900GB and is compacted.  The
> drive it's on is only 1.5TB, so it can't anticompact at all.  Is there
> anything I can do?  The replication factor is 3, so one idea is to take down
> the node, blow away the huge file, adjust the token, and restart the node.
> At that point I'm not sure what to tell the new node or other nodes to do...
> do I need to run a repair, or a cleanup, or a loadbalance, or ... what?
> It would be great to be able to fix a storage quota on a per-data-directory
> basis, to ensure that enough capacity is retained for anticompaction.
> Default 45% quota, adjustable for the brave.
> Ian
> On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff <> wrote:
>> My nodes have 5 disks and are using them separately as data disks.  The
>> usage on the disks is not uniform, and one is nearly full.  Is there some
>> way to manually balance the files across the disks?  Pretty much anything
>> done via nodetool incurs an anticompaction, which obviously fails.  system/
>> is not the problem; it's in my data's keyspace.
>> Ian
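The manual rebalancing described in the quoted messages (stop the node, move files between data disks, keep them in the right keyspace directory) could be scripted roughly as follows. This is a sketch under assumptions the thread does not state: the node is already stopped; each disk path is one of the node's data directories; and every component file of an SSTable (Data, Index, Filter, plus any marker files) shares a `<ColumnFamily>-<generation>` prefix such as `Users-12`. The names `Users` and `Keyspace1` are hypothetical; check your version's file layout before moving anything.

```python
import os
import shutil
from collections import defaultdict


def sstable_groups(keyspace_dir):
    """Group files like 'Users-12-Data.db' under the prefix 'Users-12'."""
    groups = defaultdict(list)
    for name in os.listdir(keyspace_dir):
        if "-" in name and os.path.isfile(os.path.join(keyspace_dir, name)):
            groups[name.rsplit("-", 1)[0]].append(name)
    return groups


def move_sstable(prefix, keyspace, src_disk, dst_disk):
    """Move every file of one SSTable, keeping the keyspace directory name.

    Only safe while the Cassandra process on this node is stopped.
    """
    src = os.path.join(src_disk, keyspace)
    dst = os.path.join(dst_disk, keyspace)
    names = sorted(sstable_groups(src).get(prefix, []))
    if not names:
        raise FileNotFoundError(f"no SSTable {prefix} in {src}")
    os.makedirs(dst, exist_ok=True)
    for name in names:
        # shutil.move copies then deletes when crossing filesystems.
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
    return names


def emptiest_disk(disks):
    """Pick the data directory whose filesystem has the most free bytes."""
    return max(disks, key=lambda d: shutil.disk_usage(d).free)
```

Moving a whole prefix group at once keeps an SSTable's components together on one disk instead of stranding some of them on the full one.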

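The per-directory quota idea can be made concrete as two checks. A minimal sketch, assuming that rewriting an SSTable during (anti)compaction needs free space roughly equal to its size, that the 900GB file is the only thing on its 1.5TB drive, and reading the proposed 45% as the maximum fraction of a data directory that may be filled. `can_rewrite` and `within_quota` are hypothetical names.

```python
GB = 10**9


def can_rewrite(sstable_bytes, disk_bytes, used_bytes):
    """True if the disk has room for a full rewritten copy of the SSTable."""
    return disk_bytes - used_bytes >= sstable_bytes


def within_quota(used_bytes, disk_bytes, max_fill=0.45):
    """True if the directory is at or under its fill quota."""
    return used_bytes <= max_fill * disk_bytes


print(can_rewrite(900 * GB, 1500 * GB, 900 * GB))  # False: only 600GB free
print(within_quota(900 * GB, 1500 * GB))           # False: the disk is 60% full
```

Under a 0.45 quota the 1.5TB drive would hold at most 675GB, leaving 825GB free, which is enough to rewrite any single SSTable that fits under the quota.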