Hi Robert,

We have found that keeping about 50% of the disk free is a good rule of thumb. Cassandra will typically use less than that when running compactions; however, it is good to have the headroom available in case it compacts some of the larger SSTables in the keyspace. More information can be found on the DataStax website [1].
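As a rough illustration of the arithmetic behind that rule of thumb (a sketch only, not an official sizing tool; the disk and table sizes are made-up examples), a major compaction of one table can temporarily need about as much free space as that table currently occupies, because the merged SSTables are written out before the originals are deleted:

```shell
#!/bin/sh
# Sketch: check the 50% rule and worst-case compaction headroom.
# USED_GB / DISK_GB / LARGEST_TABLE_GB are hypothetical example values.
DISK_GB=500
USED_GB=195
LARGEST_TABLE_GB=120   # biggest table on the node

FREE_GB=$((DISK_GB - USED_GB))

# Rule of thumb: keep roughly half the disk free.
if [ $((USED_GB * 2)) -le "$DISK_GB" ]; then
    echo "50% rule: OK (${FREE_GB}GB free)"
else
    echo "50% rule: violated (${FREE_GB}GB free)"
fi

# Worst case for a single major compaction: as much free space as the
# largest table, since it is rewritten before the old files are removed.
if [ "$FREE_GB" -ge "$LARGEST_TABLE_GB" ]; then
    echo "headroom for largest table: OK"
else
    echo "headroom for largest table: at risk"
fi
```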

If you have a situation where only one node in the cluster is running low on disk space and all other nodes are fine for disk space, there are two things you can do.
1) Run 'nodetool repair -pr' on each node so that every node repairs only its primary token range (this should be run periodically anyway).
2) Run targeted compactions on the problem node using 'nodetool compact [keyspace] [table]', where [table] names a table on the node whose SSTables need to be reduced in size.
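The two steps above can be sketched as a small script (a sketch under assumptions: the keyspace and table names are placeholders, and DRY_RUN=1 only prints the commands rather than running them against a live cluster):

```shell
#!/bin/sh
# Sketch of the two remediation steps. "my_keyspace" and the table
# names are hypothetical; set DRY_RUN=0 to actually execute.
DRY_RUN=1
KEYSPACE="my_keyspace"
TABLES="big_table other_table"   # tables consuming the most disk

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1) Primary-range repair (run this on each node in turn).
run nodetool repair -pr

# 2) Targeted major compaction of the problem tables on this node.
for t in $TABLES; do
    run nodetool compact "$KEYSPACE" "$t"
done
```

Running targeted compactions one table at a time keeps the temporary disk usage bounded by the size of that single table rather than the whole keyspace.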

Note that a single node exhausting its disk space while the other nodes are fine implies that there could be an underlying issue with that node.


[1] http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/architecturePlanningDiskCapacity_t.html

On Fri, Nov 29, 2013 at 10:48 PM, Sankalp Kohli <kohlisankalp@gmail.com> wrote:
Apart from the compaction, you might want to also look at free space required for repairs.
This could be a problem if you have large rows, as repair is not at the column level.

> On Nov 28, 2013, at 19:21, Robert Wille <rwille@fold3.com> wrote:
> I'm trying to estimate our disk space requirements and I'm wondering about disk space required for compaction.
> My application mostly inserts new data and performs updates to existing data very infrequently, so there will be very few bytes removed by compaction. It seems that if a major compaction occurs, that performing the compaction will require as much disk space as is currently consumed by the table.
> So here's my question. If Cassandra only compacts one table at a time, then I should be safe if I keep as much free space as there is data in the largest table. If Cassandra can compact multiple tables simultaneously, then it seems that I need as much free space as all the tables put together, which means no more than 50% utilization. So, how much free space do I need? Any rules of thumb anyone can offer?
> Also, what happens if a node gets low on disk space and there isn't enough available for compaction? If I add new nodes to reduce the amount of data on each node, I assume the space won't be reclaimed until a compaction event occurs. Is there a way to salvage a node that gets into a state where it cannot compact its tables?
> Thanks
> Robert