incubator-cassandra-user mailing list archives

From Rustam Aliyev
Subject Re: Cassandra and disk space
Date Thu, 09 Dec 2010 23:52:19 GMT

> That depends on your scenario.  In the worst case of one big CF, 
> there's not much that can be easily done for the disk usage of 
> compaction and cleanup (which is essentially compaction).
> If, instead, you have several column families and no single CF makes 
> up the majority of your data, you can push your disk usage a bit higher.

Is there a formula to calculate this? Let's say I have 500GB in a single 
CF, so I need at least 500GB of free space for compaction. If I 
partition this CF and split it into 10 equal CFs of 50GB each, does 
that mean I will need only 50GB of free space?
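
To make the question concrete, here is a rough sketch of the arithmetic I 
have in mind. It rests on assumptions that nobody in this thread has 
confirmed: that only one CF is compacted at a time, that compacting a CF 
can temporarily need free space equal to that CF's full size, and that 
nothing else (repair, snapshots, bootstrapping) is using disk at the same 
moment. The function names are hypothetical and not part of Cassandra.

```python
def compaction_headroom_gb(cf_sizes_gb):
    """Free space needed in the worst case.

    ASSUMPTION: one CF is compacted at a time and the compaction may
    temporarily need as much space as the CF already occupies.
    """
    return max(cf_sizes_gb, default=0)


def max_safe_usage_fraction(cf_sizes_gb):
    """Fraction of the disk that data may fill while leaving that headroom."""
    total = sum(cf_sizes_gb)
    if total == 0:
        return 1.0
    return total / (total + compaction_headroom_gb(cf_sizes_gb))


one_big_cf = [500]        # a single 500GB CF
ten_small_cfs = [50] * 10  # the same data split into ten 50GB CFs

print(compaction_headroom_gb(one_big_cf))      # 500
print(compaction_headroom_gb(ten_small_cfs))   # 50
print(max_safe_usage_fraction(one_big_cf))     # 0.5
print(round(max_safe_usage_fraction(ten_small_cfs), 3))
```

Under those assumptions the single-CF case reproduces the 50% rule, while 
the split case would allow the disk to be filled well beyond 50%. Whether 
Cassandra really behaves this way is exactly what I am asking.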

Also, is there a recommended maximum data size per node?


> A fundamental idea behind Cassandra's architecture is that disk space 
> is cheap (which, indeed, it is).  If you are particularly sensitive to 
> this, Cassandra might not be the best solution to your problem.  Also 
> keep in mind that Cassandra performs well with average disks, so you 
> don't need to spend a lot there.  Additionally, most people find that 
> the replication protects their data enough to allow them to use RAID 0 
> instead of 1, 10, 5, or 6.
> - Tyler
> On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev wrote:
>     Are there any plans to improve this in the future?
>     For big data clusters this could be very expensive. Based on your
>     comment, I will need 200TB of storage for 100TB of data to keep
>     Cassandra running.
>     --
>     Rustam.
>     On 09/12/2010 17:56, Tyler Hobbs wrote:
>>     If you are on 0.6, repair is particularly dangerous with respect
>>     to disk space usage.  If your replica is sufficiently out of
>>     sync, you can triple your disk usage pretty easily.  This has
>>     been improved in 0.7, so repairs should use about half as much
>>     disk space, on average.
>>     In general, yes, keep your nodes under 50% disk usage at all
>>     times.  Any of: compaction, cleanup, snapshotting, repair, or
>>     bootstrapping (the latter two are improved in 0.7) can double
>>     your disk usage temporarily.
>>     You should plan to add more disk space or add nodes when you get
>>     close to this limit.  Once you go over 50%, it's more difficult
>>     to add nodes, at least in 0.6.
>>     - Tyler
>>     On Thu, Dec 9, 2010 at 11:19 AM, Mark wrote:
>>         I recently ran into a problem during a repair operation where
>>         my nodes completely ran out of space and my whole cluster
>>         was... well, clusterfucked.
>>         I want to make sure I know how to prevent this problem in the
>>         future. Should I make sure that every node is under 50% of its
>>         disk space at all times? Are there any normal day-to-day
>>         operations that would cause any one node to double in size
>>         that I should be aware of? If one or more nodes surpass the
>>         50% mark, what should I plan to do?
>>         Thanks for any advice
