We are running a 3-node Cassandra 1.1.5 cluster. Each node has a 3 TB RAID 0 array for the data directory, a separate disk for the commit log, 12 cores, and 24 GB of RAM (12 GB allocated to the Cassandra heap).
We now have 1.1 TB worth of data per node (RF = 2).
Our data input is between 20 and 30 GB per day, depending on the operating conditions of the data sources.
The problem is that we have to start deleting data, because otherwise we will hit the disk capacity.
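To put rough numbers on this, here is a back-of-the-envelope sketch of how long we have left. It rests on assumptions that are not measured facts: the 20 to 30 GB per day is taken as cluster-wide input before replication, data is spread evenly over the nodes, 1 TB is counted as 1000 GB, and the hypothetical `usable_fraction` parameter models how much of the disk we can really fill (0.5 reflects the commonly cited worst-case headroom for size-tiered compaction).

```python
def days_until_full(disk_gb, used_gb, daily_ingest_gb, nodes,
                    replication_factor, usable_fraction=1.0):
    """Estimate days until one node's data disk is full.

    Assumes even data distribution and no deletes or compaction savings.
    """
    # Each node stores its share of every replica written per day.
    per_node_daily_gb = daily_ingest_gb * replication_factor / nodes
    # Space we allow ourselves to fill, minus what is already used.
    free_gb = disk_gb * usable_fraction - used_gb
    return max(free_gb, 0) / per_node_daily_gb


# Worst-case ingest (30 GB/day), filling the disk completely.
print(days_until_full(3000, 1100, 30, nodes=3, replication_factor=2))
# → 95.0

# Same ingest, but keeping half the disk free for compaction.
print(days_until_full(3000, 1100, 30, nodes=3, replication_factor=2,
                      usable_fraction=0.5))
# → 20.0
```

If the daily figure is actually per node rather than cluster-wide, the time left is shorter still.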
From what we have read, we have 2 options:
1. Start issuing regular major compactions (nodetool compact).
- This is not recommended:
- Stops minor compactions.
- Major performance hit on the node (very bad for us, because we need to be ingesting data all the time).
2. Switch to Leveled compaction strategy.
- It is said to help with deletes and disk space usage. Can someone confirm?
Can anyone suggest which (if any) is better? Are there better solutions?
- Our usage pattern is write once, read once (export), and delete once! Basically, we are using Cassandra as a data buffer between our collection points and a long-term backup system (it should provide a time window, e.g. 1 month of data, before the data gets deleted from the cluster).
- Due to financial and space constraints it is very unlikely we can add more nodes to the cluster.
- We were thinking of relying on the automatic minor compactions to free up space for us, but given how the Size-Tiered compaction strategy seems to work, we will hit the capacity before we manage to free up disk space. (This seems very strange: no matter how much disk space you have per node, the data files will get larger and larger, and you will eventually hit the same problem of minor compactions not freeing space fast enough. Can someone confirm?)
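
To illustrate what we mean by the data files getting larger and larger, here is a deliberately simplified toy model of size-tiered compaction as we understand it. It is a sketch under assumptions, not Cassandra's actual algorithm: every flush produces an SSTable of exactly 1 unit, exactly 4 same-sized SSTables (the default minimum threshold, as far as we know) are merged into one, and nothing is ever deleted or overwritten. The function name `simulate_size_tiered` is our own.

```python
from collections import Counter


def simulate_size_tiered(flushes, min_threshold=4):
    """Return SSTable sizes (largest first) after a number of flushes.

    Toy model: merge whenever min_threshold tables share the same size.
    """
    tiers = Counter()  # maps SSTable size -> number of tables of that size
    for _ in range(flushes):
        tiers[1] += 1  # a new flush always lands in the smallest tier
        size = 1
        # A merge can fill up the next tier, so keep cascading upwards.
        while tiers[size] >= min_threshold:
            tiers[size] -= min_threshold
            tiers[size * min_threshold] += 1
            size *= min_threshold
    return sorted((s for s, n in tiers.items() for _ in range(n)),
                  reverse=True)


print(simulate_size_tiered(70))
# → [64, 4, 1, 1]
```

In this model a table of size 64 is only rewritten again after three more tables of size 64 exist, i.e. after another 192 flushes. Deleted data sitting in the biggest files would therefore stay on disk for a long time, which is the behaviour we are worried about.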