I am currently using about 10 CFs (column families) to store temporal data. This data is growing quite large: hundreds of GB, when I actually only need the information from the last month, i.e. hundreds of MB.
I am going to delete the old (and useless) data. I cannot always use TTL, since I also have counters, which do not support TTL. However, I know that deletes are a bit tricky in Cassandra because they are distributed.
I was wondering about the best way to keep high performance and get rid of tombstones easily.
I am considering two ways to do it:
- Run major compactions on these 10 CFs, to force them to keep only fresh data and remove the tombstones.
- Switch to LCS (leveled compaction), to have a better chance of getting all fragments of a row into one SSTable, allowing tombstones to be removed eventually.
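To make the reasoning behind both options concrete, here is a toy Python model of the purge rule as I understand it: a tombstone can only be dropped once gc_grace_seconds has elapsed and no SSTable outside the current compaction still holds part of the same row. This is a simplified sketch, not Cassandra's actual code; `SSTable` and `can_purge_tombstone` are hypothetical names, and the real check is more refined than plain key membership.

```python
from dataclasses import dataclass

# Cassandra's default gc_grace_seconds (10 days).
GC_GRACE_SECONDS = 864000


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable: just a name and the row keys it holds."""
    name: str
    keys: frozenset


def can_purge_tombstone(key, deletion_time, now, compacting, all_sstables,
                        gc_grace=GC_GRACE_SECONDS):
    """Simplified rule: purge only if the tombstone is older than gc_grace
    and no SSTable outside this compaction contains the same row key."""
    if now - deletion_time < gc_grace:
        return False
    others = [s for s in all_sstables if s not in compacting]
    return not any(key in s.keys for s in others)


a = SSTable("a", frozenset({"row1"}))
b = SSTable("b", frozenset({"row1", "row2"}))
c = SSTable("c", frozenset({"row2"}))
tables = [a, b, c]

now = 2_000_000
deleted_at = now - GC_GRACE_SECONDS - 1  # tombstone just past gc_grace

# Major compaction: every SSTable takes part, so nothing outside can hold row1.
major = can_purge_tombstone("row1", deleted_at, now, tables, tables)
# Partial compaction of "a" alone: "b" still holds a fragment of row1.
partial = can_purge_tombstone("row1", deleted_at, now, [a], tables)
# Tombstone younger than gc_grace: kept even in a major compaction.
too_recent = can_purge_tombstone("row1", now - 10, now, tables, tables)

print(major, partial, too_recent)  # True False False
```

If this model is roughly right, a major compaction always satisfies the second condition, while LCS only makes it more likely by limiting how many SSTables a row is spread over.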
Which would be the better option, i.e. what would be the impact of each solution?
Do you need more information about these CFs to answer this question?
Any insight is welcome, as always.