cassandra-user mailing list archives

From Marcus Haarmann <>
Subject multiple tables vs. partitions and TTL
Date Thu, 01 Feb 2018 08:16:20 GMT
Hi experts, 

I have a design issue here: 
We want to store large amounts of data (more than 30 million rows containing blobs), which will be
deleted on a monthly basis depending on the type of data (not in the same order in which the data
entered the system).
Some data would survive for only two months, other data for 3-5 years.

The choice is between two designs. The first is a single table, with one partition per deletion
month (the month in which the data should be deleted) and a TTL per partition; this would allow
each month's data to be removed with a single delete command, followed by a compaction.
The alternative is multiple tables, one per deletion month, where the deletion process would
simply drop the table.
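For illustration, a minimal sketch of what the monthly deletion step would look like in each design. It assumes a hypothetical single table `records` whose partition key is a text column `deletion_month` (e.g. '2018-03'), and hypothetical per-month tables named `records_YYYY_MM`; none of these names come from an actual schema. The code only builds the CQL strings and does not execute them.

```python
from datetime import date


def deletion_month_key(month: date) -> str:
    """Format a deletion month as 'YYYY-MM' (hypothetical partition key format)."""
    return f"{month.year:04d}-{month.month:02d}"


def single_table_delete(month: date) -> str:
    # Single-table design: one partition per deletion month, removed with one
    # partition-level DELETE. This writes a tombstone; the deleted data is only
    # purged from disk by a later compaction.
    return f"DELETE FROM records WHERE deletion_month = '{deletion_month_key(month)}';"


def per_table_drop(month: date) -> str:
    # Multiple-table design: one table per deletion month, removed by dropping
    # the whole table, so no tombstones are written.
    return f"DROP TABLE IF EXISTS records_{month.year:04d}_{month.month:02d};"


if __name__ == "__main__":
    due = date(2018, 3, 1)
    print(single_table_delete(due))
    print(per_table_drop(due))
```

The trade-off described below is visible here: the first statement leaves cleanup to compaction, while the second removes the data files directly but requires one table per month.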
Data is always retrieved per record, and for each lookup we know both the retention period and
the id (UUID) of the record, so addressing multiple tables would not be a problem.
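A minimal sketch of that per-record routing, assuming the write date is also known for each record (for example stored alongside the id) and that records are bucketed by the month in which they expire: the deletion month is the write month plus the retention period, and the hypothetical table name `records_YYYY_MM` is derived from it. The column names `data` and `id` are likewise illustrative assumptions.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def table_for_record(written: date, retention_months: int) -> str:
    # A record is deleted together with everything else that expires in the
    # same month, so the table is chosen by deletion month, not by write month.
    due = add_months(written, retention_months)
    return f"records_{due.year:04d}_{due.month:02d}"


def select_statement(written: date, retention_months: int) -> str:
    # The table name is built by the application; the record id is left as a
    # bind marker so the statement can be prepared once per table.
    return f"SELECT data FROM {table_for_record(written, retention_months)} WHERE id = ?"


if __name__ == "__main__":
    print(select_statement(date(2018, 2, 1), 2))
    print(table_for_record(date(2018, 2, 1), 60))
```

Because CQL bind markers can only stand for values, not for table names, the application has to compute the table name itself, as done here.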

Since it would be one table per deletion month, I do not expect more than 1000-2000 tables,
depending on the retention period of the data.

The benefit of creating multiple tables would be that no tombstones are produced; on the other
hand, more tables take more memory on the nodes.
The single-table approach would make compaction take longer and produce more I/O activity,
because compaction would have to rewrite multiple SSTables internally.

Any thoughts on this?
We plan to use 9 nodes running Cassandra 3.11 on Linux; the total data volume is expected to be
about 15-20 TB.
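As a rough sizing check, the per-node data volume is total × replication factor / nodes, assuming data is spread evenly. The replication factor is not stated above, so the sketch below shows both RF=1 (the 15-20 TB figure already includes replicas) and a hypothetical RF=3 (the figure is raw data before replication).

```python
def per_node_tb(total_tb: float, nodes: int, replication_factor: int) -> float:
    """Data per node in TB, assuming an even distribution across the cluster."""
    return total_tb * replication_factor / nodes


if __name__ == "__main__":
    for rf in (1, 3):
        low = per_node_tb(15, 9, rf)
        high = per_node_tb(20, 9, rf)
        # RF=1 gives about 1.67-2.22 TB per node, RF=3 about 5.00-6.67 TB.
        print(f"RF={rf}: {low:.2f}-{high:.2f} TB per node")
```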

Thank you very much, 

Marcus Haarmann 
