cassandra-user mailing list archives

From "Weijun Li" <weiju...@gmail.com>
Subject Strategy to delete/expire keys in cassandra
Date Tue, 23 Feb 2010 08:35:58 GMT
It seems that we mostly talk about writing keys to and reading keys from a Cassandra cluster.
I’m wondering how people have successfully dealt with deleting or expiring keys in Cassandra. A
typical example is deleting keys that haven’t been modified within a certain time period
(i.e., old keys). Here are my thoughts:

 

1)      If you use the order-preserving partitioner, you need to iterate through all keys periodically
and check each key’s last-modified time to decide whether it should be deleted. With
hundreds of millions of keys and high write/read traffic, iterating over all keys in all
clusters will be very time- and resource-consuming.

2)      If you use the random partitioner, you’ll need to keep a list of ALL keys somewhere,
keep it up to date over time, and go through it periodically to delete expired items.
Again, with hundreds of millions of keys, maintaining such a large, dynamic key list together
with expiration times is not trivial work.

3)      Once keys are deleted, do you have to wait until the next GC to clean them from disk or
memory (supposing you don’t run cleanup manually)? What is Cassandra’s strategy for
handling deleted items (notifying other replica nodes, cleaning up memory/disk, defragmenting/rebuilding
disk files, rebuilding bloom filters, etc.)? I’m asking because if keys refresh very fast
(i.e., high-volume writes/reads with a fairly short expiration), how will the data files grow,
and how does this affect system performance?
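
To make option 2 concrete, here is a minimal sketch of an external expiry index, written in plain Python rather than against any Cassandra client API. The class name ExpiryIndex, its methods, and the idea of keeping the index in process memory are all assumptions for illustration. It keeps the last-modified time per key plus a min-heap of expiry times, so a periodic sweep only looks at keys that are actually due instead of walking the whole list.

```python
import heapq


class ExpiryIndex:
    """Hypothetical side index of key -> last-modified time, kept outside the store."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.last_modified = {}  # key -> time of the most recent write
        self.heap = []           # (expiry_time, key) entries; older ones may be stale

    def touch(self, key, now):
        """Record a write to `key` at time `now`."""
        self.last_modified[key] = now
        heapq.heappush(self.heap, (now + self.ttl, key))

    def pop_expired(self, now):
        """Return keys not modified within the TTL and drop them from the index."""
        expired = []
        while self.heap and self.heap[0][0] <= now:
            expiry, key = heapq.heappop(self.heap)
            modified = self.last_modified.get(key)
            # Ignore stale heap entries left behind by a later write to the same key.
            if modified is not None and modified + self.ttl == expiry:
                del self.last_modified[key]
                expired.append(key)
        return expired
```

The caller would issue the actual deletes for whatever pop_expired returns; that part is omitted because it depends on the client in use. The sketch does not address how to persist or shard such an index for hundreds of millions of keys, which is the hard part raised in point 2.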

 

So what is your opinion on how to handle the above cases when expiring keys? I’m trying to decide
whether we can use Cassandra for high-traffic read-only workloads, write-only workloads, or both
reads and writes.

 

Thanks,

-Weijun

