cassandra-user mailing list archives

From "Weijun Li" <weiju...@gmail.com>
Subject RE: Strategy to delete/expire keys in cassandra
Date Thu, 25 Feb 2010 06:56:50 GMT
Hi Sylvain, I just noticed that you are the one who implemented the
Expiring Column feature. Could you please help with my questions?

 

Should I just run a command like the following (in the Cassandra 0.5 source folder?):

 

patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch

 

for all of the five patches in your ticket?

 

Also, what's your opinion on extending ExpiringColumn to expire a key
completely? Otherwise it will be difficult to track which rows in Cassandra
are expired or old.

 

Thanks,

-Weijun

 

From: Weijun Li [mailto:weijunli@gmail.com] 
Sent: Tuesday, February 23, 2010 6:18 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Strategy to delete/expire keys in cassandra

 

Thanks for the answer. A dumb question: how did you apply the patch file to
the 0.5 source? The link you gave doesn't mention that the patch is for 0.5.

Also, this ExpiringColumn feature doesn't seem to expire keys/rows, meaning
the number of keys will keep growing (even if you drop columns for them)
unless you delete them. In your case, how do you manage deleting/expiring
keys from Cassandra? Do you keep a list of keys somewhere and go through them
once in a while?

Thanks,

-Weijun

On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne <sylvain@yakaz.com> wrote:

Hi,

The following ticket/patch may be what you are looking for:
https://issues.apache.org/jira/browse/CASSANDRA-699

It's flagged for 0.7, but as it breaks the API (and if I understand the
release plan correctly), it may not make it into Cassandra before 0.8 (and
the patch will have to change to accommodate the changes that will be
made to the internals in 0.7).

Anyway, what I can at least tell you is that I'm using the patch against
0.5 in a test cluster without problems so far.


> 3)      Once keys are deleted, do you have to wait till next GC to clean
> them from disk or memory (suppose you don't run cleanup manually)? What's
> the strategy for Cassandra to handle deleted items (notify other replica
> nodes, cleanup memory/disk, defrag/rebuild disk files, rebuild bloom
> filter
> etc). I'm asking this because if the keys refresh very fast (i.e., high
> volume write/read and expiration is kind of short) how will the data file
> grow and how does this impact the system performance.

Items are deleted only during compaction, and you may actually have to
wait for GCGraceSeconds to elapse before deletion. This value is
configurable in storage-conf.xml, but is 10 days by default. You can
decrease this value, but because of consistency (and the fact that you have
to at least wait for compaction to occur) you will always have a delay
before the actual delete (all this is also true for the patch I mention
above, by the way). But the deletion itself just consists of skipping the
items during compaction, so it's really cheap.
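
To make the timing concrete, here is a rough sketch of that rule in Python.
It is not Cassandra's actual code: the Tombstone class, its deleted_at
field, and the purgeable/compact helpers are hypothetical names used only
for illustration, and the exact boundary comparison is an assumption. It
only shows that a deleted item is skipped (and so disappears) once a
compaction runs after the grace period has passed.

```python
from dataclasses import dataclass

# 10 days in seconds, the default GCGraceSeconds mentioned above.
GC_GRACE_SECONDS = 10 * 24 * 60 * 60


@dataclass
class Tombstone:
    """Hypothetical marker for a deleted item (not a Cassandra class)."""
    key: str
    deleted_at: int  # seconds since epoch at which the delete was issued


def purgeable(tombstone, now, gc_grace=GC_GRACE_SECONDS):
    """Return True once the grace period has fully elapsed (assumed strict)."""
    return now - tombstone.deleted_at > gc_grace


def compact(tombstones, now, gc_grace=GC_GRACE_SECONDS):
    """Simulate one compaction pass: purgeable items are simply skipped."""
    return [t for t in tombstones if not purgeable(t, now, gc_grace)]
```

With the default grace period, an item deleted 11 days before a compaction
is dropped by that compaction, while one deleted 6 days before it is kept
until a later compaction.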

--
Sylvain

 

