cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: Strategy to delete/expire keys in cassandra
Date Thu, 25 Feb 2010 10:22:46 GMT
Hi,

> Should I just run command (in Cassandra 0.5 source folder?) like:
> patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch
> for all of the five patches in your ticket?

Well, actually I lied. The patches were made for a version a little after 0.5.
If you really want to try, I attach a version of those patches that should
work with 0.5. (There are only the first three patches; the fourth one is for
tests, so it is not necessary per se.) Apply them with your patch command.
Still, to compile the result you will have to regenerate the Thrift Java
interface (with ant gen-thrift-java), and for that you will have to install
the right svn revision of Thrift (which is libthrift-r820831 for 0.5). And if
you manage to make it work, you will have to dig into cassandra.thrift, as the
patches make changes to it.

In the end, remember that this is not an official patch yet and it *will not*
make it into Cassandra in its current form. All I can tell you is that I need
those expiring columns for quite a few of my use cases, and I will do what I
can to get this feature included if and when possible.

> Also what’s your opinion on extending ExpiringColumn to expire a key
> completely? Otherwise it will be difficult to track what are expired or old
> rows in Cassandra.

I'm not sure how to make full rows (or even full superColumns, for that matter)
expire. What if you set a row to expire after some time and then add new columns
before this expiration? Should you update the expiration of the row? That would
mean that a row expires when its last column expires, which is almost what you
get with expiring columns.
The one thing you may want, though, is that when all the columns of a row expire
(or, to be precise, get physically deleted), the row itself is deleted. Looking
at the code, I'm not convinced this happens, and I'm not sure why.
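
To make the semantics above concrete, here is a minimal Python sketch. It is
not Cassandra code: the class `Row` and its methods `put`, `live_columns`,
`expires_at` and `purge` are hypothetical names invented for illustration. It
assumes a row is nothing more than a set of columns, each carrying its own
expiration time, so the row's effective expiration is the latest expiration
among its columns.

```python
class Row:
    """Toy model (hypothetical): a row is a bag of columns with per-column expiry."""

    def __init__(self):
        # column name -> (value, absolute expiration time in seconds)
        self.columns = {}

    def put(self, name, value, now, ttl):
        """Insert or overwrite a column that expires ttl seconds after now."""
        self.columns[name] = (value, now + ttl)

    def live_columns(self, now):
        """Return the columns that have not expired yet at time now."""
        return {n: v for n, (v, exp) in self.columns.items() if exp > now}

    def expires_at(self):
        """The row 'expires' when its last column does (None if it has no columns)."""
        return max((exp for _, exp in self.columns.values()), default=None)

    def purge(self, now):
        """Physically drop expired columns; return True if the row is now empty."""
        self.columns = {n: c for n, c in self.columns.items() if c[1] > now}
        return not self.columns
```

With this model, adding a column at time 5 with a TTL of 10 to a row whose
only other column expires at time 10 moves the row's effective expiration to
15, without any row-level expiration field. A caller could delete the row
itself whenever `purge` returns True, which is the behaviour wished for above.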

--
Sylvain

>
>
>
> From: Weijun Li [mailto:weijunli@gmail.com]
> Sent: Tuesday, February 23, 2010 6:18 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Strategy to delete/expire keys in cassandra
>
>
>
> Thanks for the answer.  A dumb question: how did you apply the patch file to
> 0.5 source? The link you gave doesn't mention that the patch is for 0.5??
>
> Also, this ExpiringColumn feature doesn't seem to expire keys/rows, meaning
> the number of keys will keep growing (even if you drop columns for them) unless
> you delete them. In your case, how do you manage deleting/expiring keys from
> Cassandra? Do you keep a list of keys somewhere and go through them once in a
> while?
>
> Thanks,
>
> -Weijun
>
> On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
>
> Hi,
>
> Maybe the following ticket/patch may be what you are looking for:
> https://issues.apache.org/jira/browse/CASSANDRA-699
>
> It's flagged for 0.7, but as it breaks the API (and if I understand the
> release plan correctly) it may not make it into Cassandra before 0.8 (and the
> patch will have to change to accommodate the changes that will be
> made to the internals in 0.7).
>
> Anyway, what I can at least tell you is that I'm using the patch against
> 0.5 in a test cluster without problem so far.
>
>> 3)      Once keys are deleted, do you have to wait till next GC to clean
>> them from disk or memory (suppose you don’t run cleanup manually)? What’s
>> the strategy for Cassandra to handle deleted items (notify other replica
>> nodes, cleanup memory/disk, defrag/rebuild disk files, rebuild bloom filter,
>> etc.)? I’m asking this because if the keys refresh very fast (i.e., high
>> volume write/read and expiration is kind of short) how will the data file
>> grow and how does this impact the system performance?
>
> Items are deleted only during compaction, and you may actually have to
> wait for GCGraceSeconds to elapse before deletion. This value is configurable
> in storage-conf.xml, but is 10 days by default. You can decrease this value,
> but because of consistency (and the fact that you have to at least wait for
> compaction to occur) you will always have a delay before the actual delete
> (all this is also true for the patch I mention above, by the way). But when
> an item is deleted, compaction just skips it, so it's really cheap.
>
> --
> Sylvain
>
>
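
As a rough illustration of the compaction behaviour described in the quoted
message above, here is a toy Python model. It is not Cassandra's
implementation: the function `compact`, the constant `GC_GRACE_SECONDS` and
the `(key, value, deleted_at)` entry layout are assumptions made for this
sketch. It only shows why physical deletion is cheap: a deleted item whose
grace period has elapsed is simply not copied to the compacted output.

```python
# 10 days, the default GCGraceSeconds mentioned in the quoted message.
GC_GRACE_SECONDS = 10 * 24 * 3600


def compact(entries, now, gc_grace=GC_GRACE_SECONDS):
    """Copy entries to a new list, skipping deletions older than the grace period.

    Each entry is (key, value, deleted_at); deleted_at is None for live data
    and a timestamp in seconds for an item that has been marked as deleted.
    """
    kept = []
    for key, value, deleted_at in entries:
        if deleted_at is not None and now - deleted_at > gc_grace:
            continue  # grace period elapsed: the item is not written back
        kept.append((key, value, deleted_at))
    return kept
```

In this model an item deleted two days ago survives compaction (still marked
as deleted), while one deleted eleven days ago disappears, which matches the
point that there is always a delay before the actual delete.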
