cassandra-user mailing list archives

From Jeffrey Wang <jw...@palantir.com>
Subject RE: rolling window of data
Date Thu, 03 Feb 2011 23:03:26 GMT
Thanks for the response, but unfortunately a TTL is not enough for us. We would like to be
able to dynamically control the window, for example when there is an unusually large amount
of data, so that we don't run out of disk space.
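
To make "dynamically control the window" concrete, here is a minimal sketch of picking a cutoff from a disk budget. It is not tied to any Cassandra API: the function name `choose_cutoff`, the per-day size list, and the budget are all hypothetical, and it assumes you can estimate how many bytes each day of log data occupies.

```python
def choose_cutoff(daily_bytes, budget_bytes):
    """Return the oldest day to keep so that the retained days fit the budget.

    daily_bytes: list of (day, size_in_bytes) pairs, sorted oldest first.
    Returns None if not even the newest day fits within budget_bytes.
    """
    kept = 0
    cutoff = None
    # Walk from newest to oldest, keeping days while they still fit.
    for day, size in reversed(daily_bytes):
        if kept + size > budget_bytes:
            break
        kept += size
        cutoff = day
    return cutoff


# Hypothetical sizes: day 2 is unusually large.
sizes = [(1, 10), (2, 50), (3, 20), (4, 30)]
cutoff = choose_cutoff(sizes, budget_bytes=60)
```

With these made-up numbers the window shrinks to days 3 and 4, and everything older than day 3 would become a candidate for deletion.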

One question I have in particular is: if I use the timestamp of my log entries (not necessarily
correlated at all with the timestamp of insert) as the timestamp on my mutations, will Cassandra
do the right thing when I delete? We don't have any need for conflict resolution, so we are
currently just using the current time.

It seems like there is a possibility, depending on the implementation details of Cassandra,
that I could call remove with a timestamp and have everything older than that timestamp get
deleted. Like I said before, this seems a bit hacky to me, but would it get the job done?
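
To spell out the behavior I am hoping for, here is a toy model in Python. It is only a sketch of my understanding, not Cassandra's actual implementation: it assumes that writes are reconciled by timestamp and that a row-level delete hides every column whose timestamp is not newer than the delete's timestamp. The class `ToyRow`, its methods, and the tie-breaking rules are all made up for illustration.

```python
class ToyRow:
    """Toy model of timestamp-based reconciliation for a single row."""

    def __init__(self):
        self.columns = {}        # column name -> (value, timestamp)
        self.deleted_at = None   # timestamp of the newest row-level delete

    def insert(self, name, value, timestamp):
        # Last write wins; on a timestamp tie this toy keeps the later call
        # (an assumption of the sketch, not a claim about Cassandra).
        current = self.columns.get(name)
        if current is None or timestamp >= current[1]:
            self.columns[name] = (value, timestamp)

    def remove(self, timestamp):
        # A row-level delete just records the newest deletion timestamp.
        if self.deleted_at is None or timestamp > self.deleted_at:
            self.deleted_at = timestamp

    def read(self):
        # Only columns strictly newer than the delete remain visible.
        return {
            name: value
            for name, (value, ts) in self.columns.items()
            if self.deleted_at is None or ts > self.deleted_at
        }


row = ToyRow()
row.insert("entry-a", "old", 100)   # timestamps are log-entry times
row.insert("entry-b", "mid", 200)
row.insert("entry-c", "new", 300)
row.remove(200)                     # "delete everything up to 200"
visible = row.read()

row.insert("entry-d", "late", 150)  # arrives after the delete, old timestamp
visible_after_late_insert = row.read()
```

If the model is right, one consequence is that a log entry arriving late with a timestamp older than the delete would also stay hidden, which is fine for a rolling window but worth knowing.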

-Jeffrey

-----Original Message-----
From: scode@scode.org [mailto:scode@scode.org] On Behalf Of Peter Schuller
Sent: Thursday, February 03, 2011 8:48 AM
To: user@cassandra.apache.org
Subject: Re: rolling window of data

> The correct way to accomplish what you describe is the new (in 0.7)
> per-column TTL.  Simply set this to 60 * 60 * 24 * 90 (90 days' worth of
> seconds) and your columns will magically disappear after that length of
> time.

Although that assumes it's okay to lose data or that there is some
other method in place to prevent loss of it should the data not be
processed to whatever extent is required.

TTLs would be a great way to efficiently achieve the windowing, but
they do remove the ability to explicitly control exactly when data is
removed (such as after certain batch processing of it has completed).
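
For reference, the TTL arithmetic from the quoted suggestion can be sketched as below. The helper `is_expired` is hypothetical and only models a fixed lifetime counted in seconds from the moment of the write; it shows why the removal time is fixed at write time rather than decided later.

```python
from datetime import timedelta

# 90 days expressed in seconds, the same value as 60 * 60 * 24 * 90.
ttl_seconds = int(timedelta(days=90).total_seconds())


def is_expired(written_at, now, ttl=ttl_seconds):
    """Return True once `ttl` seconds have passed since the write.

    written_at and now are seconds since the epoch. In this toy model
    nothing else (such as "batch processing finished") can postpone expiry.
    """
    return now - written_at >= ttl
```

Here `is_expired(0, ttl_seconds - 1)` is false and `is_expired(0, ttl_seconds)` is true, regardless of whether the data has been processed.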

-- 
/ Peter Schuller