He said below that he'd like to keep the old data, so that might rule out TTLs in any case.

You've got a few options that I can think of off the top of my head.  The easiest from a management perspective is to use one table per month.  WhateverData042014 would be this month's.  It's easy enough to back up sstables, you just copy them off somewhere.  You could compact the previous month's table at the beginning of the following month, and copy the sstables off for archiving, in S3 or something similar.
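A rough sketch of the naming scheme, in case it helps.  This is only an illustration: the helper names are made up, and the only thing taken from above is the prefix-plus-MMYYYY pattern of WhateverData042014.

```python
from datetime import date, timedelta


def monthly_table_name(day: date, prefix: str = "WhateverData") -> str:
    """Return the per-month table name, e.g. WhateverData042014 for April 2014."""
    # Zero-padded month followed by the four-digit year (MMYYYY).
    return f"{prefix}{day.month:02d}{day.year}"


def previous_month_table_name(day: date, prefix: str = "WhateverData") -> str:
    """Return the name of last month's table, i.e. the one to compact and archive."""
    # Step back one day from the first of this month to land in the previous month.
    last_day_of_previous_month = day.replace(day=1) - timedelta(days=1)
    return monthly_table_name(last_day_of_previous_month, prefix)
```

A job run at the start of each month could call previous_month_table_name(date.today()) to decide which table to compact and copy off.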

Depending on where you end up moving the data, it might be more trouble than it's worth, since you might need to come up with a backup plan, and now you'll have 2 things to back up instead of just 1.  Also restoring the data is more of a pain than just querying it.

On Apr 28, 2014, at 12:57 PM, Donald Smith <Donald.Smith@audiencescience.com> wrote:

CQL lets you specify a default TTL per column family/table, e.g. by adding "AND default_time_to_live=86400" to the table options.
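For illustration, a small helper that builds such a statement for an existing table.  The table name whatever_data and the function name are hypothetical; default_time_to_live is the CQL table option mentioned above, and 86400 seconds is one day.

```python
SECONDS_PER_DAY = 86400  # 24 * 60 * 60


def default_ttl_statement(table: str, ttl_seconds: int) -> str:
    """Build a CQL statement that sets the table-level default TTL."""
    if ttl_seconds < 0:
        raise ValueError("TTL must not be negative")
    # The resulting string is meant to be run through cqlsh or a driver session.
    return f"ALTER TABLE {table} WITH default_time_to_live = {ttl_seconds};"


# Hypothetical table name, one-day default TTL.
statement = default_ttl_statement("whatever_data", SECONDS_PER_DAY)
```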
From: Redmumba [mailto:redmumba@gmail.com] 
Sent: Monday, April 28, 2014 12:51 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra data retention policy

Have you looked into using a TTL?  You can set this per insert (unfortunately, it can't be set per CF) and values will be tombstoned after that amount of time.  I.e.,

    INSERT INTO .... VALUES ... USING TTL 15552000

Keep in mind, after the values have expired, they will essentially become tombstones--so you will still need to run clean-ups (probably daily) to clear up space.
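For reference, the TTL is given in seconds, and the 15552000 in the example works out to 180 days (roughly six months).  A small sketch of that arithmetic follows; the function names are only illustrative.

```python
from datetime import datetime, timedelta


def ttl_seconds(days: int) -> int:
    """Convert a retention period in days to a TTL value in seconds."""
    return int(timedelta(days=days).total_seconds())


def expires_at(written_at: datetime, ttl: int) -> datetime:
    """Return the moment a value written at written_at with this TTL expires."""
    return written_at + timedelta(seconds=ttl)


six_months = ttl_seconds(180)  # 180 * 86400 = 15552000
```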

Does this help?

One caveat is that this is difficult to apply to existing rows--i.e., you can't bulk-update a bunch of rows with this data.  As such, another good suggestion is to simply have a secondary index on a date field of some kind, and run a bulk remove (and subsequent clean-up) daily/weekly/whatever.


On Mon, Apr 28, 2014 at 11:31 AM, Han Jia <johnidealist@gmail.com> wrote:
Hi guys,
We have a processing system that just uses the data for the past six months in Cassandra. Any suggestions on the best way to manage the old data in order to save disk space? We want to keep it as backup but it will not be used unless we need to do recovery. Thanks in advance!