incubator-cassandra-user mailing list archives

From Jon Haddad <>
Subject Re: Cassandra data retention policy
Date Mon, 28 Apr 2014 20:16:27 GMT
He said below that he’d like to keep the old data, so that might rule out TTLs in any case.

You’ve got a few options that I can think of off the top of my head. The easiest from a
management perspective is to use one table per month; WhateverData042014 would be this month’s.
It’s easy enough to back up sstables: you just copy them off somewhere. You could compact
the previous month’s table at the beginning of the following month, and copy the sstables
off for archiving, in S3 or something similar.
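A minimal Python sketch of the one-table-per-month scheme. It assumes the `<prefix>MMYYYY` naming used in `WhateverData042014`, and it uses a local directory as a stand-in for S3. The helper names and the archive path are hypothetical. Note that CQL lowercases unquoted identifiers, so the table would actually be stored as `whateverdata042014` unless the name is double-quoted. Where a table's SSTable files live on disk depends on the Cassandra version and `data_file_directories`, so the copy helper just takes that directory as an argument. Running `nodetool flush` (or taking a `nodetool snapshot`) before copying is a common way to make sure everything has been written to disk.

```python
import shutil
from datetime import date
from pathlib import Path


def monthly_table_name(prefix: str, day: date) -> str:
    """Return a per-month table name in the <prefix>MMYYYY style (hypothetical helper)."""
    return f"{prefix}{day.month:02d}{day.year:04d}"


def previous_month(day: date) -> date:
    """Return the first day of the month before `day`."""
    if day.month == 1:
        return date(day.year - 1, 12, 1)
    return date(day.year, day.month - 1, 1)


def archive_sstables(table_dir: Path, archive_dir: Path) -> list[str]:
    """Copy every file in a table's data directory to an archive location.

    `archive_dir` is a local stand-in for S3 or similar storage (assumption).
    Returns the names of the copied files.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(table_dir.iterdir()):
        if f.is_file():
            shutil.copy2(f, archive_dir / f.name)  # preserves timestamps too
            copied.append(f.name)
    return copied


if __name__ == "__main__":
    today = date(2014, 4, 28)
    print(monthly_table_name("WhateverData", today))                  # WhateverData042014
    print(monthly_table_name("WhateverData", previous_month(today)))  # WhateverData032014
```

At the start of each month you would compact and archive the table named by `monthly_table_name(prefix, previous_month(today))`.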

Depending on where you end up moving the data, archiving it might be more trouble than it’s
worth, since you might need to come up with a backup plan for the archive as well, and now
you’ll have two things to back up instead of just one. Also, restoring the data is more of a
pain than just querying it.

On Apr 28, 2014, at 12:57 PM, Donald Smith <> wrote:

> CQL lets you specify a default TTL per column family/table:  and default_time_to_live=86400
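Donald's suggestion can be sketched as a small Python helper that builds the `ALTER TABLE` statement for a table-level default TTL. The table name `whatever_data` and the helper itself are hypothetical, and the resulting string would be run through cqlsh or a driver. As far as I know, the table default only applies to writes made without an explicit TTL after it has been set. Existing rows keep whatever TTL (or none) they were written with.

```python
SECONDS_PER_DAY = 86_400


def default_ttl_statement(table: str, days: int) -> str:
    """Build a CQL statement giving `table` a default TTL of `days` days.

    `table` is a hypothetical name; the statement is only built here, not executed.
    """
    return f"ALTER TABLE {table} WITH default_time_to_live = {days * SECONDS_PER_DAY};"


if __name__ == "__main__":
    # One day, matching the 86400 in the reply above.
    print(default_ttl_statement("whatever_data", 1))
    # ALTER TABLE whatever_data WITH default_time_to_live = 86400;
```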
> From: Redmumba [] 
> Sent: Monday, April 28, 2014 12:51 PM
> To:
> Subject: Re: Cassandra data retention policy
> Have you looked into using a TTL?  You can set this per insert (unfortunately, it can't
> be set per CF) and values will be tombstoned after that amount of time.  I.e.,
>     INSERT INTO .... VALUES ... USING TTL 15552000
> Keep in mind, after the values have expired, they will essentially become tombstones, so
> you will still need to run clean-ups (probably daily) to clear up space.
> Does this help?
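The `15552000` in the INSERT above is 180 days (180 * 86400 seconds). The sketch below assumes Cassandra's default `gc_grace_seconds` of 864000 (10 days). It computes when a value written at a given time expires, and the earliest point at which compaction may purge the resulting tombstone. In practice that space is reclaimed by compaction once the grace period has passed. `nodetool cleanup` is a different operation: it removes data a node no longer owns after token changes. The function name is hypothetical.

```python
from datetime import datetime, timedelta

TTL_SECONDS = 15_552_000     # value from the INSERT above (180 days)
GC_GRACE_SECONDS = 864_000   # assumption: Cassandra's default gc_grace_seconds (10 days)


def expiry_timeline(written_at: datetime, ttl: int = TTL_SECONDS,
                    gc_grace: int = GC_GRACE_SECONDS) -> tuple[datetime, datetime]:
    """Return (when the value expires, earliest time compaction may purge it)."""
    expires = written_at + timedelta(seconds=ttl)
    purgeable = expires + timedelta(seconds=gc_grace)
    return expires, purgeable


if __name__ == "__main__":
    print(TTL_SECONDS // 86_400)  # 180
    expires, purgeable = expiry_timeline(datetime(2014, 4, 28))
    print(expires.date(), purgeable.date())
```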
> One caveat is that this is difficult to apply to existing rows, i.e., you can't bulk-update
> a bunch of rows with this data. As such, another good suggestion is to simply have a secondary
> index on a date field of some kind, and run a bulk remove (and subsequent clean-up) daily/weekly/whatever.
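Here is a sketch of the bulk-remove idea. It assumes a hypothetical `day` text column (one ISO date per row) with a secondary index on it. Cassandra secondary indexes are generally queried by equality, so the job enumerates the day buckets that have just fallen outside the retention window and emits one lookup per bucket. The keys returned would then be deleted by primary key. The table `whatever_data`, the `id` column, and the window sizes are illustrative. The SELECT strings are only printed here, not executed.

```python
from datetime import date, timedelta


def days_to_purge(today: date, retention_days: int, lookback_days: int) -> list[str]:
    """Return ISO day buckets just older than the retention window.

    Only the `lookback_days` days immediately before the cutoff are returned,
    assuming earlier days were purged by a previous run (hypothetical scheme).
    """
    cutoff = today - timedelta(days=retention_days)  # the cutoff day itself is kept
    return [(cutoff - timedelta(days=i)).isoformat() for i in range(1, lookback_days + 1)]


if __name__ == "__main__":
    for day in days_to_purge(date(2014, 4, 28), retention_days=180, lookback_days=7):
        # Hypothetical table/columns: one indexed equality lookup per bucket,
        # followed by DELETEs by primary key for the ids it returns.
        print(f"SELECT id FROM whatever_data WHERE day = '{day}';")
```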
> On Mon, Apr 28, 2014 at 11:31 AM, Han Jia <> wrote:
> Hi guys,
> We have a processing system that just uses the data for the past six months in Cassandra.
> Any suggestions on the best way to manage the old data in order to save disk space? We want
> to keep it as backup but it will not be used unless we need to do recovery. Thanks in advance!
> -John
