cassandra-user mailing list archives

From "jaalex.tech" <jaalex.t...@gmail.com>
Subject Using TTL for data purge
Date Tue, 22 Dec 2015 09:35:51 GMT
Hi,

I'm looking for suggestions/caveats on using TTL as a substitute for a
manual data purge job.

We have a few tables that hold user information - these could be guest or
registered users, and between 500K and 1M records could be created per
day per table. Currently, these tables have a secondary-indexed
updated_date column which is populated on each update. However, we have
been getting timeouts when running queries on updated_date when the
number of records is high, so I don't think this would be a reliable
option in the long term when we need to purge records that have not been
used for the last X days.

In this scenario, is it advisable to include a high enough TTL (i.e. the
amount of time we want these records to last, which could be 3 to 6 months)
when inserting/updating records?
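
To make the question concrete, here is a minimal sketch of the kind of write being considered. The table name `user_info`, its columns, and the helper functions are hypothetical and only for illustration; the one thing taken from CQL itself is the `USING TTL <seconds>` clause, which expects the lifetime in seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days):
    """Convert a retention period in days to the seconds USING TTL expects."""
    return days * SECONDS_PER_DAY


def insert_with_ttl(table, columns, days):
    """Build a parameterized CQL INSERT whose cells expire after `days` days."""
    cols = ", ".join(columns)
    markers = ", ".join("?" for _ in columns)
    return "INSERT INTO {} ({}) VALUES ({}) USING TTL {}".format(
        table, cols, markers, ttl_seconds(days)
    )


# Hypothetical table and columns; 90 days is roughly the 3-month lower bound.
stmt = insert_with_ttl("user_info", ["user_id", "user_type", "updated_date"], 90)
print(stmt)
```

The statement text would then be prepared and executed through whatever driver is in use, with the `?` markers bound to the actual values.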

There could be cases where the TTL gets reset after a couple of
days/weeks, when the user visits the site again.
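
A rough sketch of what such a reset might look like, assuming the usual CQL behaviour that a TTL applies only to the columns written by a given statement, so extending the life of a whole row means rewriting every non-key column with the new TTL. The table name, column names, and the helper `refresh_ttl_statement` are hypothetical.

```python
def refresh_ttl_statement(table, key_columns, value_columns, ttl_seconds):
    """Build a CQL UPDATE that rewrites every listed non-key column with a fresh TTL."""
    assignments = ", ".join("{} = ?".format(c) for c in value_columns)
    conditions = " AND ".join("{} = ?".format(c) for c in key_columns)
    return "UPDATE {} USING TTL {} SET {} WHERE {}".format(
        table, ttl_seconds, assignments, conditions
    )


# Hypothetical schema: one partition key, two regular columns, 90-day TTL.
refresh = refresh_ttl_statement(
    "user_info", ["user_id"], ["user_type", "updated_date"], 90 * 24 * 60 * 60
)
print(refresh)
```

Any column left out of the SET clause would keep its original expiry, so a partial update on a return visit would not extend the whole row.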

The tables have a fixed number of columns, except for one table which has a
clustering key and may have at most 10 entries per partition key.

I need to know the overhead of having so many rows with TTL hanging around
for a relatively long duration (weeks/months), and the impact it could
have on performance/storage. If this is not a recommended approach, what
would be an alternate design which could be used for a manual purge job,
without using secondary indices?

We are using Cassandra 2.0.x.

Thanks,
Joseph
