I believe you are talking about disk ("HDD") space consumed by user-generated data that is no longer needed after 15 days, but may be needed again later.
The first option is TTL, which you don't want to use. The second, as Aaron pointed out, is snapshotting the data, but the data still exists in the cluster; the snapshot only serves as a backup.
I am thinking of something like column family (CF) buckets: one bucket per 15 days, so two buckets a month.
Create a new CF every 15th day with a timestamp marker, trip_offer_cf_[ts - ts%(86400*15)], and cache the CF name in the app for 15 days. After the 15th day the old CF bucket becomes read-only and no writes go into it. Snapshot that old CF bucket's data and delete the CF a few days later; this keeps the CF count fixed.
Current CF count = n
Bucketed CF count = b*n (b being the number of buckets kept at a time)
Use a separate cluster for analytics on the old data.
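
A minimal sketch of the bucket naming above, in Python. The function names and the "previous bucket" helper are only illustrative (they are not from any Cassandra client), and ts is assumed to be a Unix timestamp in seconds:

```python
import time

BUCKET_SECONDS = 86400 * 15  # one bucket covers 15 days


def bucket_cf_name(ts, prefix="trip_offer_cf_"):
    """Return the CF name for the 15-day bucket that contains ts.

    Implements trip_offer_cf_[ts - ts % (86400 * 15)].
    """
    ts = int(ts)
    return f"{prefix}{ts - ts % BUCKET_SECONDS}"


def previous_bucket_cf_name(ts, prefix="trip_offer_cf_"):
    """Return the CF name of the bucket just before the one containing ts.

    This is the bucket that has become read-only and is the candidate
    for snapshotting and later deletion.
    """
    return bucket_cf_name(int(ts) - BUCKET_SECONDS, prefix)


if __name__ == "__main__":
    now = time.time()
    print("write to:        ", bucket_cf_name(now))
    print("snapshot / drop: ", previous_bucket_cf_name(now))
```

The app only needs to recompute the name when the current bucket rolls over, so caching it for the bucket's lifetime is cheap. Creating the CF, snapshotting it and dropping it are left out here, since they depend on the client and tooling in use.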
We are keeping daily generated data (user-generated content) in Cassandra, but our application uses only the last 15 days of data. How can we archive data older than 15 days so that we reduce the load on the Cassandra ring?
Note: we can't apply TTL, as this data may be needed in the future.
I'm not sure of your needs, but the simplest thing to consider is snapshotting and copying the snapshots off the node.
On 1/06/2012, at 12:23 AM, Shubham Srivastava wrote:
I need to archive my Cassandra data into another permanent storage.
1. To shed the unused data from the live data.
2. To use the archived data for analytics, or as a potential source for a data warehouse.
Any recommendations in terms of strategies or tools to use?
Shubham Srivastava | Technical Lead - Technology Development
+91 124 4910 548 | MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon, Haryana - 122 016, India