incubator-cassandra-user mailing list archives

From <>
Subject RE: Partition maintenance
Date Tue, 18 Dec 2012 17:23:12 GMT
Michael - That is one approach I have considered, but that also makes querying the system particularly
onerous since every column family would require its own query – I don’t think there is
any good way to “join” those, right?
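To illustrate the querying cost: with one column family per period, any combining of results has to happen on the client. Below is a minimal sketch of that client-side merge, assuming each per-period query returns rows already sorted by timestamp. The `fetch` callable and the period names are hypothetical stand-ins, not a Cassandra API.

```python
import heapq
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Row = Tuple[int, str]  # (timestamp, payload); a hypothetical row shape


def query_all_periods(
    periods: Sequence[str],
    fetch: Callable[[str], Iterable[Row]],
) -> Iterator[Row]:
    """Run one query per period and merge the sorted results client-side.

    `fetch` stands in for whatever call reads a single column family;
    it must return rows sorted by timestamp for the merge to be correct.
    """
    streams = [fetch(period) for period in periods]
    # heapq.merge lazily interleaves already-sorted inputs into one sorted stream.
    return heapq.merge(*streams)


if __name__ == "__main__":
    fake_store = {
        "2012_11": [(1, "a"), (4, "d")],
        "2012_12": [(2, "b"), (3, "c")],
    }
    for row in query_all_periods(["2012_11", "2012_12"], lambda p: fake_store[p]):
        print(row)
```

This still costs one query per column family, so it only hides the fan-out rather than removing it.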

Chris – that is an interesting concept, but as Viktor and Keith note, it seems to have problems.

Could we do this simply by mass deletes?  For example, if I created a column which was just
YYYY/MM, then during our maintenance we could spool off records that match the month we are
archiving, then do a bulk delete by that key.  We would need to have a secondary index for
that, I would assume.
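A rough sketch of the bookkeeping this idea implies: compute the YYYY/MM value for each record, work out which month has reached the 18-month mark, then spool off and delete the matching rows. The in-memory dict is only a stand-in for the real store, and the column name `month` and the function names are assumptions made for illustration. In Cassandra the index lookup would presumably be used to find the matching row keys, with the deletes then issued by key.

```python
from datetime import date
from typing import Dict


def month_bucket(d: date) -> str:
    """Format a date as the YYYY/MM bucket value."""
    return f"{d.year:04d}/{d.month:02d}"


def bucket_to_archive(today: date, retention_months: int = 18) -> str:
    """Return the bucket that is `retention_months` older than today's month."""
    # Count whole months so that crossing a year boundary needs no special case.
    total = today.year * 12 + (today.month - 1) - retention_months
    year, month0 = divmod(total, 12)
    return f"{year:04d}/{month0 + 1:02d}"


def archive_bucket(rows: Dict[str, dict], bucket: str) -> Dict[str, dict]:
    """Remove and return every row whose 'month' column equals `bucket`.

    `rows` maps row key -> columns and stands in for the real data store.
    """
    keys = [key for key, cols in rows.items() if cols.get("month") == bucket]
    # Spool the rows off first, then delete them by key.
    return {key: rows.pop(key) for key in keys}


if __name__ == "__main__":
    store = {
        "r1": {"month": month_bucket(date(2011, 6, 3)), "value": 10},
        "r2": {"month": month_bucket(date(2012, 12, 1)), "value": 20},
    }
    target = bucket_to_archive(date(2012, 12, 18))
    print(target, archive_bucket(store, target), store)
```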

From: Michael Kjellman []
Sent: Tuesday, December 18, 2012 11:15 AM
Subject: Re: Partition maintenance

You could make a column family for each period of time and then drop the column family when
you want to destroy it. Before you drop it you could use the sstabletojson converter and write
the json files out to tape.

Might make your life difficult however if you need an input split for map reduce between each
time period because you would be limited to working on one column family at a time.
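As a sketch of the per-period approach, the helper below derives a column family name for each month and picks out the ones old enough to export and drop. The `events` prefix, the naming scheme, and the function names are assumptions made for illustration; the actual export and drop steps are left to the usual tools.

```python
from datetime import date
from typing import Iterable, List


def cf_name(year: int, month: int, prefix: str = "events") -> str:
    """Build a per-month column family name, e.g. events_2012_12."""
    return f"{prefix}_{year:04d}_{month:02d}"


def expired_cfs(
    existing: Iterable[str],
    today: date,
    retention_months: int = 18,
    prefix: str = "events",
) -> List[str]:
    """Return the per-month column families at or past the retention cutoff."""
    cutoff = today.year * 12 + (today.month - 1) - retention_months
    expired = []
    for name in existing:
        if not name.startswith(prefix + "_"):
            continue  # not one of the per-month column families
        try:
            year, month = name[len(prefix) + 1:].split("_")
            index = int(year) * 12 + int(month) - 1
        except ValueError:
            continue  # name does not follow the <prefix>_YYYY_MM scheme
        if index <= cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = [cf_name(2011, 5), cf_name(2011, 7), cf_name(2012, 12), "users"]
    print(expired_cfs(names, date(2012, 12, 18)))
```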

On Dec 18, 2012, at 8:09 AM, "<>"
Hi folks.  Still working through the details of building out a Cassandra solution and I have
an interesting requirement that I’m not sure how to implement in Cassandra:

In our current Oracle world, we have the data for this system partitioned by month, and each
month the data that are now 18 months old are archived to tape/cold storage and then the partition
for that month is dropped.  Is there a way to do something similar with Cassandra without
destroying our overall performance?

Thanks in advance,
