incubator-cassandra-user mailing list archives

From Michael Kjellman <>
Subject Re: Partition maintenance
Date Tue, 18 Dec 2012 17:37:00 GMT
Yeah. No JOINs as of now in Cassandra.

What if you dumped the CF in question to JSON once a month and wrote out each record from
the JSON data whose timestamp fell in the period you were interested in archiving?

You could then bulk load each "month" back in if you had to restore.
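
A minimal sketch of that filtering step in Python. It assumes the dump has already been parsed into a list of rows, each a dict with "key" and "columns", where every column is a [name, value, timestamp] list; sstable2json output differs between versions, so treat that shape as an assumption. It also assumes timestamps are microseconds since the epoch (the usual client convention). The function names are made up for illustration, and the column write timestamp may not be the date you actually want to archive by.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt):
    """Convert an aware datetime to integer microseconds since the epoch."""
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


def month_window(year, month):
    """Return the [start, end) bounds of a calendar month in microseconds (UTC)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return to_micros(start), to_micros(end)


def rows_for_month(rows, year, month):
    """Keep only the columns written during the given month; drop empty rows."""
    lo, hi = month_window(year, month)
    kept = []
    for row in rows:
        # Assumed layout: column[2] is the write timestamp.
        cols = [c for c in row["columns"] if lo <= c[2] < hi]
        if cols:
            kept.append({"key": row["key"], "columns": cols})
    return kept
```

The result could be written back out with json.dump, one file per month, so that each file can be restored on its own.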

That doesn't help with deletes, though, and I would advise against large mass-delete operations
each month; they tend to lead to a very unhappy cluster.

On Dec 18, 2012, at 9:23 AM, "<>"

Michael - That is one approach I have considered, but that also makes querying the system
particularly onerous since every column family would require its own query – I don’t think
there is any good way to “join” those, right?

Chris – that is an interesting concept, but as Viktor and Keith note, it seems to have problems.

Could we do this simply by mass deletes?  For example, if I created a column which was just
YYYY/MM, then during our maintenance we could spool off records that match the month we are
archiving, then do a bulk delete by that key.  We would need to have a secondary index for
that, I would assume.
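
For illustration, a small Python sketch of the bucketing arithmetic only: deriving the YYYY/MM value for a record and working out which bucket has just reached the 18-month mark. The function names and the retention_months parameter are hypothetical, and nothing here talks to Cassandra.

```python
from datetime import date


def month_bucket(d):
    """Format a date as the YYYY/MM bucket value."""
    return f"{d.year:04d}/{d.month:02d}"


def months_ago(d, n):
    """Return the first day of the month n months before d's month."""
    idx = d.year * 12 + (d.month - 1) - n  # months counted from year 0
    return date(idx // 12, idx % 12 + 1, 1)


def archive_bucket(today, retention_months=18):
    """Bucket that has just aged out and is due for archiving."""
    return month_bucket(months_ago(today, retention_months))
```

Run during maintenance in December 2012, archive_bucket would name the 2011/06 bucket as the one to spool off.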

From: Michael Kjellman []
Sent: Tuesday, December 18, 2012 11:15 AM
Subject: Re: Partition maintenance

You could make a column family for each period of time and then drop the column family when
you want to destroy it. Before you drop it, you could use the sstable2json converter to write
the JSON files out to tape.

That might make your life difficult, however, if you need a MapReduce input split that spans
time periods, because you would be limited to working on one column family at a time.
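
A rough Python sketch of the bookkeeping for the column-family-per-period idea: naming each month's column family and picking out the ones old enough to dump and drop. The base_YYYY_MM naming scheme, the function names, and the 18-month default are assumptions for illustration; the actual dump and drop would still be done with the usual tools.

```python
import re
from datetime import date

# Hypothetical naming scheme: <base>_<YYYY>_<MM>, e.g. events_2011_06
CF_PATTERN = re.compile(r"^(?P<base>\w+?)_(?P<year>\d{4})_(?P<month>\d{2})$")


def cf_name(base, d):
    """Name of the column family that holds data for d's month."""
    return f"{base}_{d.year:04d}_{d.month:02d}"


def expired_cfs(names, today, retention_months=18):
    """Return the monthly column families at least retention_months old."""
    cutoff = today.year * 12 + (today.month - 1) - retention_months
    expired = []
    for name in names:
        m = CF_PATTERN.match(name)
        if m is None:
            continue  # not a monthly column family; leave it alone
        idx = int(m["year"]) * 12 + int(m["month"]) - 1
        if idx <= cutoff:
            expired.append(name)
    return sorted(expired)
```

Keeping the selection as a pure function makes it easy to review the list before anything is actually dropped.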

On Dec 18, 2012, at 8:09 AM, "<>"
Hi folks.  Still working through the details of building out a Cassandra solution and I have
an interesting requirement that I’m not sure how to implement in Cassandra:

In our current Oracle world, we have the data for this system partitioned by month. Each
month, the data that are now 18 months old are archived to tape/cold storage, and then the partition
for that month is dropped.  Is there a way to do something similar with Cassandra without
destroying our overall performance?

Thanks in advance,

Join Barracuda Networks in the fight against hunger.
To learn how you can help in your community, please visit:
