incubator-cassandra-user mailing list archives

From Paulo Motta <>
Subject Re: Recommended way of data migration
Date Sun, 08 Sep 2013 12:23:39 GMT
That's a good approach. You could also migrate in place if you're confident
your migration algorithm is correct, but for extra safety a separate CF is
better.

If you have a huge volume of data to migrate (millions of rows or more),
I'd suggest using Hadoop to perform these migrations.
If it's only a few rows, you can do it programmatically via *
get_range_slices* using the language binding of your choice. Below are some
links on how to do this with Hector or Pycassa:

* Hector:
* Pycassa:

I agree with Edward that you should only delete the rows once you've made
sure they were correctly migrated.
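To make the paging idea concrete, here is a minimal sketch of the batch-migrate loop in Python. It uses a plain dict as a stand-in for a column family, so `fetch_page` is only a simplified mock of what `get_range_slices` (or Pycassa's `ColumnFamily.get_range`) would return; the loop structure is the part that carries over: fetch a page, process it, then resume from the last key you saw, and delete from the source only after verifying the copy.

```python
# Sketch of batched migration with key-based paging. A plain dict
# stands in for a column family; with a real client the same loop
# applies, you just fetch each page starting from the last key of the
# previous one. Names here (fetch_page, migrate) are illustrative,
# not part of any client API.

PAGE_SIZE = 100

def fetch_page(cf, start_key, page_size):
    """Return up to page_size (key, value) pairs with key > start_key.
    Simplified stand-in for get_range_slices; a real cluster with
    RandomPartitioner iterates in token order, not key order."""
    keys = sorted(k for k in cf if start_key is None or k > start_key)
    return [(k, cf[k]) for k in keys[:page_size]]

def migrate(source_cf, target_cf, mutate):
    start_key = None
    while True:
        page = fetch_page(source_cf, start_key, PAGE_SIZE)
        if not page:
            break
        for key, value in page:
            # Writes are idempotent, so re-running this loop is safe.
            target_cf[key] = mutate(value)
        start_key = page[-1][0]  # resume after the last key seen
    # Delete from the source only after verifying the copy exists.
    for key in list(source_cf):
        if key in target_cf:
            del source_cf[key]

# Usage: 250 rows forces multiple pages; upper() stands in for encryption.
src = {"r%03d" % i: "plain-%d" % i for i in range(250)}
dst = {}
migrate(src, dst, lambda v: v.upper())
```

Note one real-world wrinkle the mock glosses over: with `get_range_slices` the start key of each page is usually *inclusive*, so client code typically skips the first row of every page after the first.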

2013/9/7 Edward Capriolo <>

> I would do something like you are suggesting. I would not do the delete
> until all the rows are moved. Since writes in Cassandra are idempotent you
> can even run the migration process multiple times without harm.
> On Sat, Sep 7, 2013 at 5:31 PM, Renat Gilfanov <> wrote:
>> Hello,
>> Let's say we have a simple CQL3 table
>> CREATE TABLE example (
>>     timestamp TIMESTAMP PRIMARY KEY,
>>     data ASCII
>> );
>> And I need to mutate (for example, encrypt) column values in the "data"
>> column for all rows.
>> What's the recommended approach to perform such a migration
>> programmatically?
>> For me the general approach is:
>> 1. Create another column family
>> 2. extract a batch of records
>> 3. for each extracted record, perform mutation, insert it in the new cf
>> and delete from old one
>> 4. repeat until the source cf is empty
>> Is this the correct approach, and if so, how do I implement some kind of
>> paging for step 2?

Paulo Ricardo

European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
