cassandra-user mailing list archives

From Josep Blanquer
Subject Live migrating data from 2 separate cassandra clusters
Date Thu, 25 Aug 2011 22:25:23 GMT

 I am looking for an efficient way to migrate a portion of the data existing
in a Cassandra cluster to another, separate Cassandra cluster. What I need is
to solve the typical live migration problem that appears in any "DB
sharding" setup, where you need to transfer "ownership" of certain rows from
DB1 to DB2...but in a way that clients see no (or almost no) disruption when
you actually do the cutover to DB2 for those writes.

I mean doing something as typical as:

loop (until almost no rows have been modified):
    rows = SELECT * from T where "criteria matches (i.e., shard_id=1)"
           AND updated_at > last_time
    last_time = now
    insert(rows) elsewhere
"lock" modifications to original DB
do one last SELECT to get the last few modified rows
cutover the ownership (change and ensure the clients know that the new
    home for that data is in the other "DB")
unlock modifications
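
For concreteness, the loop above can be sketched in Python against a toy in-memory store. This is only an illustration under stated assumptions: `Store`, `migrate`, the `on_pass` hook, and the logical clock are hypothetical names invented for the sketch, and nothing here talks to Cassandra. One deliberate difference from the pseudocode is that the sketch captures "now" before each read, so a write that lands while a pass is running is picked up by the next pass.

```python
import itertools

# Hypothetical logical clock standing in for wall-clock time.
_clock = itertools.count(1)


class Store:
    """Toy in-memory stand-in for one database (not a Cassandra client)."""

    def __init__(self):
        self.rows = {}        # key -> (shard_id, value, updated_at)
        self.locked = False

    def write(self, key, shard_id, value):
        if self.locked:
            raise RuntimeError("writes are locked during cutover")
        self.rows[key] = (shard_id, value, next(_clock))

    def modified_since(self, shard_id, since):
        # Equivalent of: SELECT ... WHERE shard_id = ? AND updated_at > ?
        return {k: r for k, r in self.rows.items()
                if r[0] == shard_id and r[2] > since}


def migrate(src, dst, shard_id, threshold=0, on_pass=None):
    """Copy one shard from src to dst, then cut over under a short lock."""
    last_time = 0
    passes = 0
    while True:
        now = next(_clock)                    # captured before the read
        rows = src.modified_since(shard_id, last_time)
        dst.rows.update(rows)                 # insert(rows) elsewhere
        last_time = now
        passes += 1
        if on_pass:
            on_pass(passes)                   # hook to simulate live writes
        if len(rows) <= threshold:            # almost nothing changed
            break
    src.locked = True                         # "lock" modifications
    try:
        dst.rows.update(src.modified_since(shard_id, last_time))
        owner = dst                           # cutover the ownership
    finally:
        src.locked = False                    # unlock modifications
    return owner, passes


# Demo: clients keep writing to the source while passes are running.
src, dst = Store(), Store()
src.write("a", 1, "v1")
src.write("b", 1, "v1")
src.write("c", 2, "v1")                       # other shard, must stay put


def live_writes(pass_no):
    if pass_no == 1:
        src.write("a", 1, "v2")               # update during the first pass
    if pass_no == 3:
        src.write("d", 1, "v1")               # late write, just before the lock


owner, passes = migrate(src, dst, shard_id=1, on_pass=live_writes)
```

In the demo, the late write to `d` arrives after the last loop pass and is only caught by the final read taken under the lock, which is the reason the pseudocode needs that last SELECT.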

 So, anyway, I thought I'd be able to apply the same principles by passing
a timestamp of sorts to the get_slices call, so that I could restrict the
result to matching columns with timestamps newer than the one passed. Now,
looking at the Thrift interface, I see that there is no timestamp parameter
at all...which makes me wonder how people are doing it, and whether there
are any well-known practices for it. Setting up a full new replicating DC
within the same cluster doesn't work, as there are some clear cases where
you want to have completely separate Cassandra rings.
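
To illustrate the kind of filter meant here: each column returned through the Thrift interface carries its own write timestamp, so the "newer than" restriction could in principle be applied on the client after fetching. The sketch below is an assumption-laden illustration, not an established practice: `Column` is a hypothetical stand-in that mirrors the name/value/timestamp fields, `newer_than` and `changed_rows` are made-up helpers, and the data is in memory. It still reads every column, so it is not the efficient server-side filter being asked about, and columns deleted since the last pass would not show up in such a read at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column as returned to the client."""
    name: str
    value: str
    timestamp: int  # client-supplied; microseconds since epoch by convention


def newer_than(columns, since):
    """Keep only the columns written after `since`."""
    return [c for c in columns if c.timestamp > since]


def changed_rows(rows, since):
    """Map row key -> columns newer than `since`, skipping unchanged rows."""
    delta = {}
    for key, columns in rows.items():
        fresh = newer_than(columns, since)
        if fresh:
            delta[key] = fresh
    return delta


# Two rows as they might come back from a slice read (made-up data).
sample = {
    "row1": [Column("x", "1", 100), Column("y", "2", 200)],
    "row2": [Column("x", "9", 120)],
}
delta = changed_rows(sample, since=150)
```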


 Josep M.
