incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
Date Fri, 06 May 2011 00:10:43 GMT
The difficulty is the different thrift clients between 0.6 and 0.7.

If you want to roll your own solution I would consider:
- write an app to talk to 0.6 and pull out the data using keys from the other system (so you
can check referential integrity while you are at it). Dump the data to a flat file.
- write an app to talk to 0.7 to load the data back in. 
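
A rough sketch of those two steps in Python, offered as an illustration rather than a tested migration tool. The callables fetch_row and insert_column are hypothetical stand-ins for whatever 0.6 and 0.7 thrift client calls you end up using; the file layout (one JSON object per line, with keys, column names and values base64-encoded so non-ASCII row keys survive the round trip) is also just an assumption.

```python
import base64
import json


def _b64(raw):
    # Encode raw bytes so they can be stored safely in a text file.
    return base64.b64encode(raw).decode("ascii")


def _unb64(text):
    return base64.b64decode(text)


def dump_rows(keys, fetch_row, out_path):
    """Write one JSON line per row and return the keys that had no data.

    fetch_row is a hypothetical callable wrapping the 0.6 client:
    bytes key -> {column name bytes: (value bytes, timestamp int)},
    returning an empty dict when the row does not exist.
    """
    missing = []
    with open(out_path, "w", encoding="ascii") as out:
        for key in keys:
            columns = fetch_row(key)
            if not columns:
                # Key known to the other system but absent in Cassandra.
                missing.append(key)
                continue
            record = {
                "key": _b64(key),
                "columns": [
                    [_b64(name), _b64(value), ts]
                    for name, (value, ts) in sorted(columns.items())
                ],
            }
            out.write(json.dumps(record) + "\n")
    return missing


def load_rows(in_path, insert_column):
    """Replay the flat file and return the number of rows loaded.

    insert_column is a hypothetical callable wrapping the 0.7 client:
    (key bytes, column name bytes, value bytes, timestamp int).
    """
    count = 0
    with open(in_path, "r", encoding="ascii") as src:
        for line in src:
            record = json.loads(line)
            key = _unb64(record["key"])
            for name, value, ts in record["columns"]:
                insert_column(key, _unb64(name), _unb64(value), ts)
            count += 1
    return count
```

The list returned by dump_rows is the referential integrity check: any key the other system knows about that came back empty from 0.6. Carrying the original timestamps through the file means the reload does not make old data look newer than it is.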

I've not given up digging on your migration problem; having to manually dump and reload when
you've done nothing wrong is not the best solution. I'll try to find some time this weekend
to test with:

- 0.6 server, random partitioner, standard CFs, byte column
- load with python or the cli on osx or ubuntu (don't have a windows machine any more)
- migrate and see what's going on.

If you can spare some sample data to load, please send it to the user group or to my email
address.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 May 2011, at 05:52, Henrik Schröder wrote:

> We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows stored that
> have unicode keys, and Cassandra 0.7.5 thinks those rows in the sstables are corrupt, and
> it seems impossible to clean it up without losing data.
> 
> However, we can still read all rows perfectly via thrift so we are now looking at building
> a simple tool that will copy all rows from our 0.6.13 cluster to a parallel 0.7.5 cluster.
> Our question is now how to do that and ensure that we actually get all rows migrated? It's
> a pretty small cluster, 3 machines, a single keyspace, a single columnfamily, ~2 million rows,
> a few GB of data, and a replication factor of 3.
> 
> So what's the best way? Call get_range_slices and move through the entire token space?
> We also have all row keys in a secondary system, would it be better to use that and make calls
> to get_multi or get_multi_slices instead? Are we correct in assuming that if we use the
> consistencylevel ALL we'll get all rows?
> 
> 
> /Henrik Schröder

