incubator-cassandra-user mailing list archives

From Henrik Schröder <skro...@gmail.com>
Subject Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
Date Fri, 06 May 2011 10:04:00 GMT
I'll see if I can make some example broken files this weekend.


/Henrik Schröder

On Fri, May 6, 2011 at 02:10, aaron morton <aaron@thelastpickle.com> wrote:

> The difficulty is the different thrift clients between 0.6 and 0.7.
>
> If you want to roll your own solution I would consider:
> - write an app to talk to 0.6 and pull out the data using keys from the
> other system (so you can check referential integrity while you are at it).
> Dump the data to a flat file.
> - write an app to talk to 0.7 to load the data back in.
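
As an illustration of the reload step, here is a rough Python sketch against the 0.7 Thrift API. It assumes a dump file with one JSON object per line of the form {"key": ..., "columns": {name: value}}, a node on localhost:9160, and keyspace/column family names 'Keyspace1'/'Standard1'; all of those names and the file format are placeholders, not anything from this thread.

    # load_into_07.py -- read the flat-file dump and write it into the 0.7.5
    # cluster. Keyspace/CF names, the host, and the dump format are placeholders.
    import json
    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import Column, ColumnParent, ConsistencyLevel

    socket = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TFramedTransport(socket)   # 0.7 defaults to framed transport
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    client.set_keyspace('Keyspace1')                  # keyspace name is an assumption

    parent = ColumnParent(column_family='Standard1')  # CF name is an assumption
    with open('dump.json') as f:
        for line in f:
            row = json.loads(line)
            key = row['key'].encode('utf-8')          # 0.7 keys are binary: encode explicitly
            ts = int(time.time() * 1e6)               # microsecond timestamps
            for name, value in row['columns'].items():
                col = Column(name=name.encode('utf-8'),
                             value=value.encode('utf-8'),
                             timestamp=ts)
                client.insert(key, parent, col, ConsistencyLevel.QUORUM)

    transport.close()

The dump side is the same connection boilerplate against 0.6 plus the range scan discussed further down in the thread.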
>
> I've not given up digging on your migration problem; having to manually
> dump and reload when you've done nothing wrong is not the best solution. I'll
> try to find some time this weekend to test with:
>
> - 0.6 server, random partitioner, standard CFs, byte column
> - load with Python or the CLI on OS X or Ubuntu (don't have a Windows machine
> any more); a rough sketch follows below
> - migrate and see what's going on.
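
For the "load with Python" step, a minimal sketch against the 0.6 Thrift API that writes a few rows with non-ASCII keys, which is roughly how such keys would have been written in the first place. The host, keyspace/column family names, and the sample keys are assumptions, and the unframed transport is used since that was the 0.6 default.

    # -*- coding: utf-8 -*-
    # Write a handful of rows with non-ASCII keys into a 0.6.x node.
    # Host, keyspace/CF names, and the sample keys are placeholders.
    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import ColumnPath, ConsistencyLevel

    socket = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TBufferedTransport(socket)  # 0.6 defaults to unframed transport
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    path = ColumnPath(column_family='Standard1', column='col1')
    for key in [u'plain', u'åäö', u'日本語']:
        client.insert('Keyspace1',                    # 0.6 takes the keyspace per call
                      key.encode('utf-8'),            # 0.6 keys are strings; UTF-8 encode them
                      path,
                      'some value',
                      int(time.time() * 1e6),         # microsecond timestamp
                      ConsistencyLevel.QUORUM)

    transport.close()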
>
> If you can spare some sample data to load, please send it over via the user
> group or to my email address.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>
> > We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows
> stored that have Unicode keys; Cassandra 0.7.5 thinks those rows in the
> sstables are corrupt, and it seems impossible to clean them up without losing
> data.
> >
> > However, we can still read all rows perfectly via Thrift, so we are now
> looking at building a simple tool that will copy all rows from our 0.6.13
> cluster to a parallel 0.7.5 cluster. Our question now is how to do that and
> ensure that we actually get all rows migrated. It's a pretty small cluster:
> 3 machines, a single keyspace, a single column family, ~2 million rows, a few
> GB of data, and a replication factor of 3.
> >
> > So what's the best way? Call get_range_slices and move through the entire
> token space (see the sketch below)? We also have all row keys in a secondary
> system; would it be better to use that and make calls to multiget or
> multiget_slice instead? Are we correct in assuming that if we use
> consistency level ALL we'll get all rows?
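
For the get_range_slices approach, a rough sketch of the paging loop against the 0.6 Thrift API. The keyspace/column family names, page size, and the handle_row callback are placeholders; the loop re-sends the last key of each page as the next start_key, so the repeated first row has to be skipped.

    # Page through every row in a 0.6.x column family with get_range_slices.
    # Keyspace/CF names and the page size are placeholders.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import (ColumnParent, ConsistencyLevel, KeyRange,
                                  SlicePredicate, SliceRange)

    PAGE_SIZE = 1000

    def handle_row(key, columns):
        # Placeholder: dump the row to a flat file, check it against the
        # key list in the secondary system, etc.
        pass

    socket = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    parent = ColumnParent(column_family='Standard1')
    predicate = SlicePredicate(slice_range=SliceRange(start='', finish='',
                                                      reversed=False, count=10000))
    start_key = ''
    while True:
        key_range = KeyRange(start_key=start_key, end_key='', count=PAGE_SIZE)
        slices = client.get_range_slices('Keyspace1', parent, predicate,
                                         key_range, ConsistencyLevel.ALL)
        for ks in slices:
            if ks.key == start_key:
                continue                    # pages overlap by one row; skip the repeat
            handle_row(ks.key, ks.columns)
        if len(slices) < PAGE_SIZE:
            break
        start_key = slices[-1].key

    transport.close()

Reading at ALL forces every replica to answer before a page comes back, which is as strong as the Thrift API offers, but it also means the scan fails outright if any node is down.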
> >
> >
> > /Henrik Schröder
>
>
