cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
Date Sat, 07 May 2011 12:48:25 GMT
get_range_slices respects ConsistencyLevel, but only single-row reads and
multiget do the *repair* part of read repair (RR).
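
To make that distinction concrete: a read at a given ConsistencyLevel waits for enough replicas and returns the reconciled (newest-timestamp) version of each column, while the repair step additionally writes that winner back to any replica that answered with stale data. The following is a minimal, hypothetical Python model of those two steps. `Column`, `reconcile` and `stale_replicas` are illustrative names, not Cassandra APIs, and tombstones, TTLs and timestamp ties are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical model of one column version as returned by a replica."""
    name: str
    value: bytes
    timestamp: int  # client-supplied write timestamp


def reconcile(responses):
    """Return the newest column version across the replica responses.

    This is what honouring the ConsistencyLevel gives you: the coordinator
    picks the version with the highest timestamp (last write wins).
    """
    return max(responses.values(), key=lambda col: col.timestamp)


def stale_replicas(responses):
    """Return the replicas whose version differs from the reconciled winner.

    This is the *repair* part: a repairing read would write the winner back
    to these replicas. According to the reply above, get_range_slices did
    not do this step at the time.
    """
    winner = reconcile(responses)
    return sorted(r for r, col in responses.items() if col != winner)


# Three replicas (RF=3); replica "c" missed the latest write.
responses = {
    "a": Column("email", b"new@example.com", 200),
    "b": Column("email", b"new@example.com", 200),
    "c": Column("email", b"old@example.com", 100),
}
print(reconcile(responses).value)  # b'new@example.com'
print(stale_replicas(responses))   # ['c']
```

So a range scan at ConsistencyLevel ALL can still hand back the correct data even though replica "c" stays stale until something else repairs it.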

On Sat, May 7, 2011 at 1:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
> get_range_slices() does read repair if enabled (see DoConsistencyChecksBoolean
> in the config; it's on by default), so you should be getting good reads. If you
> want belt-and-braces, run nodetool repair first.
>
> Hope that helps.
>
>
> On 7 May 2011, at 11:46, Jeremy Hanna wrote:
>
>> Great!  I just wanted to make sure you were getting the information you needed.
>>
>> On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:
>>
>>> Well, I already completed the migration program. Using get_range_slices I
>>> could migrate a few thousand rows per second, which means that migrating all of
>>> our data would take a few minutes, and we'll end up with pristine datafiles for
>>> the new cluster. Problem solved!
>>>
>>> I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so that
>>> you all can reproduce this and hopefully fix it.
>>>
>>>
>>> /Henrik Schröder
>>>
>>> On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:
>>> If you're able, go into the #cassandra channel on freenode (IRC) and talk to
>>> driftx or jbellis or aaron_morton about your problem. After a conversation
>>> there, it could turn out that you don't have to do all of this.
>>>
>>> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>>>
>>>> I'll see if I can make some example broken files this weekend.
>>>>
>>>>
>>>> /Henrik Schröder
>>>>
>>>> On Fri, May 6, 2011 at 02:10, aaron morton <aaron@thelastpickle.com> wrote:
>>>> The difficulty is that 0.6 and 0.7 use different Thrift clients.
>>>>
>>>> If you want to roll your own solution I would consider:
>>>> - write an app to talk to 0.6 and pull out the data using keys from the other
>>>>   system (so you can also check referential integrity while you are at it).
>>>>   Dump the data to a flat file.
>>>> - write an app to talk to 0.7 to load the data back in.
>>>>
>>>> I've not given up digging into your migration problem; having to manually dump
>>>> and reload when you've done nothing wrong is not the best solution. I'll try to
>>>> find some time this weekend to test with:
>>>>
>>>> - 0.6 server, RandomPartitioner, standard CFs, byte columns
>>>> - load with Python or the CLI on OS X or Ubuntu (don't have a Windows machine
>>>>   any more)
>>>> - migrate and see what's going on.
>>>>
>>>> If you can spare some sample data to load, please send it to the user group
>>>> or my email address.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>>>>
>>>>> We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows
>>>>> stored that have Unicode keys, and Cassandra 0.7.5 thinks those rows in the
>>>>> SSTables are corrupt, and it seems impossible to clean them up without losing
>>>>> data.
>>>>>
>>>>> However, we can still read all rows perfectly via Thrift, so we are now
>>>>> looking at building a simple tool that will copy all rows from our 0.6.13
>>>>> cluster to a parallel 0.7.5 cluster. Our question now is how to do that and
>>>>> ensure that we actually get all rows migrated. It's a pretty small cluster:
>>>>> 3 machines, a single keyspace, a single column family, ~2 million rows, a few
>>>>> GB of data, and a replication factor of 3.
>>>>>
>>>>> So what's the best way? Call get_range_slices and move through the entire
>>>>> token space? We also have all row keys in a secondary system; would it be
>>>>> better to use that and make calls to multiget or multiget_slice instead? Are
>>>>> we correct in assuming that if we use ConsistencyLevel ALL we'll get all rows?
>>>>>
>>>>>
>>>>> /Henrik Schröder
>>>>
>>>>
>>>
>>>
>>
>
>
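
The get_range_slices approach Henrik describes boils down to paging through the whole token range: request a batch of rows, start the next request at the last key returned, and discard that first, already-seen row. The sketch below assumes a hypothetical `fetch_page(start_key, count)` callable wrapping whatever Thrift client is in use, returning `(key, columns)` pairs in token order with `start_key` inclusive and `""` meaning "from the beginning". It is simulated with an in-memory list, not a real cluster.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, dict]
# Hypothetical signature: fetch_page(start_key, count) returns up to `count`
# rows in token order, starting at (and including) start_key.
FetchPage = Callable[[str, int], List[Row]]


def iter_all_rows(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[Row]:
    """Yield every row exactly once by paging through the whole token range."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start_key = ""
    first_page = True
    while True:
        page = fetch_page(start_key, page_size)
        # start_key is inclusive, so every page after the first repeats the
        # last row of the previous page; drop it.
        rows = page if first_page else page[1:]
        yield from rows
        if len(page) < page_size:
            return  # short page: we have reached the end of the range
        start_key = page[-1][0]
        first_page = False


def make_fake_fetch(rows: List[Row]) -> FetchPage:
    """Simulate a cluster whose `rows` are already in token order."""
    position = {key: i for i, (key, _) in enumerate(rows)}

    def fetch_page(start_key: str, count: int) -> List[Row]:
        start = 0 if start_key == "" else position[start_key]
        return rows[start:start + count]

    return fetch_page


rows = [(f"user{i}", {"n": i}) for i in range(2500)]
migrated = list(iter_all_rows(make_fake_fetch(rows), page_size=1000))
print(len(migrated), len({key for key, _ in migrated}))  # 2500 2500
```

In a real migration tool the loop body would write each row to the 0.7 cluster, and the ConsistencyLevel question from the thread applies to every `fetch_page` call.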

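Aaron's dump-and-reload suggestion needs a flat-file format that survives arbitrary row keys, including the Unicode keys that caused the upgrade problem. One option, assumed here rather than taken from the thread, is JSON Lines with keys, column names and values base64-encoded from their raw bytes, so the file does not depend on either Thrift client's string handling. `dump_rows` and `load_rows` are hypothetical helpers; the actual reads from 0.6 and writes to 0.7 are left to whichever clients you use.

```python
import base64
import io
import json
from typing import Dict, Iterable, Iterator, TextIO, Tuple

Row = Tuple[bytes, Dict[bytes, bytes]]  # (row key, {column name: value})


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def dump_rows(rows: Iterable[Row], out: TextIO) -> int:
    """Write one JSON object per row; return the number of rows written."""
    count = 0
    for key, columns in rows:
        record = {
            "key": _b64(key),
            "columns": {_b64(name): _b64(value) for name, value in columns.items()},
        }
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


def load_rows(inp: TextIO) -> Iterator[Row]:
    """Read rows back from a file written by dump_rows."""
    for line in inp:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        columns = {
            base64.b64decode(name): base64.b64decode(value)
            for name, value in record["columns"].items()
        }
        yield base64.b64decode(record["key"]), columns


# Round trip with a Unicode row key, encoded to bytes as UTF-8.
rows = [("Schröder".encode("utf-8"), {b"email": b"henrik@example.com"})]
buf = io.StringIO()
dump_rows(rows, buf)
buf.seek(0)
print(list(load_rows(buf)) == rows)  # True
```

Comparing the count returned by `dump_rows` with the number of rows loaded on the 0.7 side gives a cheap completeness check alongside the referential-integrity check Aaron mentions.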

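The alternative Henrik asks about, driving the copy from the key list in the secondary system, amounts to splitting the known keys into fixed-size batches and issuing one multi-key read per batch, which also makes it easy to report keys the source cluster did not return. In this sketch `read_batch` is a hypothetical stand-in for a multi-key read (such as a multiget_slice call) made through your client at your chosen ConsistencyLevel; it is simulated with a dict.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical: read_batch(keys) returns {key: columns} for the requested keys.
ReadBatch = Callable[[Sequence[str]], Dict[str, dict]]


def batches(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Split an iterable of keys into lists of at most `size` keys."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def copy_by_keys(
    keys: Iterable[str], read_batch: ReadBatch, size: int = 100
) -> Tuple[Dict[str, dict], List[str]]:
    """Read every known key in batches; return (rows found, keys missing)."""
    found: Dict[str, dict] = {}
    missing: List[str] = []
    for batch in batches(keys, size):
        result = read_batch(batch)
        for key in batch:
            if result.get(key):
                found[key] = result[key]
            else:
                missing.append(key)  # absent or empty row on the source side
    return found, missing


source = {f"user{i}": {"n": i} for i in range(250) if i != 42}
known_keys = [f"user{i}" for i in range(250)]


def fake_read(keys: Sequence[str]) -> Dict[str, dict]:
    return {k: source[k] for k in keys if k in source}


found, missing = copy_by_keys(known_keys, fake_read, size=100)
print(len(found), missing)  # 249 ['user42']
```

Any key in `missing` is worth investigating before switching over, since it is either a row that was never written or one the read did not return.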

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
