Date: Fri, 6 May 2011 12:04:00 +0200
Subject: Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
From: Henrik Schröder
To: user@cassandra.apache.org

I'll see if I can make some example broken files this weekend.

/Henrik Schröder

On Fri, May 6, 2011 at 02:10, aaron morton <aaron@thelastpickle.com> wrote:

> The difficulty is the different thrift clients between 0.6 and 0.7.
>
> If you want to roll your own solution I would consider:
> - write an app to talk to 0.6 and pull out the data using keys from the
>   other system (so you can check referential integrity while you are at
>   it). Dump the data to a flat file.
> - write an app to talk to 0.7 to load the data back in.
>
> I've not given up digging on your migration problem; having to manually
> dump and reload if you've done nothing wrong is not the best solution. I'll
> try to find some time this weekend to test with:
>
> - 0.6 server, random partitioner, standard CFs, byte column
> - load with python or the cli on osx or ubuntu (don't have a Windows
>   machine any more)
> - migrate and see what's going on.
>
> If you can spare some sample data to load, please send it over in the user
> group or to my email address.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>
> > We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows
> > stored that have unicode keys, and Cassandra 0.7.5 thinks those rows in
> > the sstables are corrupt, and it seems impossible to clean it up without
> > losing data.
> >
> > However, we can still read all rows perfectly via thrift, so we are now
> > looking at building a simple tool that will copy all rows from our 0.6.13
> > cluster to a parallel 0.7.5 cluster. Our question is now how to do that
> > and ensure that we actually get all rows migrated. It's a pretty small
> > cluster: 3 machines, a single keyspace, a single columnfamily, ~2 million
> > rows, a few GB of data, and a replication factor of 3.
> >
> > So what's the best way? Call get_range_slices and move through the entire
> > token space? We also have all row keys in a secondary system; would it be
> > better to use that and make calls to get_multi or get_multi_slices
> > instead? Are we correct in assuming that if we use the consistency level
> > ALL we'll get all rows?
> >
> >
> > /Henrik Schröder
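
The dump-and-reload approach suggested in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not the tool anyone in the thread actually wrote: `fetch_row` and `store_row` are hypothetical callables standing in for whatever 0.6 and 0.7 Thrift client calls are used, and the flat-file format (one JSON object per line, with keys, column names and values base64-encoded) is an assumption chosen so that arbitrary bytes, including the unicode row keys, survive the round trip. Keys known to the secondary system but returning no data are collected, which is one way to do the referential integrity check mentioned above.

```python
import base64
import io
import json


def _b64(data: bytes) -> str:
    """Encode raw bytes as ASCII text so they are safe inside a JSON line."""
    return base64.b64encode(data).decode("ascii")


def dump_rows(keys, fetch_row, out):
    """Write one JSON line per row to `out`; return the keys that had no data.

    `fetch_row` is a hypothetical callable: it takes a row key (bytes) and
    returns a dict of {column_name: value} (both bytes), or None/empty if
    the row does not exist in the old cluster.
    """
    missing = []
    for key in keys:
        columns = fetch_row(key)
        if not columns:
            # Known to the secondary system but absent in Cassandra.
            missing.append(key)
            continue
        record = {
            "key": _b64(key),
            "columns": {_b64(n): _b64(v) for n, v in columns.items()},
        }
        out.write(json.dumps(record) + "\n")
    return missing


def load_rows(inp, store_row):
    """Read a dump back and pass each row to `store_row`; return the row count.

    `store_row` is a hypothetical callable taking (key, columns) that writes
    the row to the new cluster.
    """
    count = 0
    for line in inp:
        record = json.loads(line)
        key = base64.b64decode(record["key"])
        columns = {
            base64.b64decode(n): base64.b64decode(v)
            for n, v in record["columns"].items()
        }
        store_row(key, columns)
        count += 1
    return count


if __name__ == "__main__":
    # In-memory dicts stand in for the two clusters in this demo.
    old_cluster = {"räksmörgås".encode("utf-8"): {b"name": b"value"}}
    new_cluster = {}
    known_keys = list(old_cluster) + [b"key-missing-from-cassandra"]

    dump = io.StringIO()
    missing = dump_rows(known_keys, old_cluster.get, dump)
    dump.seek(0)
    loaded = load_rows(dump, new_cluster.__setitem__)
    print(loaded, "rows loaded;", len(missing), "keys missing")
```

Driving the dump from the secondary system's key list, rather than from a range scan, means the `missing` list directly shows any key that could not be read, and comparing the loaded row count against the key count gives a simple completeness check.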