Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <ADF7C794-B283-4CF8-85AC-A17BC39E1088@thelastpickle.com>
References: <BANLkTi=ENiasLZss5vWzSedEbSxbFWr=TQ@mail.gmail.com>
	<ADF7C794-B283-4CF8-85AC-A17BC39E1088@thelastpickle.com>
Date: Fri, 6 May 2011 12:04:00 +0200
Message-ID: <BANLkTim4cRL5ULdoBpX+mzi2e9j-8pay0w@mail.gmail.com>
Subject: Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
From: =?ISO-8859-1?Q?Henrik_Schr=F6der?= <skrolle@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0016e64095c2d750c704a2989b36

--0016e64095c2d750c704a2989b36
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I'll see if I can make some example broken files this weekend.


/Henrik Schröder

On Fri, May 6, 2011 at 02:10, aaron morton <aaron@thelastpickle.com> wrote:

> The difficulty is the different thrift clients between 0.6 and 0.7.
>
> If you want to roll your own solution I would consider:
> - write an app to talk to 0.6 and pull out the data using keys from the
> other system (so you can check referential integrity while you are at
> it). Dump the data to a flat file.
> - write an app to talk to 0.7 to load the data back in.
>
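
A minimal sketch of that dump-and-reload idea, assuming row keys and
column names/values are handled as raw bytes. `fetch_row` and
`store_row` are hypothetical stand-ins for whatever 0.6 and 0.7 Thrift
client calls end up being used; they are not real Cassandra API names.
Keys and values are base64-encoded in the flat file so that unicode
keys pass through byte-for-byte. Column timestamps are left out to keep
the sketch short.

```python
import base64
import json


def _b64(raw):
    """Encode bytes as ASCII text so they survive a JSON flat file."""
    return base64.b64encode(raw).decode("ascii")


def dump_rows(keys, fetch_row, out):
    """Write one JSON line per row; return the keys that had no data.

    keys      -- row keys (bytes) taken from the secondary system
    fetch_row -- hypothetical callable: key -> {column_name: value}
                 (all bytes), empty dict if the row does not exist
    out       -- text file object to write to
    """
    missing = []
    for key in keys:
        columns = fetch_row(key)
        if not columns:
            # Key is known to the other system but has no row here.
            missing.append(key)
            continue
        record = {
            "key": _b64(key),
            "columns": [[_b64(n), _b64(v)]
                        for n, v in sorted(columns.items())],
        }
        out.write(json.dumps(record) + "\n")
    return missing


def load_rows(inp, store_row):
    """Read the flat file back and hand each row to store_row.

    store_row -- hypothetical callable: (key, {column_name: value})
    Returns the number of rows loaded.
    """
    count = 0
    for line in inp:
        record = json.loads(line)
        key = base64.b64decode(record["key"])
        columns = {base64.b64decode(n): base64.b64decode(v)
                   for n, v in record["columns"]}
        store_row(key, columns)
        count += 1
    return count
```

The list returned by `dump_rows` is the referential integrity check:
any key in it exists in the other system but returned no columns from
the old cluster.
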
> I've not given up digging on your migration problem; having to manually
> dump and reload when you've done nothing wrong is not the best solution.
> I'll try to find some time this weekend to test with:
>
> - 0.6 server, random partitioner, standard CFs, byte column
> - load with python or the cli on osx or ubuntu (I don't have a Windows
> machine any more)
> - migrate and see what's going on.
>
> If you can spare some sample data to load, please send it to the user
> group or to my email address.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>
> > We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have
> > rows stored that have unicode keys, and Cassandra 0.7.5 thinks those
> > rows in the sstables are corrupt, and it seems impossible to clean it
> > up without losing data.
> >
> > However, we can still read all rows perfectly via thrift, so we are
> > now looking at building a simple tool that will copy all rows from
> > our 0.6.13 cluster to a parallel 0.7.5 cluster. Our question is now
> > how to do that and ensure that we actually get all rows migrated.
> > It's a pretty small cluster: 3 machines, a single keyspace, a single
> > columnfamily, ~2 million rows, a few GB of data, and a replication
> > factor of 3.
> >
> > So what's the best way? Call get_range_slices and move through the
> > entire token space? We also have all row keys in a secondary system;
> > would it be better to use that and make calls to multiget or
> > multiget_slice instead? Are we correct in assuming that if we use the
> > consistencylevel ALL we'll get all rows?
> >
> >
> > /Henrik Schröder
>
>
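
On the get_range_slices question quoted above, a rough sketch of how
the paging loop could look. `fetch_page` is a hypothetical wrapper
around the real range call, not a Cassandra API name; it is assumed to
return up to `count` rows as (key, columns) pairs, starting at
`start_key` inclusive, in the cluster's own row order. Because the
start key is inclusive, every page after the first begins with the row
that ended the previous page, and that row has to be skipped.

```python
def iter_all_rows(fetch_page, page_size=100):
    """Yield (key, columns) for every row, paging by last-seen key.

    fetch_page -- hypothetical callable (start_key, count) -> list of
                  (key, columns) pairs, start_key inclusive; an empty
                  start_key means "from the beginning of the range"
    """
    if page_size < 2:
        # With a page size of 1 each page would only contain the
        # already-seen start row, and the loop would never advance.
        raise ValueError("page_size must be at least 2")

    start_key = b""
    last_key = None
    while True:
        page = fetch_page(start_key, page_size)
        rows = page
        if last_key is not None and page and page[0][0] == last_key:
            # Drop the row already yielded at the end of the last page.
            rows = page[1:]
        for key, columns in rows:
            yield key, columns
        if len(page) < page_size:
            # A short page means the end of the range was reached.
            return
        last_key = page[-1][0]
        start_key = last_key
```

Counting the keys this yields and comparing against the key list in
the secondary system gives a cross-check that nothing was skipped.
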

--0016e64095c2d750c704a2989b36--