From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
Date: Mon, 9 May 2011 15:51:42 +1200

Out of interest I've done some more digging. Not sure how much more I've contributed, but here goes...

Ran this against a clean v0.6.12 and it works (I expected it to fail on the first read):

    # -*- coding: utf-8 -*-
    import pycassa

    client = pycassa.connect()
    standard1 = pycassa.ColumnFamily(client, 'Keyspace1', 'Standard1')

    uni_str = u"数時間"
    uni_str = uni_str.encode("utf-8")

    print "Insert row", uni_str
    print uni_str, standard1.insert(uni_str, {"bar": "baz"})

    print "Read rows"
    print "???", standard1.get("???")
    print uni_str, standard1.get(uni_str)

Ran that against the current 0.6 head from the command line and it works. Run against the code running in IntelliJ, it fails as expected. The code also fails as expected on 0.7.5.

At one stage I grabbed the buffer created by fastbinary.encode_binary in the Python-generated batch_mutate_args.write(), and it looked like the key was correctly UTF-8 encoded (the bytes matched the earlier UTF-8 encoding of that string).
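
For anyone wanting to repeat that check, the sketch below is roughly the idea: serialize a populated batch_mutate_args into memory and print the raw bytes. Plain TBinaryProtocol writes the same bytes as the accelerated fastbinary path, so this is a minimal sketch, not the exact code I used; adjust imports for wherever your generated bindings live.

    # Sketch: capture the serialized Thrift payload and inspect the key bytes.
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    def dump_wire_bytes(args):
        # args is a populated, generated batch_mutate_args instance
        buf = TTransport.TMemoryBuffer()              # in-memory transport
        proto = TBinaryProtocol.TBinaryProtocol(buf)
        args.write(proto)                             # generated serializer
        raw = buf.getvalue()
        print repr(raw)  # the utf-8 key bytes should appear verbatim
        return raw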

I've updated the git project: https://github.com/amorton/cassandra-unicode-bug

I'm going to leave it there unless there is interest in looking into it further.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 May 2011, at 13:31, Jonathan Ellis wrote:

Right, that's sort of a half-repair: it will repair differences in the replies it got, but it won't double-check MD5s on the rest in the background. So if you're doing CL.ONE reads this is a no-op.

On Sat, May 7, 2011 at 4:25 PM, aaron morton <aaron@thelastpickle.com> wrote:
I remembered something like that, so I had a look at RangeSliceResponseResolver.resolve() in 0.6.12, and it looks like it schedules the repairs...

        protected Row getReduced()
        {
            // resolveSuperset picks the most recent data across the replica responses
            ColumnFamily resolved = ReadResponseResolver.resolveSuperset(versions);
            // any replica that returned stale data gets a repair mutation scheduled here
            ReadResponseResolver.maybeScheduleRepairs(resolved, table, key, versions, versionSources);
            versions.clear();
            versionSources.clear();
            return new Row(key, resolved);
        }


Is that right?


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 May 2011, at 00:48, Jonathan Ellis wrote:

range_slices respects consistencylevel, but only single-row reads and multiget do the *repair* part of RR.

On Sat, May 7, 2011 at 1:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
get_range_slices() does read repair if enabled (I checked DoConsistencyChecksBoolean in the config; it's on by default), so you should be getting good reads. If you want belt-and-braces, run nodetool repair first.

Hope that helps.


On 7 May 2011, at 11:46, Jeremy Hanna wrote:

Great! I just wanted to make sure you were getting the information you needed.

On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:

Well, I already completed the migration program. Using get_range_slices I could migrate a few thousand rows per second, which means that migrating all of our data would take a few minutes, and we'll end up with pristine datafiles for the new cluster. Problem solved!
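
The heart of such a copy loop can be very small. A rough sketch only, not the actual migration program: it mirrors the old-style pycassa API from the test script above, the host and column-family names are placeholders, and the ConsistencyLevel import path varies by pycassa version.

    # Sketch: stream every row out of the 0.6 cluster into the 0.7 cluster.
    import pycassa
    from pycassa.cassandra.ttypes import ConsistencyLevel  # path varies by version

    src_client = pycassa.connect(['old-host:9160'])   # placeholder hosts
    dst_client = pycassa.connect(['new-host:9160'])
    src = pycassa.ColumnFamily(src_client, 'Keyspace1', 'Standard1',
                               read_consistency_level=ConsistencyLevel.ALL)
    dst = pycassa.ColumnFamily(dst_client, 'Keyspace1', 'Standard1')

    copied = 0
    for key, columns in src.get_range():  # pages over the whole token space
        dst.insert(key, columns)
        copied += 1
    print "copied", copied, "rows"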

I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so that you all can repeat this and hopefully fix it.


/Henrik Schröder

On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:
If you're able, go into the #cassandra channel on freenode (IRC) and talk to driftx or jbellis or aaron_morton about your problem. It could be that you don't have to do all of this based on a conversation there.

On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:

I'll see if I can make some example broken files this weekend.


/Henrik Schröder

On Fri, May 6, 2011 at 02:10, aaron morton <aaron@thelastpickle.com> wrote:
The difficulty is the different thrift clients between 0.6 and 0.7.

If you want to roll your own solution I would consider (rough sketch after the list):
- write an app to talk to 0.6 and pull out the data, using keys from the other system (so you can check referential integrity while you are at it). Dump the data to a flat file.
- write an app to talk to 0.7 to load the data back in.
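
A rough sketch of both halves, assuming the externally-known keys are available as an iterable and that keys and values are plain text (binary data would need base64 or similar); the JSON-lines file format is purely illustrative:

    # Sketch: dump rows named by an external key list to a flat file,
    # then load that file into the new cluster.
    import json

    def dump_rows(cf, keys, path):
        with open(path, 'w') as out:
            for key in keys:              # keys from the other system
                columns = cf.get(key)     # also checks the key still exists
                out.write(json.dumps({'key': key, 'columns': columns}) + '\n')

    def load_rows(cf, path):
        with open(path) as src:
            for line in src:
                row = json.loads(line)
                cf.insert(row['key'], row['columns'])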

I've not given up digging on your migration problem; having to manually dump and reload when you've done nothing wrong is not the best solution. I'll try to find some time this weekend to test with:

- 0.6 server, random partitioner, standard CFs, byte columns
- load with Python or the CLI on OS X or Ubuntu (don't have a Windows machine any more)
- migrate and see what's going on.

If you can spare some sample data to load, please send it over via the user group or to my email address.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 May 2011, at 05:52, Henrik Schröder wrote:

We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows stored that have unicode keys, and Cassandra 0.7.5 thinks those rows in the sstables are corrupt, and it seems impossible to clean it up without losing data.

However, we can still read all rows perfectly via thrift, so we are now looking at building a simple tool that will copy all rows from our 0.6.13 cluster to a parallel 0.7.5 cluster. Our question is now how to do that and ensure that we actually get all rows migrated? It's a pretty small cluster: 3 machines, a single keyspace, a single columnfamily, ~2 million rows, a few GB of data, and a replication factor of 3.

So what's the best way? Call get_range_slices and move through the entire token space? We also have all row keys in a secondary system; would it be better to use that and make calls to get_multi or get_multi_slices instead? Are we correct in assuming that if we use consistencylevel ALL we'll get all rows?
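
On the consistencylevel ALL point, one way to double-check after a copy, sketched below, is to verify every externally-known key against the new cluster with multiget at ConsistencyLevel.ALL. The helper name, batch size, and import path are assumptions, not pycassa's API verbatim; adjust for your client version.

    # Sketch: confirm every known key is readable from the new cluster.
    from pycassa.cassandra.ttypes import ConsistencyLevel  # path varies by version

    def verify_keys(cf, all_keys, batch_size=1000):
        missing = []
        for i in xrange(0, len(all_keys), batch_size):
            batch = all_keys[i:i + batch_size]
            found = cf.multiget(batch,
                                read_consistency_level=ConsistencyLevel.ALL)
            missing.extend(k for k in batch if k not in found)
        return missing  # an empty list means every row made it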


/Henrik Schröder

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
