From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
Date: Mon, 9 May 2011 15:51:42 +1200

Out of interest I've done some more digging. Not sure how much more I've contributed, but here goes...

Ran this against a clean v0.6.12 and it works (I expected it to fail on the first read):

    # -*- coding: utf-8 -*-
    import pycassa

    client = pycassa.connect()
    standard1 = pycassa.ColumnFamily(client, 'Keyspace1', 'Standard1')

    uni_str = u"数時間"
    uni_str = uni_str.encode("utf-8")

    print "Insert row", uni_str
    print uni_str, standard1.insert(uni_str, {"bar": "baz"})

    print "Read rows"
    print "???", standard1.get("???")
    print uni_str, standard1.get(uni_str)

Ran that against the current 0.6 head from the command line and it works. Run against the code running in IntelliJ, it fails as expected. The code also fails as expected on 0.7.5.

At one stage I grabbed the buffer created by fastbinary.encode_binary in the Python-generated batch_mutate_args.write(), and it looked like the key was correctly UTF-8 encoded (the bytes matched the earlier UTF-8 encoding of that string).
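
For anyone wanting to repeat that check, the sketch below is roughly the idea: serialize a populated batch_mutate_args into memory and print the raw bytes. Plain TBinaryProtocol writes the same bytes as the accelerated fastbinary path, so this is a minimal sketch, not the exact code I used; adjust imports for wherever your generated bindings live.

    # Sketch: capture the serialized Thrift payload and inspect the key bytes.
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    def dump_wire_bytes(args):
        # args is a populated, generated batch_mutate_args instance
        buf = TTransport.TMemoryBuffer()              # in-memory transport
        proto = TBinaryProtocol.TBinaryProtocol(buf)
        args.write(proto)                             # generated serializer
        raw = buf.getvalue()
        print repr(raw)  # the utf-8 key bytes should appear verbatim
        return raw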

I've updated the git project: https://github.com/amorton/cassandra-unicode-bug

I'm going to leave it there unless there is interest in looking into it further.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 May 2011, at 13:31, Jonathan Ellis wrote:

Right, that's sort of a half-repair: it will repair differences in the replies it got, but it won't double-check MD5s on the rest in the background. So if you're doing CL.ONE reads this is a no-op.

On Sat, May 7, 2011 at 4:25 PM, aaron morton <aaron@thelastpickle.com> wrote:
I remembered something like that, so I had a look at RangeSliceResponseResolver.resolve() in 0.6.12, and it looks like it schedules the repairs...

        protected Row getReduced()
        {
            // resolveSuperset picks the most recent data across the replica responses
            ColumnFamily resolved = ReadResponseResolver.resolveSuperset(versions);
            // any replica that returned stale data gets a repair mutation scheduled here
            ReadResponseResolver.maybeScheduleRepairs(resolved, table, key, versions, versionSources);
            versions.clear();
            versionSources.clear();
            return new Row(key, resolved);
        }


Is that right?


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 May 2011, at 00:48, Jonathan Ellis wrote:

range_slices respects consistencylevel, but only single-row reads and multiget do the *repair* part of RR.

On Sat, May 7, 2011 at 1:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
get_range_slices() does read repair if enabled (I checked DoConsistencyChecksBoolean in the config; it's on by default), so you should be getting good reads. If you want belt-and-braces, run nodetool repair first.

Hope that helps.


On 7 May 2011, at 11:46, Jeremy Hanna wrote:

Great! I just wanted to make sure you were getting the information you needed.

On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:

Well, I already completed the migration program. Using get_range_slices I could migrate a few thousand rows per second, which means that migrating all of our data would take a few minutes, and we'll end up with pristine datafiles for the new cluster. Problem solved!
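
The heart of such a copy loop can be very small. A rough sketch only, not the actual migration program: it mirrors the old-style pycassa API from the test script above, the host and column-family names are placeholders, and the ConsistencyLevel import path varies by pycassa version.

    # Sketch: stream every row out of the 0.6 cluster into the 0.7 cluster.
    import pycassa
    from pycassa.cassandra.ttypes import ConsistencyLevel  # path varies by version

    src_client = pycassa.connect(['old-host:9160'])   # placeholder hosts
    dst_client = pycassa.connect(['new-host:9160'])
    src = pycassa.ColumnFamily(src_client, 'Keyspace1', 'Standard1',
                               read_consistency_level=ConsistencyLevel.ALL)
    dst = pycassa.ColumnFamily(dst_client, 'Keyspace1', 'Standard1')

    copied = 0
    for key, columns in src.get_range():  # pages over the whole token space
        dst.insert(key, columns)
        copied += 1
    print "copied", copied, "rows"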

I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so that you all can repeat this and hopefully fix it.


/Henrik Schröder

On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:
If you're able, go into the #cassandra channel on freenode (IRC) and talk to driftx or jbellis or aaron_morton about your problem. It could be that you don't have to do all of this based on a conversation there.

On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:

I'll see if I can make some example broken files this weekend.


/Henrik Schröder

On Fri, May 6, 2011 at 02:10, aaron morton <aaron@thelastpickle.com> wrote:
The difficulty is the different thrift clients between 0.6 and 0.7.

If you want to roll your own solution I would consider (rough sketch after the list):
- write an app to talk to 0.6 and pull out the data, using keys from the other system (so you can check referential integrity while you are at it). Dump the data to a flat file.
- write an app to talk to 0.7 to load the data back in.
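
A rough sketch of both halves, assuming the externally-known keys are available as an iterable and that keys and values are plain text (binary data would need base64 or similar); the JSON-lines file format is purely illustrative:

    # Sketch: dump rows named by an external key list to a flat file,
    # then load that file into the new cluster.
    import json

    def dump_rows(cf, keys, path):
        with open(path, 'w') as out:
            for key in keys:              # keys from the other system
                columns = cf.get(key)     # also checks the key still exists
                out.write(json.dumps({'key': key, 'columns': columns}) + '\n')

    def load_rows(cf, path):
        with open(path) as src:
            for line in src:
                row = json.loads(line)
                cf.insert(row['key'], row['columns'])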

I've not given up digging on your migration problem; having to manually dump and reload when you've done nothing wrong is not the best solution. I'll try to find some time this weekend to test with:

- 0.6 server, random partitioner, standard CFs, byte columns
- load with Python or the CLI on OS X or Ubuntu (don't have a Windows machine any more)
- migrate and see what's going on.

If you can spare some sample data to load, please send it over via the user group or to my email address.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 May 2011, at 05:52, Henrik Schröder wrote:

We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows stored that have unicode keys, and Cassandra 0.7.5 thinks those rows in the sstables are corrupt, and it seems impossible to clean it up without losing data.

However, we can still read all rows perfectly via thrift, so we are now looking at building a simple tool that will copy all rows from our 0.6.13 cluster to a parallel 0.7.5 cluster. Our question is now how to do that and ensure that we actually get all rows migrated? It's a pretty small cluster: 3 machines, a single keyspace, a single columnfamily, ~2 million rows, a few GB of data, and a replication factor of 3.

So what's the best way? Call get_range_slices and move through the entire token space? We also have all row keys in a secondary system; would it be better to use that and make calls to get_multi or get_multi_slices instead? Are we correct in assuming that if we use consistencylevel ALL we'll get all rows?
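
On the consistencylevel ALL point, one way to double-check after a copy, sketched below, is to verify every externally-known key against the new cluster with multiget at ConsistencyLevel.ALL. The helper name, batch size, and import path are assumptions, not pycassa's API verbatim; adjust for your client version.

    # Sketch: confirm every known key is readable from the new cluster.
    from pycassa.cassandra.ttypes import ConsistencyLevel  # path varies by version

    def verify_keys(cf, all_keys, batch_size=1000):
        missing = []
        for i in xrange(0, len(all_keys), batch_size):
            batch = all_keys[i:i + batch_size]
            found = cf.multiget(batch,
                                read_consistency_level=ConsistencyLevel.ALL)
            missing.extend(k for k in batch if k not in found)
        return missing  # an empty list means every row made it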


/Henrik Schröder

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
