Subject: Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?
From: aaron morton <aaron@thelastpickle.com>
Date: Mon, 9 May 2011 21:33:08 +1200
To: user@cassandra.apache.org

That was my initial thought; I just wanted to see if there was anything else going on. Sounds like Henrik has a workaround, so all is well.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 May 2011, at 18:10, Jonathan Ellis wrote:

> Strongly suspect that he has invalid unicode characters in his keys.
> 0.6 wasn't as good at validating those as 0.7.
>
> On Sun, May 8, 2011 at 8:51 PM, aaron morton wrote:
>> Out of interest I've done some more digging. Not sure how much more I've
>> contributed, but here goes...
>> Ran this against a clean 0.6.12 and it works (I expected it to fail on
>> the first read):
>>
>>     client = pycassa.connect()
>>     standard1 = pycassa.ColumnFamily(client, 'Keyspace1', 'Standard1')
>>     uni_str = u"数時間"
>>     uni_str = uni_str.encode("utf-8")
>>
>>     print "Insert row", uni_str
>>     print uni_str, standard1.insert(uni_str, {"bar": "baz"})
>>     print "Read rows"
>>     print "???", standard1.get("???")
>>     print uni_str, standard1.get(uni_str)
>>
>> Ran that against the current 0.6 head from the command line and it works.
>> Run against the code running in IntelliJ and it fails as expected.
>> The code also fails as expected on 0.7.5.
>>
>> At one stage I grabbed the buffer created by fastbinary.encode_binary in the
>> python-generated batch_mutate_args.write() and it looked like the key was
>> correctly utf-8 encoded (matching bytes to the previous utf-8 encoding of
>> that string).
>>
>> I've updated the git project:
>> https://github.com/amorton/cassandra-unicode-bug
>>
>> Am going to leave it there unless there is interest to keep looking
>> into it.
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 8 May 2011, at 13:31, Jonathan Ellis wrote:
>>
>> Right, that's sort of a half-repair: it will repair differences in
>> replies it got, but it won't double-check md5s on the rest in the
>> background. So if you're doing CL.ONE reads this is a no-op.
>>
>> On Sat, May 7, 2011 at 4:25 PM, aaron morton wrote:
>>
>> I remembered something like that, so I had a look at
>> RangeSliceResponseResolver.resolve() in 0.6.12 and it looks like it
>> schedules the repairs...
>>     protected Row getReduced()
>>     {
>>         ColumnFamily resolved = ReadResponseResolver.resolveSuperset(versions);
>>         ReadResponseResolver.maybeScheduleRepairs(resolved, table,
>>                                                   key, versions, versionSources);
>>         versions.clear();
>>         versionSources.clear();
>>         return new Row(key, resolved);
>>     }
>>
>> Is that right?
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 8 May 2011, at 00:48, Jonathan Ellis wrote:
>>
>> range_slices respects consistencylevel, but only single-row reads and
>> multiget do the *repair* part of RR.
>>
>> On Sat, May 7, 2011 at 1:44 AM, aaron morton wrote:
>>
>> get_range_slices() does read repair if enabled (check
>> DoConsistencyChecksBoolean in the config; it's on by default), so you should
>> be getting good reads. If you want belt-and-braces, run nodetool repair
>> first.
>>
>> Hope that helps.
>>
>> On 7 May 2011, at 11:46, Jeremy Hanna wrote:
>>
>> Great! I just wanted to make sure you were getting the information you
>> needed.
>>
>> On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:
>>
>> Well, I already completed the migration program. Using get_range_slices I
>> could migrate a few thousand rows per second, which means that migrating all
>> of our data would take a few minutes, and we'll end up with pristine
>> datafiles for the new cluster. Problem solved!
>>
>> I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so
>> that you all can repeat this and hopefully fix it.
>>
>> /Henrik Schröder
>>
>> On Sat, May 7, 2011 at 00:35, Jeremy Hanna wrote:
>>
>> If you're able, go into the #cassandra channel on freenode (IRC) and talk to
>> driftx or jbellis or aaron_morton about your problem.
It could be that you
>> don't have to do all of this, based on a conversation there.
>>
>> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>>
>> I'll see if I can make some example broken files this weekend.
>>
>> /Henrik Schröder
>>
>> On Fri, May 6, 2011 at 02:10, aaron morton wrote:
>>
>> The difficulty is the different thrift clients between 0.6 and 0.7.
>>
>> If you want to roll your own solution I would consider:
>>
>> - write an app to talk to 0.6 and pull out the data using keys from the
>> other system (so you can check referential integrity while you are at
>> it). Dump the data to a flat file.
>>
>> - write an app to talk to 0.7 to load the data back in.
>>
>> I've not given up digging on your migration problem; having to manually dump
>> and reload when you've done nothing wrong is not the best solution. I'll try
>> to find some time this weekend to test with:
>>
>> - 0.6 server, random partitioner, standard CFs, byte columns
>>
>> - load with python or the cli on OS X or Ubuntu (I don't have a Windows
>> machine any more)
>>
>> - migrate and see what's going on.
>>
>> If you can spare some sample data to load, please send it over on the user
>> group or to my email address.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>>
>> We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows
>> stored that have unicode keys, and Cassandra 0.7.5 thinks those rows in the
>> sstables are corrupt, and it seems impossible to clean them up without losing
>> data.
>>
>> However, we can still read all rows perfectly via thrift, so we are now
>> looking at building a simple tool that will copy all rows from our 0.6.13
>> cluster to a parallel 0.7.5 cluster.
Our question is now how to do that and
>> ensure that we actually get all rows migrated? It's a pretty small cluster:
>> 3 machines, a single keyspace, a single columnfamily, ~2 million rows, a few
>> GB of data, and a replication factor of 3.
>>
>> So what's the best way? Call get_range_slices and move through the entire
>> token space? We also have all row keys in a secondary system; would it be
>> better to use that and make calls to get_multi or get_multi_slices instead?
>> Are we correct in assuming that if we use the consistency level ALL we'll
>> get all rows?
>>
>> /Henrik Schröder
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
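
[Editor's note: the "move through the entire token space" option Henrik asks about boils down to key-range paging: fetch a page, restart the next page from the last key returned, and drop that duplicated first row. A minimal sketch of the loop, with an in-memory sorted map standing in for the cluster; the `make_fetcher`/`page_all_rows` names are assumptions. A real fetcher would issue get_range_slices over thrift, and under the random partitioner the scan walks token order rather than key order, but the restart-from-last-key pattern is the same.]

```python
def make_fetcher(store):
    # Stand-in for the cluster: return up to `count` rows with
    # key >= start_key, in sorted key order, like a range scan.
    def fetch_page(start_key, count):
        keys = sorted(k for k in store if k >= start_key)
        return [(k, store[k]) for k in keys[:count]]
    return fetch_page

def page_all_rows(fetch_page, page_size=100):
    # Walk the whole key space. Every page after the first starts at
    # the last key already seen, so drop that duplicate row; an empty
    # page (after the drop) means the scan is complete.
    start_key = ""
    first_page = True
    while True:
        page = fetch_page(start_key, page_size)
        if not first_page:
            page = page[1:]
        if not page:
            return
        for key, columns in page:
            yield key, columns
        start_key = page[-1][0]
        first_page = False
```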