From: Jonathan Ellis
Date: Tue, 8 Dec 2009 21:08:28 -0600
Subject: Re: cassandra mangling non-ascii keys
To: cassandra-user@incubator.apache.org

I don't remember, but it was definitely wrong in hindsight :(

On Mon, Dec 7, 2009 at 6:22 PM, Edmond Lau wrote:
> Ok - so my understanding from reading the two jira issues is that
> python and ruby treat the "string" thrift type as unencoded bytes
> whereas java treats them as utf-8 encoded bytes. What was the
> rationale behind declaring keys to be of type "string" rather than of
> type "binary"? With "binary", presumably java wouldn't treat keys as
> utf-8 encoded bytes.
>
> Edmond
>
> On Mon, Dec 7, 2009 at 3:09 PM, Jonathan Ellis wrote:
>> I suspect you will need to explicitly encode to UTF8 first, then.
>> (And decode when reading.)
>>
>> My reading of the relevant issues
>> (https://issues.apache.org/jira/browse/THRIFT-395,
>> https://issues.apache.org/jira/browse/THRIFT-414) is that this won't
>> be fixed any time soon.
>>
>> -Jonathan
>>
>> On Mon, Dec 7, 2009 at 4:56 PM, Edmond Lau wrote:
>>> This particular client was in Ruby.
>>>
>>> On Mon, Dec 7, 2009 at 2:49 PM, Jonathan Ellis wrote:
>>>> (bugs in thrift, that is)
>>>>
>>>> On Mon, Dec 7, 2009 at 4:49 PM, Jonathan Ellis wrote:
>>>>> what language are your clients in? there are definitely some bugs
>>>>> there when communicating b/t client and server of different languages.
>>>>> :(
>>>>>
>>>>> On Mon, Dec 7, 2009 at 4:43 PM, Edmond Lau wrote:
>>>>>> I'm using non-ascii keys on Cassandra, relatively close to trunk at
>>>>>> r880926, and some of my keys get mangled.
>>>>>>
>>>>>> As a simple test case, if I insert a one-byte key anywhere between
>>>>>> \200 and \377 (octal for 128 to 255) through the thrift interface, and
>>>>>> then query back my data with multi get, I get a hash back that has
>>>>>> "\357\277\275" as the key. All those one-byte keys get mapped to the
>>>>>> same bucket, so if I insert with the key \205, I get the data back
>>>>>> when querying for \300. So either a) there's a bug in thrift, b)
>>>>>> Cassandra doesn't support non-ascii keys, or c) Cassandra is mangling
>>>>>> my key somewhere.
>>>>>>
>>>>>> Has anyone else run into this issue?
>>>>>>
>>>>>> Edmond
>>>>>>
>>>>>
>>>>
>>>
>>
>
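A minimal Python sketch of the behaviour described in this thread, under stated assumptions. It assumes the Java side turns a Thrift "string" key into a java.lang.String with something like new String(bytes, "UTF-8"), which substitutes U+FFFD for malformed input. A lone byte in \200-\377 is never valid UTF-8, and U+FFFD re-encoded as UTF-8 is exactly "\357\277\275", which would explain why every such one-byte key ends up in the same bucket. The helper names server_roundtrip, encode_key and decode_key are hypothetical. encode_key/decode_key show one possible reading of "explicitly encode to UTF8 first (and decode when reading)" for arbitrary byte keys: treat each byte as the Latin-1 code point of the same value, so the bytes on the wire are always valid UTF-8. This is an assumed client convention, not something Cassandra or Thrift provides.

```python
# Sketch only: models the key mangling discussed above, assuming the
# server decodes Thrift "string" keys as UTF-8 with replacement.

# U+FFFD (REPLACEMENT CHARACTER) encoded as UTF-8: octal \357\277\275.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")


def server_roundtrip(raw_key: bytes) -> bytes:
    """Hypothetical model of the server: decode the key bytes as UTF-8,
    replacing malformed input with U+FFFD, then re-encode on the way out."""
    return raw_key.decode("utf-8", errors="replace").encode("utf-8")


def encode_key(raw_key: bytes) -> bytes:
    """Hypothetical client-side workaround: map each byte 0-255 to the
    code point of the same value (Latin-1), then send valid UTF-8."""
    return raw_key.decode("latin-1").encode("utf-8")


def decode_key(wire_key: bytes) -> bytes:
    """Inverse of encode_key, applied to keys read back from the server."""
    return wire_key.decode("utf-8").encode("latin-1")


if __name__ == "__main__":
    # Every one-byte key from \200 to \377 collapses to the same value.
    print(server_roundtrip(b"\x85") == server_roundtrip(b"\xc0"))  # True
    print(server_roundtrip(b"\x85"))  # b'\xef\xbf\xbd'
    # With the assumed encoding step, the original byte survives.
    print(decode_key(server_roundtrip(encode_key(b"\x85"))))  # b'\x85'
```

One consequence of this convention: plain ASCII keys are unchanged by encode_key, but a key that was already UTF-8 text would be double-encoded, so every client sharing the keyspace would need to agree on the same rule.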