incubator-cassandra-user mailing list archives

From Edmond Lau <edm...@ooyala.com>
Subject Re: cassandra mangling non-ascii keys
Date Tue, 08 Dec 2009 00:22:03 GMT
Ok - so my understanding from reading the two jira issues is that
python and ruby treat the "string" thrift type as unencoded bytes
whereas java treats them as utf-8 encoded bytes.  What was the
rationale behind declaring keys to be of type "string" rather than of
type "binary"?  With "binary", presumably java wouldn't treat keys as
utf-8 encoded bytes.
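
For reference, a minimal sketch of the mangling described in this thread, in Python for brevity (the client here was Ruby and the server is Java). It assumes the Java side decodes incoming "string" bytes as UTF-8, replaces each invalid sequence with U+FFFD, and re-encodes; the function names are hypothetical. The last two helpers show one possible reading of the "encode to UTF8 first" workaround: treat each raw byte as a code point of the same value before encoding.

```python
def java_style_roundtrip(key: bytes) -> bytes:
    """Approximate what a UTF-8-decoding server does to a 'string' field.

    Assumption: invalid sequences are replaced with U+FFFD rather than
    rejected, which matches the "\\357\\277\\275" key seen in the report.
    """
    return key.decode("utf-8", errors="replace").encode("utf-8")


def encode_key(raw: bytes) -> bytes:
    """Client-side workaround: map each byte to the code point of the same
    value (Latin-1), then send the UTF-8 encoding of that text."""
    return raw.decode("latin-1").encode("utf-8")


def decode_key(wire: bytes) -> bytes:
    """Inverse of encode_key, applied to keys read back from the server."""
    return wire.decode("utf-8").encode("latin-1")


REPLACEMENT = b"\xef\xbf\xbd"  # octal \357\277\275, i.e. U+FFFD in UTF-8

# Distinct one-byte keys collapse to the same replacement sequence.
assert java_style_roundtrip(b"\x85") == REPLACEMENT  # \205
assert java_style_roundtrip(b"\xc0") == REPLACEMENT  # \300

# Keys that are valid UTF-8 on the wire survive unchanged.
wire_key = encode_key(b"\x85")
assert java_style_roundtrip(wire_key) == wire_key
assert decode_key(wire_key) == b"\x85"
```

The cost of this workaround is that every byte at or above \200 takes two bytes on the wire, and every client must apply the same encoding and decoding consistently.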

Edmond

On Mon, Dec 7, 2009 at 3:09 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> I suspect you will need to explicitly encode to UTF8 first, then.
> (And decode when reading.)
>
> My reading of the relevant issues
> (https://issues.apache.org/jira/browse/THRIFT-395,
> https://issues.apache.org/jira/browse/THRIFT-414) is that this won't
> be fixed any time soon.
>
> -Jonathan
>
> On Mon, Dec 7, 2009 at 4:56 PM, Edmond Lau <edmond@ooyala.com> wrote:
>> This particular client was in Ruby.
>>
>> On Mon, Dec 7, 2009 at 2:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> (bugs in thrift, that is)
>>>
>>> On Mon, Dec 7, 2009 at 4:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>> what language are your clients in?  there are definitely some bugs
>>>> there when communicating b/t client and server of different languages.
>>>> :(
>>>>
>>>> On Mon, Dec 7, 2009 at 4:43 PM, Edmond Lau <edmond@ooyala.com> wrote:
>>>>> I'm using non-ascii keys on Cassandra, relatively close to trunk at
>>>>> r880926, and some of my keys get mangled.
>>>>>
>>>>> As a simple test case, if I insert a one-byte key anywhere between
>>>>> \200 and \377 (octal for 128 to 255) through the thrift interface, and
>>>>> then query back my data with multi get, I get a hash back that has
>>>>> "\357\277\275" as the key.  All those one-byte keys get mapped to the
>>>>> same bucket, so if I insert with the key \205, I get the data back
>>>>> when querying for \300.  So either a) there's a bug in thrift, b)
>>>>> Cassandra doesn't support non-ascii keys, or c) Cassandra is mangling
>>>>> my key somewhere.
>>>>>
>>>>> Has anyone else run into this issue?
>>>>>
>>>>> Edmond
>>>>>
>>>>
>>>
>>
>
