cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: cassandra mangling non-ascii keys
Date Wed, 09 Dec 2009 03:08:28 GMT
I don't remember, but it was definitely wrong in hindsight :(

On Mon, Dec 7, 2009 at 6:22 PM, Edmond Lau <edmond@ooyala.com> wrote:
> Ok - so my understanding from reading the two jira issues is that
> python and ruby treat the "string" thrift type as unencoded bytes
> whereas java treats them as utf-8 encoded bytes.  What was the
> rationale behind declaring keys to be of type "string" rather than of
> type "binary"?  With "binary", presumably java wouldn't treat keys as
> utf-8 encoded bytes.
>
> Edmond
>
> On Mon, Dec 7, 2009 at 3:09 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> I suspect you will need to explicitly encode to UTF8 first, then.
>> (And decode when reading.)
>>
>> My reading of the relevant issues
>> (https://issues.apache.org/jira/browse/THRIFT-395,
>> https://issues.apache.org/jira/browse/THRIFT-414) is that this won't
>> be fixed any time soon.
>>
>> -Jonathan
>>
>> On Mon, Dec 7, 2009 at 4:56 PM, Edmond Lau <edmond@ooyala.com> wrote:
>>> This particular client was in Ruby.
>>>
>>> On Mon, Dec 7, 2009 at 2:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>> (bugs in thrift, that is)
>>>>
>>>> On Mon, Dec 7, 2009 at 4:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>> what language are your clients in?  there are definitely some bugs
>>>>> there when communicating b/t client and server of different languages.
>>>>> :(
>>>>>
>>>>> On Mon, Dec 7, 2009 at 4:43 PM, Edmond Lau <edmond@ooyala.com> wrote:
>>>>>> I'm using non-ascii keys on Cassandra, relatively close to trunk at
>>>>>> r880926, and some of my keys get mangled.
>>>>>>
>>>>>> As a simple test case, if I insert a one-byte key anywhere between
>>>>>> \200 and \377 (octal for 128 to 255) through the thrift interface, and
>>>>>> then query back my data with multi get, I get a hash back that has
>>>>>> "\357\277\275" as the key.  All those one-byte keys get mapped to the
>>>>>> same bucket, so if I insert with the key \205, I get the data back
>>>>>> when querying for \300.  So either a) there's a bug in thrift, b)
>>>>>> Cassandra doesn't support non-ascii keys, or c) Cassandra is mangling
>>>>>> my key somewhere.
>>>>>>
>>>>>> Has anyone else run into this issue?
>>>>>>
>>>>>> Edmond
>>>>>>
>>>>>
>>>>
>>>
>>
>
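
The symptoms in the thread can be reproduced without Cassandra or Thrift.
The sketch below is in Python and assumes the Java side decodes incoming
"string" bytes as UTF-8 and substitutes U+FFFD for malformed input, which
would explain the "\357\277\275" key (the UTF-8 encoding of U+FFFD). The
helper names (utf8_round_trip, pack_key, unpack_key) are hypothetical, and
the Latin-1 mapping is only one possible reading of "explicitly encode to
UTF8 first", not a confirmed fix from the thread.

```python
def utf8_round_trip(raw: bytes) -> bytes:
    """Model a server that decodes keys as UTF-8 (replacing malformed
    input with U+FFFD) and later re-encodes them. Assumed behavior."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def pack_key(raw: bytes) -> bytes:
    """Hypothetical workaround: treat each byte as a code point
    (Latin-1), then encode the result as valid UTF-8 before sending."""
    return raw.decode("latin-1").encode("utf-8")


def unpack_key(wire: bytes) -> bytes:
    """Inverse of pack_key: decode UTF-8, then map code points back to bytes."""
    return wire.decode("utf-8").encode("latin-1")


# Raw one-byte keys \205 and \300 (octal) both collapse to U+FFFD.
mangled_a = utf8_round_trip(b"\x85")
mangled_b = utf8_round_trip(b"\xc0")
print(mangled_a, mangled_a == mangled_b)

# Keys packed as valid UTF-8 survive the round trip and stay distinct.
safe_a = utf8_round_trip(pack_key(b"\x85"))
safe_b = utf8_round_trip(pack_key(b"\xc0"))
print(unpack_key(safe_a), unpack_key(safe_b))
```

The cost of this mapping is that every byte from \200 to \377 takes two
bytes on the wire, and any other client reading the same keys has to apply
the same unpacking step.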
