qpid-dev mailing list archives

From Justin Ross <justin.r...@gmail.com>
Subject Re: UTF8 / binary strings in dynamic languages
Date Wed, 21 Aug 2013 15:31:41 GMT
Jimmy, thanks for getting this started.  I'd love your feedback to
help sort this out.

I think these are the cases:

1. If the language string is unambiguously textual, send it as amqp str16
2. If the language string is unambiguously arbitrary bytes, send it as amqp vbin

These are easy.  We can tell the user's intention, and we can do the
right thing.
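For a language whose type system already separates the two cases, the dispatch for cases 1 and 2 is a plain type check. Below is a minimal sketch, assuming Python 3, where `str` is text and `bytes`/`bytearray` are binary. The function name `amqp_string_type` is hypothetical, and the returned strings are only labels for the AMQP encodings named above. No Qpid API is called.

```python
# Hypothetical sketch: choosing an AMQP encoding label when the language's
# type system already says whether a value is text or bytes (Python 3).
# "str16" and "vbin" are plain labels here; nothing is actually encoded.

def amqp_string_type(value):
    """Return the AMQP encoding label for an unambiguous string value."""
    if isinstance(value, str):
        # Case 1: unambiguously textual -> AMQP str16
        return "str16"
    if isinstance(value, (bytes, bytearray)):
        # Case 2: unambiguously arbitrary bytes -> AMQP vbin
        return "vbin"
    raise TypeError(f"not a string type: {type(value).__name__}")

print(amqp_string_type("hello"))      # str16
print(amqp_string_type(b"\x00\xff"))  # vbin
```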

3. If the language string is an overloaded text/bytes type, as is
regrettably quite common, what do we do then?

The current answer to this question is "send it as vbin".  That's very
safe, insofar as it won't throw any sort of encoding exception.  It
does not, however, always honor what I think is the user's more
typical intention: to get an ascii string at the other end.

So for case 3, I'd like to consider, as the default, encoding
ambiguous language strings as ascii and sending them as amqp str16.  This
requires an encoding step that may produce errors.  And maybe that's
just too obnoxious!  That's what I'd like to know.
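The two candidate defaults for case 3 can be contrasted in a short sketch. Python 3 has no overloaded text/bytes type, so a `bytes` object stands in for one, such as Python 2's `str` or a Perl scalar without the UTF-8 flag. The names `encode_current` and `encode_proposed` are hypothetical, and the tuples only label the AMQP type that would be chosen.

```python
# Hypothetical sketch of the two defaults discussed for case 3.
# A bytes object stands in for an overloaded text/bytes language string.

def encode_current(value: bytes):
    """Current behavior: always send as vbin; never raises."""
    return ("vbin", bytes(value))

def encode_proposed(value: bytes):
    """Proposed default: treat the value as ASCII text and send as str16.

    The decoding step is strict, so non-ASCII input raises
    UnicodeDecodeError instead of silently going out as binary.
    """
    text = value.decode("ascii")  # may raise UnicodeDecodeError
    return ("str16", text)

print(encode_current(b"hello"))   # ('vbin', b'hello')
print(encode_proposed(b"hello"))  # ('str16', 'hello')

try:
    encode_proposed(b"\xff\x00")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)  # rejected: ordinal not in range(128)
```

The trade-off is visible in the last call: the proposed default surfaces an error at send time for non-ASCII data, whereas the current default accepts everything but delivers bytes where the receiver may have expected text.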

In summary, if we have a way to determine what the user wanted (text
or bytes), we should try to carry that through on the wire.  At the
following URL I've tried to map out what type information we can get
for each language.  Please update it as you see fit.

  https://cwiki.apache.org/confluence/display/qpid/Language+support+for+unambiguous+text+string+and+byte+array+types

On Wed, Aug 21, 2013 at 8:44 AM, Jimmy Jones <jimmyjones2@gmx.co.uk> wrote:
>> > AFAIK in perl, if you include unicode characters in a string it'll
>> > set the utf8 flag. If you don't include any unicode characters (e.g. 7
>> > bit ascii, or raw bytes) the flag won't be set. So given a perl
>> > scalar that doesn't contain any utf8 characters, you don't know if
>> > it's a textual string (str16) or a binary string (vbin). There is an
>> > is_utf8_string function, but that'll only tell you if the string
>> > would be valid utf8, but it could be a binary string that happens to
>> > be valid utf8, so that's not really safe.
>>
>> You can explicitly mark it as utf8 using utf8::upgrade() though, right?
>> Certainly I tried that in a simple test and the property in question was
>> then sent as str16.
>
> Yes, if I as a user had a string that was textual, I could call
> utf8::upgrade() to ensure it got sent as str16. I guess this is
> similar in concept to calling setEncoding in C++, although maybe
> less natural in a dynamically typed language.
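The point about is_utf8_string, that passing a UTF-8 validity check says nothing about intent, can be shown in a few lines. This is an illustrative sketch in Python, not Perl. `looks_like_utf8` is a hypothetical helper, and the binary payload is an arbitrary example chosen so that its bytes happen to form a valid UTF-8 sequence.

```python
import struct

# Illustration: a validity check cannot distinguish text from binary,
# because some arbitrary byte sequences are also well-formed UTF-8.

def looks_like_utf8(data: bytes) -> bool:
    """Heuristic only: True if the bytes decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A 16-bit big-endian integer (50089 == 0xC3A9) packed as raw binary...
payload = struct.pack(">H", 50089)
print(payload)                    # b'\xc3\xa9'
# ...passes the check, and even decodes to a plausible character.
print(looks_like_utf8(payload))   # True
print(payload.decode("utf-8"))    # é
# Other binary data fails it, so the answer depends on content, not intent.
print(looks_like_utf8(b"\xff\xfe"))  # False
```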

It would be more reasonable to treat perl scalars as textual for our
API if perl offered a good way to explicitly handle byte arrays.  My
(certainly insufficient) web browsing suggested that wasn't really
available, or not in a form recommended for use.  Any candidates for a
serviceable explicitly-arbitrary-bytes-and-not-text-at-all "type" in
perl?

Thanks!
Justin
