qpid-users mailing list archives

From Justin Ross <justin.r...@gmail.com>
Subject Re: UTF8 / binary strings in dynamic languages
Date Wed, 21 Aug 2013 16:14:04 GMT
I'm missing something here.  The Python 2-to-3 migration plan is to
treat a value expressed with 'str' as unambiguously textual, and a
value expressed with 'bytes' as unambiguously data.  Doesn't that line
up with this proposal?
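
A minimal sketch of that mapping, assuming Python 3 semantics.  The
function names amqp_type_for and encode_ambiguous_as_text are
hypothetical, made up for illustration, and are not part of any Qpid
API.  The second function shows the proposed default for an overloaded
string type: render it as ascii text and accept that the decoding step
may fail.

```python
def amqp_type_for(value):
    """Return the AMQP type name a binding might pick for a Python 3 value.

    Hypothetical helper for illustration; not part of any Qpid API.
    """
    if isinstance(value, str):
        # Python 3 'str' is unambiguously text.
        return "str16"
    if isinstance(value, (bytes, bytearray)):
        # 'bytes' is unambiguously data.
        return "vbin"
    raise TypeError("unsupported type: %s" % type(value).__name__)


def encode_ambiguous_as_text(raw):
    """Treat an overloaded text/bytes value as ascii text (assumed policy).

    Raises UnicodeDecodeError if the bytes are not valid ascii.
    """
    return "str16", raw.decode("ascii")
```

With this sketch, amqp_type_for("hello") picks str16 and
amqp_type_for(b"\x00\xff") picks vbin, while
encode_ambiguous_as_text(b"\xff") raises rather than silently sending
binary.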

On Wed, Aug 21, 2013 at 11:49 AM, Rafael Schloming <rhs@alum.mit.edu> wrote:
> I think for python at least if we were to treat ambiguous string values as
> text rather than data, we would be at odds with the python community's 2->3
> migration plan. The following thread has a useful discussion of this that
> is worth a careful read:
> http://stackoverflow.com/questions/1736228/python-data-vs-text/1736279#1736279
> --Rafael
> On Wed, Aug 21, 2013 at 11:31 AM, Justin Ross <justin.ross@gmail.com> wrote:
>> Jimmy, thanks for getting this started.  I'd love your feedback to
>> help sort this out.
>> I think these are the cases:
>> 1. If the language string is unambiguously textual, send it as amqp str16
>> 2. If the language string is unambiguously arbitrary bytes, send it as
>> amqp vbin
>> These are easy.  We can tell the user's intention, and we can do the
>> right thing.
>> 3. If the language string is an overloaded text/bytes type, as is
>> regrettably quite common, what do we do then?
>> The current answer to this question is "send it as vbin".  That's very
>> safe, insofar as it won't throw any sort of encoding exception.  It
>> does not, however, always honor what I think is the user's more
>> typical intention: produce an ascii string at the other end.
>> So for 3, I'd like to consider the possibility of, by default, sending
>> ambiguous language strings as ascii rendered to amqp str16.  This
>> requires an encoding step that may produce errors.  And maybe that's
>> just too obnoxious!  That's what I'd like to know.
>> In summary, if we have a way to determine what the user wanted (text
>> or bytes), we should try to carry that through on the wire.  At the
>> following URL I've tried to map out what type information we can get
>> for each language.  Please update it as you please.
>> https://cwiki.apache.org/confluence/display/qpid/Language+support+for+unambiguous+text+string+and+byte+array+types
>> On Wed, Aug 21, 2013 at 8:44 AM, Jimmy Jones <jimmyjones2@gmx.co.uk> wrote:
>> >> > AFAIK in perl, if you include unicode characters in a string it'll
>> >> > set the utf8 flag. If you don't include any unicode characters (eg.
>> >> > bit ascii, or raw bytes) the flag won't be set. So given a perl
>> >> > scalar that doesn't contain any utf8 characters, you don't know if
>> >> > it's a textual string (str16) or a binary string (vbin). There is an
>> >> > is_utf8_string function, but that'll only tell you whether the string
>> >> > would be valid utf8; it could be a binary string that happens to
>> >> > be valid utf8, so that's not really safe.
>> >>
>> >> You can explicitly mark it as utf8 using utf8::upgrade() though, right?
>> >> Certainly I tried that in a simple test and the property in question was
>> >> then sent as str16.
>> >
>> > Yes, if I as a user had a string that was textual, I could call
>> > utf8::upgrade() to ensure it got sent as str16. I guess this is similar in
>> > concept to calling setEncoding in C++, although maybe less natural in a
>> > dynamically typed language.
>> It would be more reasonable to treat perl scalars as textual for our
>> API if perl offered a good way to explicitly handle byte arrays.  My
>> (certainly insufficient) web browsing suggested that wasn't really
>> available, or not in a form recommended for use.  Any candidates for a
>> serviceable explicitly-arbitrary-bytes-and-not-text-at-all "type" in
>> perl?
>> Thanks!
>> Justin
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: dev-help@qpid.apache.org

