qpid-dev mailing list archives

From Andrew Stitcher <astitc...@redhat.com>
Subject Re: UTF8 / binary strings in dynamic languages
Date Thu, 22 Aug 2013 13:33:39 GMT
On Wed, 2013-08-21 at 11:31 -0400, Justin Ross wrote:
> Jimmy, thanks for getting this started.  I'd love your feedback to
> help sort this out.
> 
> I think these are the cases:
> 
> 1. If the language string is unambiguously textual, send it as amqp str16
> 2. If the language string is unambiguously arbitrary bytes, send it as amqp vbin
> 

I think the underlying problem is that, in some of the (computer)
languages, there is no unambiguous text/binary distinction for a given
input.

E.g. in the current C++ bindings both text strings and binary data are
equally represented by std::string. Now it is true that some of those
represent valid ASCII (no high bits set) or a utf8 encoding of unicode
code points, and can be losslessly transferred as such. However, without
encoding the user's semantics up front it is impossible to distinguish
between, say, a string starting "ABCD" and the binary representation of
a structure that starts with the integer 1145258561 (0x44434241, stored
little-endian).
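
To make the ambiguity concrete, here is a minimal sketch (the function
name is made up for illustration and is not part of any Qpid API). It
reads the first four bytes of a std::string as a little-endian 32-bit
integer, which is all a receiver could do if it were told the same
bytes were a binary structure rather than text:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: interpret the first four bytes of `s` as an
// unsigned 32-bit integer stored in little-endian byte order.
// The shifts make the result independent of the host's byte order.
std::uint32_t first_four_as_le_uint32(const std::string& s) {
    assert(s.size() >= 4);  // precondition: at least four bytes present
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return value;
}
```

Calling first_four_as_le_uint32(std::string("ABCD")) yields 1145258561,
so nothing in the std::string itself says which reading was intended.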

It seems from these discussions that Python and Perl (and probably other
languages) have idiomatic ways to deal with this distinction and we
should make those work.

Even in C++ there are some types that I think would better encode the
user's intention. I would opine that std::string is more often used for
text and could be deemed utf8 encoded. C++03 has std::wstring to
indicate wide strings, but with no defined encoding or character width;
in practice it is either utf16 or utf32 depending on implementation.
C++11 is better and has std::u16string and std::u32string, which
indicate utf16 and utf32 encoding respectively. To indicate binary data
I'd say that using std::vector<uint8_t> would make most sense.

So I'd be in favour of changing the default for the C++ binding to
assume that std::string indicates utf8 data, and making the user change
the encoding if it is not. I'd also suggest adding std::vector<uint8_t>
and encoding that as binary, and adding the C++11 types too (in
ifdefs).
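
As a rough sketch of what I mean by letting the C++ type pick the
default (the enum and function names below are invented for
illustration; this is not the existing binding API, and the mapping
onto actual AMQP wire types is left out):

```cpp
#include <stdint.h>
#include <string>
#include <vector>

// Hypothetical categories for the default encoding of a value.
enum WireEncoding {
    WIRE_UTF8_TEXT,
    WIRE_UTF16_TEXT,
    WIRE_UTF32_TEXT,
    WIRE_BINARY
};

// std::string is assumed to hold utf8 text unless the user says otherwise.
inline WireEncoding default_encoding(const std::string&) {
    return WIRE_UTF8_TEXT;
}

// A vector of bytes is treated as opaque binary data.
inline WireEncoding default_encoding(const std::vector<uint8_t>&) {
    return WIRE_BINARY;
}

#if __cplusplus >= 201103L
// C++11 string types carry their encoding in the type.
inline WireEncoding default_encoding(const std::u16string&) {
    return WIRE_UTF16_TEXT;
}

inline WireEncoding default_encoding(const std::u32string&) {
    return WIRE_UTF32_TEXT;
}
#endif
```

Overload resolution then does the work: the caller states intent by
choosing the container type, and an explicit override is only needed
when a std::string really holds something other than utf8.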

Andrew


