qpid-users mailing list archives

From Gordon Sim <g...@redhat.com>
Subject Re: JMS client skips string message properties
Date Thu, 22 Sep 2011 12:33:21 GMT
On 09/21/2011 07:47 PM, Jiri Krutil wrote:
> In my opinion it is not so obvious, because as far as I know:
> - AMQP allows UTF-8 or UTF-16 strings.
> - Many C++ applications supporting Unicode store strings in std::wstring
> with UCS-2 encoding. Having a fixed character size of 2 bytes per code point
> allows for simple and efficient string manipulations. If required,
> conversions to/from UTF-8 are performed on interfaces to the outside world.
> (BTW I think this is also the case for Java.)
> - In C++ it is fairly common to use std::string as a container for binary
> data. I would not say it is wrong to do that.

I agree with all your points here.

> I personally would say that in C++ there is no "default" character encoding.
> Defaulting to UTF-8 makes some sense because all 7-bit ASCII strings are
> UTF-8. But it may be dangerous to assume UTF-8 for all strings and it would
> probably be safer to somehow force C++ programs to explicitly specify
> the encoding when reading and writing strings.

Again I agree in general, but what about making assumptions in specific 
contexts? E.g. in Message::setProperty(), what if we documented that 
passing in a std::string as the second parameter is only valid if it 
contains UTF-8 encoded character data? Any other encoding would then need 
to be specified explicitly.

The most likely source of error here is where the data is binary (e.g. a 
digest or signature for the message), or where it is extended ASCII. If 
it is some other Unicode encoding (e.g. UTF-16) then I think it would be 
reasonable to expect that to be explicitly noted.

> In Java, the default encoding is apparently UTF-8, but the Java client
> should still be able to accept strings encoded in UTF-16.
> I think that the Qpid client libraries should support implicit conversions
> between UTF-8 and UTF-16/UCS-2. I believe it is acceptable to support only
> the UCS-2 character set (Unicode's Basic Multilingual Plane) in the C++
> client.

So add in support for wstring and convert as necessary? I think that 
would be a good thing to do regardless. As you say, where Unicode is 
used in earnest, wstring is the more obvious choice.

Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org
