qpid-dev mailing list archives

From Gordon Sim <g...@redhat.com>
Subject Re: UTF8 / binary strings in dynamic languages
Date Thu, 22 Aug 2013 14:54:50 GMT
On 08/22/2013 03:36 PM, Justin Ross wrote:
> On Thu, Aug 22, 2013 at 8:41 AM, Gordon Sim <gsim@redhat.com> wrote:
>> On 08/21/2013 10:43 PM, Justin Ross wrote:
>>> "If I put a binary value in a map and encoded it some of the time it
>>> might be valid utf8, other times not."  This shouldn't be allowed to
>>> happen, IMO.  You meant it to be a binary value--we have to find a way
>>> to capture and preserve that information.
>> I believe the point was that for an application sending binary data via the
>> ambiguous string type (between two processes in languages that have such a
>> type), if that was encoded on the wire as str16 (i.e. utf8) it could lead to
>> subtle bugs.
>> Testing could work until the actual binary payload was changed in some way
>> such that it was not valid utf8.
> Right.  I'm saying that sucks, so don't do that.  For instance, we
> could ask our users to use a 'Data' class to input arbitrary bytes,
> and otherwise treat ambiguous strings as textual.

The point is that it is easy for people to miss that. Just as it is easy 
for them to miss the fact that you should choose the explicit utf8 type 
for textual data.

An explicit type is always preferable. The question is how to handle an
ambiguous type. If it is encoded as a str16 then it may work in some cases
and fail in others; i.e. a subtle bug that testing may not catch, depending
on the payloads actually tested. By contrast, if it is encoded as a vbin the
behaviour - even though admittedly unexpected for many - will at least
be the same each time you try it, independent of the actual contents of
the string.
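
To make the difference concrete, here is a rough Python sketch. It is not
the qpid client API; the function names and the two sample payloads are
made up for illustration, and it assumes a receiver that strictly decodes
anything arriving as str16 as utf8.

```python
def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the bytes happen to form valid utf8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def receive_as_str16(payload: bytes) -> str:
    # Hypothetical receiver: treats str16 as utf8 and decodes strictly,
    # so it raises UnicodeDecodeError on bytes that are not valid utf8.
    return payload.decode("utf-8")


def receive_as_vbin(payload: bytes) -> bytes:
    # Hypothetical receiver: vbin is opaque, so the bytes pass through
    # unchanged whatever they contain.
    return payload


# A binary payload that happens to be valid utf8 (what testing used).
payload_tested = b"\x01\x02abc"
# The "same" kind of payload after a small change: 0xff is never valid utf8.
payload_changed = b"\x01\x02abc\xff"

for payload in (payload_tested, payload_changed):
    try:
        receive_as_str16(payload)
        str16_result = "ok"
    except UnicodeDecodeError:
        str16_result = "decode error"
    vbin_result = "ok" if receive_as_vbin(payload) == payload else "mismatch"
    print(payload, "str16:", str16_result, "vbin:", vbin_result)
```

With str16 the outcome depends on the bytes; with vbin it is the same for
both payloads, which is the consistency argued for above.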
