MIME-Version: 1.0
References: <20130821124410.256210@gmx.com>
Date: Wed, 21 Aug 2013 11:49:45 -0400
Subject: Re: UTF8 / binary strings in dynamic languages
From: Rafael Schloming
To: dev@qpid.apache.org
Cc: "users@qpid.apache.org"
Content-Type: multipart/alternative; boundary=047d7b2e4d705f753304e4771f16

--047d7b2e4d705f753304e4771f16
Content-Type: text/plain; charset=ISO-8859-1

I think for Python at least, if we were to treat ambiguous string values
as text rather than data, we would be at odds with the Python
community's 2->3 migration plan. The following thread has a useful
discussion of this that is worth a careful read:

http://stackoverflow.com/questions/1736228/python-data-vs-text/1736279#1736279

--Rafael

On Wed, Aug 21, 2013 at 11:31 AM, Justin Ross wrote:

> Jimmy, thanks for getting this started. I'd love your feedback to
> help sort this out.
>
> I think these are the cases:
>
> 1. If the language string is unambiguously textual, send it as AMQP str16
> 2. If the language string is unambiguously arbitrary bytes, send it as
>    AMQP vbin
>
> These are easy. We can tell the user's intention, and we can do the
> right thing.
>
> 3. If the language string is an overloaded text/bytes type, as is
>    regrettably quite common, what do we do then?
>
> The current answer to this question is "send it as vbin". That's very
> safe, insofar as it won't throw any sort of encoding exception. It
> does not, however, always honor what I think is the user's more
> typical intention: produce an ASCII string at the other end.
>
> So for 3, I'd like to consider the possibility of, by default, sending
> ambiguous language strings as ASCII rendered to AMQP str16. This
> requires an encoding step that may produce errors. And maybe that's
> just too obnoxious! That's what I'd like to know.
>
> In summary, if we have a way to determine what the user wanted (text
> or bytes), we should try to carry that through on the wire. At the
> following URL I've tried to map out what type information we can get
> for each language. Please update it as you see fit.
>
> https://cwiki.apache.org/confluence/display/qpid/Language+support+for+unambiguous+text+string+and+byte+array+types
>
> On Wed, Aug 21, 2013 at 8:44 AM, Jimmy Jones wrote:
> >> > AFAIK in Perl, if you include Unicode characters in a string it'll
> >> > set the utf8 flag. If you don't include any Unicode characters (e.g.
> >> > 7-bit ASCII, or raw bytes) the flag won't be set. So given a Perl
> >> > scalar that doesn't contain any UTF-8 characters, you don't know if
> >> > it's a textual string (str16) or a binary string (vbin). There is an
> >> > is_utf8_string function, but that'll only tell you if the string
> >> > would be valid UTF-8, but it could be a binary string that happens to
> >> > be valid UTF-8, so that's not really safe.
> >>
> >> You can explicitly mark it as utf8 using utf8::upgrade() though, right?
> >> Certainly I tried that in a simple test and the property in question was
> >> then sent as str16.
> >
> > Yes, if I as a user had a string that was textual, I could call
> > utf8::upgrade() to ensure it got sent as str16. I guess this is similar in
> > concept to calling setEncoding in C++, although maybe less natural in a
> > dynamically typed language.
>
> It would be more reasonable to treat Perl scalars as textual for our
> API if Perl offered a good way to explicitly handle byte arrays. My
> (certainly insufficient) web browsing suggested that wasn't really
> available, or not in a form recommended for use. Any candidates for a
> serviceable explicitly-arbitrary-bytes-and-not-text-at-all "type" in
> Perl?
>
> Thanks!
> Justin

--047d7b2e4d705f753304e4771f16--
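
To make the three cases from the quoted message concrete, here is a
minimal Python 3 sketch. It is an illustration only, not Qpid code:
choose_amqp_type, the ambiguous flag, and the ambiguous_as_text option
are hypothetical names invented for this example. Python 3 has no
overloaded text/bytes type, so case 3 is modelled by passing bytes
together with ambiguous=True.

```python
def choose_amqp_type(value, ambiguous=False, ambiguous_as_text=False):
    """Return (amqp_type_name, payload) for a string-like value.

    Hypothetical helper for illustration; not part of any Qpid API.
    """
    if isinstance(value, str):
        # Case 1: unambiguously textual, so send as str16.
        return ("str16", value)
    if isinstance(value, (bytes, bytearray)):
        if not ambiguous or not ambiguous_as_text:
            # Case 2 (plain bytes), or case 3 under the current
            # "send it as vbin" default. This path never raises.
            return ("vbin", bytes(value))
        # Case 3 under the proposed default: render as ASCII text.
        # decode() raises UnicodeDecodeError on non-ASCII bytes.
        return ("str16", bytes(value).decode("ascii"))
    raise TypeError(
        "expected str, bytes or bytearray, got %s" % type(value).__name__
    )


print(choose_amqp_type("hello"))
print(choose_amqp_type(b"\x00\xff"))
print(choose_amqp_type(b"hello", ambiguous=True))
print(choose_amqp_type(b"hello", ambiguous=True, ambiguous_as_text=True))
try:
    choose_amqp_type(b"\xff", ambiguous=True, ambiguous_as_text=True)
except UnicodeDecodeError as exc:
    print("encoding error:", exc.reason)
```

The final call shows the trade-off raised in the thread: treating
ambiguous values as ASCII text gives the receiver a str16, but it can
fail at send time, whereas the vbin default never fails.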