From: "Scott Carey (JIRA)"
To: avro-dev@hadoop.apache.org
Date: Wed, 17 Jun 2009 18:46:07 -0700 (PDT)
Subject: [jira] Commented: (AVRO-36) binary default values do not decode base64

    [ https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721029#action_12721029 ]

Scott Carey commented
on AVRO-36:
---------------------------------

{quote}So, for example, a bytes containing a single zero could be encoded with "\u0000". This inflates non-printing characters in binary data 6x, but is perhaps the most simple, standard encoding we can use.{quote}

This seems ambiguous. Code points make sense in strings; code points standing in for binary data are more confusing. How do you encode a default value of 0xFFFF, i.e. the two bytes 0xFF 0xFF? In UTF-8 no code point encodes to those bytes (the byte 0xFF never occurs in valid UTF-8); only UTF-16 produces them, and only for the noncharacter U+FFFF.

If the escapes are read as code points, the principle of least astonishment suggests they should be turned into bytes with a character encoding. If they are meant as "raw" byte values, the scheme works, but it may be confusing. Code points can have values much larger than 255, which raises questions: do the strings "\uFFFF" and "\u00FF\u00FF" represent the same binary data? Or is the latter 0x00FF00FF? It can't be the latter, since then there would be no way to represent a single byte. But I'm sure this would confuse some users.

One option is to require that only code points \u0000 through \u00FF be used, which guarantees that the number of bytes equals the number of characters. I suppose any string representation of binary default values raises the question of what to do with a character whose value exceeds 255. URL encoding forbids such characters, as does a hex literal.

> binary default values do not decode base64
> ------------------------------------------
>
>                 Key: AVRO-36
>                 URL: https://issues.apache.org/jira/browse/AVRO-36
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64-encoded text, but the Java implementation uses the raw bytes of the textual value and does not perform base64 decoding as specified.
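To make the competing readings concrete, here is a small Java sketch (assuming Java 8+ for java.util.Base64). The class and method names (BytesDefaultDecoding, rawTextBytes, base64Decode, codePointsAsBytes, hex) are hypothetical illustrations, not part of Avro's API: rawTextBytes mimics what the issue says the Java implementation does, base64Decode follows the specification text quoted in the issue, and codePointsAsBytes is the one-byte-per-code-point scheme restricted to U+0000 through U+00FF discussed in the comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: three ways to turn the JSON string of a binary
// default value into bytes. Not Avro's actual implementation.
public class BytesDefaultDecoding {

    // Reading 1 (the reported bug): take the raw UTF-8 bytes of the text.
    static byte[] rawTextBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Reading 2 (what the quoted spec text asks for): base64-decode the text.
    static byte[] base64Decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Reading 3 (the scheme under discussion): each code point is one byte,
    // and only U+0000..U+00FF are allowed, so #bytes == #characters.
    static byte[] codePointsAsBytes(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                // Also rejects surrogates, so no supplementary character slips through.
                throw new IllegalArgumentException(String.format(
                        "code point U+%04X at index %d exceeds U+00FF", (int) c, i));
            }
            out[i] = (byte) c;
        }
        return out;
    }

    // Render bytes as uppercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "AP8=" is the base64 text for the two bytes 0x00 0xFF.
        System.out.println(hex(base64Decode("AP8=")));   // 00FF
        System.out.println(hex(rawTextBytes("AP8=")));   // 4150383D, the ASCII text itself

        // A single zero byte, as in the quoted example.
        System.out.println(hex(codePointsAsBytes("\u0000")));        // 00
        // Two U+00FF characters become exactly two bytes...
        System.out.println(hex(codePointsAsBytes("\u00FF\u00FF")));  // FFFF
        // ...whereas their UTF-8 bytes are four.
        System.out.println(hex(rawTextBytes("\u00FF\u00FF")));       // C3BFC3BF

        // U+FFFF is rejected instead of silently becoming two bytes.
        try {
            codePointsAsBytes("\uFFFF");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under the range restriction, "\u00FF\u00FF" and "\uFFFF" can no longer be confused: the first is the two bytes 0xFF 0xFF and the second is an error. The base64 reading avoids the question entirely, at the cost of default values that are not human-readable.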