avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-36) binary default values do not decode base64
Date Thu, 18 Jun 2009 01:46:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721029#action_12721029 ]

Scott Carey commented on AVRO-36:
---------------------------------

{quote}So, for example, a bytes containing a single zero could be encoded with "\u0000". This
inflates non-printing characters in binary data 6x, but is perhaps the most simple, standard
encoding we can use.{quote}

This seems ambiguous.  Code points in strings make sense.  Code points that represent binary
data are more confusing.

How do you encode a default value of 0xFFFF -- two bytes?  No code point encodes to that in
binary representation by any defined UTF serialization I know of.  If the code points are
interpreted as code points, the principle of least astonishment suggests they should be turned
into bytes by a character encoding, as they would be in a string.  If they are meant to be
interpreted as 'raw' values, this would work, but may be confusing.  Code points can have
intrinsic values much larger than 255, which brings up interesting questions:

Do the strings "\uFFFF" and "\u00FF\u00FF" represent the same binary data?  Or is the latter
0x00FF00FF?  It can't be the latter, since there would then be no way of representing a
single byte.  But I'm sure this would confuse some users.  It could be a requirement that only
code points between \u0000 and \u00FF be used, to guarantee that the number of bytes equals
the number of characters.
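
To make the two readings concrete, here is a small Python sketch. It is not taken from the
Avro code base, and the helper name `default_to_bytes` is hypothetical. It assumes the 'raw'
reading means one byte per code point, which is what ISO-8859-1 encoding does, and it rejects
any code point above \u00FF as in the restriction suggested above. The UTF-8 result is shown
only for contrast.

```python
def default_to_bytes(text: str) -> bytes:
    """Interpret each code point as one raw byte (hypothetical helper).

    ISO-8859-1 maps U+0000..U+00FF to the bytes 0x00..0xFF one-to-one
    and raises UnicodeEncodeError for any larger code point.
    """
    return text.encode("iso-8859-1")


# 'Raw' reading: two characters become two bytes.
two_bytes = default_to_bytes("\u00ff\u00ff")   # b'\xff\xff'

# Character-encoding reading (UTF-8): the same string becomes four bytes.
as_utf8 = "\u00ff\u00ff".encode("utf-8")       # b'\xc3\xbf\xc3\xbf'

# A code point above 255 has no single-byte value under the raw reading.
try:
    default_to_bytes("\uffff")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

Under this assumption the character count always equals the byte count, and "\uFFFF" is
simply an error rather than a second spelling of two 0xFF bytes.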

I suppose any string representation of default binary values raises the question of what to
do with a character with value > 255.  URL encoding forbids such characters, as does a
hex literal.
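
For comparison, a short Python sketch of the two alternatives mentioned above. Neither is
proposed as the Avro format here; the point is only that each unit in a hex literal or a
percent-escape names exactly one byte, so a value above 255 cannot be written at all.

```python
from urllib.parse import unquote_to_bytes

# Hex literal: every pair of hex digits is one byte.
from_hex = bytes.fromhex("ffff")

# URL (percent) encoding: every %XX escape is one byte.
from_url = unquote_to_bytes("%FF%FF")

# Both spell the same two-byte value, 0xFF 0xFF.
same_value = from_hex == from_url
```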

> binary default values do not decode base64
> ------------------------------------------
>
>                 Key: AVRO-36
>                 URL: https://issues.apache.org/jira/browse/AVRO-36
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64 encoded text, but
> the Java implementation uses the raw bytes of the textual value, and does not perform base64
> decoding as specified.
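
The difference described in the issue can be sketched in Python as follows. This is an
illustration only, not the Java code in question; the sample default "//8=" and the choice of
UTF-8 for the 'raw bytes of the text' are assumptions made for the example.

```python
import base64

default_text = "//8="   # base64 text for the two bytes 0xFF 0xFF (sample value)

# What the specification describes: decode the text as base64.
decoded = base64.b64decode(default_text)

# What the issue reports: the bytes of the text itself are used
# (UTF-8 is assumed here; the issue does not name the encoding).
raw = default_text.encode("utf-8")
```

Here `decoded` holds two bytes, while `raw` holds the four ASCII characters of the text.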

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

