avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-36) binary default values do not decode base64
Date Tue, 02 Jun 2009 17:10:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715590#action_12715590 ]

Doug Cutting commented on AVRO-36:
----------------------------------

> I like the spec the way it is i.e. length + actual bytes 

The question is not how to encode binary values in Avro, but rather how to encode default
values for binary fields in schemas written in JSON, which supports only UTF-8 strings, not
binary values.

It is possible to encode arbitrary binary values in UTF-8 by encoding each byte as a code
point.  The number of bytes encoded will differ from the raw binary, as bytes between 128
and 255 must each be encoded as two bytes.  This has the advantage of rendering ASCII portions
of binary data in a readable manner but, in pathological cases, it can double the data size.
Base64 is more opaque, but fixes the encoded size at about 1.33 times (4/3) the number of bytes.
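
To make the size trade-off concrete, here is a minimal Python sketch that compares the two encodings on a 10-byte ASCII sample and a 10-byte sample of high bytes. The helper names (as_code_points, utf8_size, base64_size) are hypothetical and exist only for this illustration; they are not part of Avro.

```python
import base64

def as_code_points(data: bytes) -> str:
    # Map each byte to the Unicode code point with the same value (0-255).
    # Decoding as Latin-1 does exactly this.
    return data.decode("latin-1")

def utf8_size(data: bytes) -> int:
    # Size of the byte-per-code-point text once it is written out as UTF-8.
    return len(as_code_points(data).encode("utf-8"))

def base64_size(data: bytes) -> int:
    # Size of the standard (padded) base64 text.
    return len(base64.b64encode(data))

ascii_sample = b"hello avro"          # 10 bytes, all below 128
high_sample = bytes(range(128, 138))  # 10 bytes, all 128 or above

sizes = {
    "ascii": (utf8_size(ascii_sample), base64_size(ascii_sample)),
    "high": (utf8_size(high_sample), base64_size(high_sample)),
}
print(sizes)
```

The ASCII sample stays at 10 bytes and remains readable as code points, the high-byte sample doubles to 20 bytes, and base64 produces 16 characters for either input.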

For default values I'm not worried about the size, but base64 is a more standard way of encoding
binary values in text than perverting Unicode.  In particular, base64 is designed to survive
email and text editors, which makes schemas easier to handle as source code, as they sometimes
will be.

Ideally we'd use an encoding that was both text-editor/email friendly and transparent.  URL
encoding might thus be a better choice than base64 or raw UTF-8.  It's also readily available
on most platforms.  How would folks feel about using URL encoding for default values of binary
fields in JSON schemas?
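
As a rough sketch of what this proposal could look like, the Python below URL-encodes a binary default and places it in a schema field. The functions encode_default and decode_default and the field name "payload" are hypothetical, and the use of URL encoding for defaults is only the idea floated above, not something the Avro specification is known to define. The quote_from_bytes and unquote_to_bytes calls are from Python's standard urllib.parse module.

```python
import json
from urllib.parse import quote_from_bytes, unquote_to_bytes

def encode_default(data: bytes) -> str:
    # Percent-encode everything except unreserved characters
    # (letters, digits, and "_.-~"), so the result is plain ASCII.
    return quote_from_bytes(data, safe="")

def decode_default(text: str) -> bytes:
    # Reverse the percent-encoding back to the original bytes.
    return unquote_to_bytes(text)

raw = b"id=\x00\xff"
encoded = encode_default(raw)

# Hypothetical schema field carrying the URL-encoded default.
field = {"name": "payload", "type": "bytes", "default": encoded}
field_json = json.dumps(field)

print(encoded)
print(field_json)
```

The ASCII letters stay legible while every other byte becomes a three-character %XX escape, so the text survives editors and email at the cost of tripling the size of non-ASCII runs.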


> binary default values do not decode base64
> ------------------------------------------
>
>                 Key: AVRO-36
>                 URL: https://issues.apache.org/jira/browse/AVRO-36
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64 encoded text, but
> the Java implementation uses the raw bytes of the textual value, and does not perform base64
> decoding as specified.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

