avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
Date Mon, 14 Dec 2015 21:54:46 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056779#comment-15056779 ]

Doug Cutting commented on AVRO-1584:

Ryan, I agree this is a bug in the current implementation.  According to RFC 4627, control
characters must be escaped.
bq. All Unicode characters may be placed within the quotation marks except for the characters
that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
I note that this was fixed for strings in AVRO-713 and we can probably share this logic.
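
As a rough illustration of the escaping rule quoted above, here is a minimal sketch. It is not the AVRO-713 code; the class and method names are hypothetical. It maps each byte to one character and escapes the quotation mark, the reverse solidus, and U+0000 through U+001F. Escaping everything at or above U+007F as well is an extra choice made here only to keep the output pure ASCII; the RFC does not require it.

```java
class JsonEscapeSketch {
    // Hypothetical helper: renders bytes as a JSON string literal, one character per byte.
    static String escapeBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder("\"");
        for (byte b : bytes) {
            int c = b & 0xFF; // treat the byte as an unsigned code point 0..255
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20 || c >= 0x7F) {
                // Control characters must be escaped; the upper range is escaped
                // here only by choice, to keep the output ASCII.
                sb.append(String.format("\\u%04X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // The byte array from the issue description.
        byte[] data = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        System.out.println(escapeBytes(data)); // prints "\u0000\u001FABC\u00FF\u00B6"
    }
}
```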

The difference between toString() JSON and Avro's JSON data encoding is longstanding and primarily
around the encoding of unions.  For full read/write fidelity, many union values must be tagged
with their type, so that's what the JSON encoding requires.  The toString() encoding was not
intended for data fidelity but for debugging, so a simpler version was implemented.  (It actually
pre-dates the specification of the JSON encoding.)  It so happens that default values in schemas
do not need to be tagged, so the toString() format is identical to the default-value format.
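
To make the two forms concrete, here is a small sketch that does not use the Avro library; the class and method names are hypothetical and the values are passed in as pre-rendered JSON text. For a union such as ["null", "string"], the specified JSON encoding wraps a non-null value in an object keyed by the branch's type name, while the untagged form writes the bare value.

```java
class UnionTaggingSketch {
    // Tagged form, as in the specified JSON encoding: null stays null, any other
    // branch becomes a single-entry object keyed by the branch's type name.
    static String tagged(String typeName, String jsonValue) {
        if (typeName.equals("null")) {
            return "null";
        }
        return "{\"" + typeName + "\": " + jsonValue + "}";
    }

    // Untagged form, as described for toString() and default values: the bare value.
    static String untagged(String typeName, String jsonValue) {
        return typeName.equals("null") ? "null" : jsonValue;
    }

    public static void main(String[] args) {
        System.out.println(tagged("string", "\"abc\""));   // {"string": "abc"}
        System.out.println(untagged("string", "\"abc\"")); // "abc"
        System.out.println(tagged("null", "null"));        // null
    }
}
```

The untagged form is ambiguous for a union such as ["bytes", "string"], since both branches are written as JSON strings; that is the case where the tag is needed to read the value back.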

However there are frequent requests for a reader that accepts such an untagged format, for
interaction with other JSON-generating software.  In retrospect, the JSON encoding should
perhaps not require tagging for unions with null or unions between a primitive and a non-primitive,
i.e., only tag unions when it's required.  We instead opted for simplicity of specification
and implementation, to ease interoperability between various Avro implementations, when perhaps
in this case we should have optimized for ease of interoperability with non-Avro producers
and consumers of JSON.

So long-term we might add an encoder/decoder that doesn't handle unions at all or that handles
them more parsimoniously, then perhaps implement default values and toString() using this
encoding.  But I don't think we should alter the currently specified JSON encoding, nor change
the default or toString() format.

> Json output doesn't generate base64 for byte arrays
> ---------------------------------------------------
>                 Key: AVRO-1584
>                 URL: https://issues.apache.org/jira/browse/AVRO-1584
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.7
>         Environment: Pure java.
>            Reporter: Christophe Lorenz
>         Attachments: AVRO-1584-Jackson-Base64-Default-Variant.patch, AVRO-1584.patch
> The Json output of java generated code doesn't correctly encode byte arrays.
> Using this simple schema : 
> {"namespace": "example.avro",
>  "type": "record",
>  "name": "ByteArrayEncoding",
>  "fields": [     {"name": "data", "type": "bytes"} ]
> }
> The toString()  
> 	System.out.println(new ByteArrayEncoding(ByteBuffer.wrap(new byte[]{0,31,65,66,67,(byte)255,(byte)182})));
> Returns raw bytes to string in the json :
> {"data": {"bytes": "  ABC??"}}
> As a byte array is not guaranteed to be a valid string, it should be converted back and forth
> to Base64 like other Json implementations : 
> {"data": {"bytes": "AB9BQkP/tg=="}}
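
For reference, the Base64 string expected in the issue description can be reproduced with the JDK's java.util.Base64 (Java 8 and later). This is only a check of the expected value under that assumption, not the attached patch, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Check {
    // Standard (RFC 4648) Base64 with padding, via the JDK encoder.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Inverse of encode(), to show the value survives a round trip.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The byte array from the issue description.
        byte[] data = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        String encoded = encode(data);
        System.out.println(encoded);                               // AB9BQkP/tg==
        System.out.println(Arrays.equals(data, decode(encoded)));  // true
    }
}
```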

