avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-50) bidrectional text representation of AVRO data
Date Wed, 17 Jun 2009 22:12:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720938#action_12720938 ]

Doug Cutting commented on AVRO-50:

If we have the schema in hand, then we can use the same JSON representations that are used
for default values (http://hadoop.apache.org/avro/docs/current/spec.html#Records), with the
exception of unions.

Unions will need to be labeled.  For example, a union with a map and a record would be ambiguous,
since both are represented as JSON objects, and a union containing more than one of bytes, enums,
or strings would also be ambiguous, since all of these are represented as JSON strings.
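
To make the ambiguity concrete, here is a rough Python sketch that lists which branches of a
union would collide in unlabeled JSON.  The JSON_KIND table and the ambiguous_pairs() name are
illustrative only and not part of any Avro implementation; the table follows the default-value
encodings in the spec, and treating all numeric types as one JSON kind is my assumption.

```python
from itertools import combinations

# JSON kind each Avro type maps to when encoded like a default value.
# Illustrative table; grouping every numeric type under "number" is an assumption.
JSON_KIND = {
    "null": "null", "boolean": "boolean",
    "int": "number", "long": "number", "float": "number", "double": "number",
    "string": "string", "bytes": "string", "enum": "string", "fixed": "string",
    "array": "array", "map": "object", "record": "object",
}

def ambiguous_pairs(union_types):
    """Return pairs of union branches that share a JSON kind (hypothetical helper)."""
    return [(a, b) for a, b in combinations(union_types, 2)
            if JSON_KIND[a] == JSON_KIND[b]]

print(ambiguous_pairs(["map", "record"]))
print(ambiguous_pairs(["bytes", "enum", "string"]))
print(ambiguous_pairs(["null", "string"]))
```

Any union for which this returns a non-empty list cannot be decoded from the bare JSON value
alone.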

For unions we can use a two-element JSON array whose first element is the name of the type
in the union and whose second element is the JSON-encoded value.  For example, given the schema
"[string|bytes]", the value ["string","foo"] would be a string, while ["bytes","foo"] would
be bytes.  If a value is not a JSON array then it is assumed to be of the first type in
the union.  So, with the above schema, the value "foo" alone would be a string.  Note however
that if a union includes an array, then JSON values must always use the array form, since a bare
JSON array could not otherwise be told apart from a labeled value.  Thoughts?
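
A minimal Python sketch of how a reader might apply these rules, to show the proposal is
mechanical.  decode_union() and its arguments are hypothetical names, not an existing Avro API,
and I am assuming the union is given as a list of branch type names.  It only picks the branch;
decoding the value against that branch's schema is left out.

```python
import json

def decode_union(branch_names, text):
    """Return (branch_name, json_value) for a JSON-encoded union value.

    branch_names: the union's type names in schema order (assumed input form).
    """
    value = json.loads(text)
    if isinstance(value, list):
        # Labeled form: [type-name, value]
        if len(value) != 2 or value[0] not in branch_names:
            raise ValueError("expected [type-name, value] with a type from the union")
        return value[0], value[1]
    if "array" in branch_names:
        # Unions that include an array must always use the labeled form.
        raise ValueError("unions containing an array require the labeled form")
    # Unlabeled values default to the first branch of the union.
    return branch_names[0], value

print(decode_union(["string", "bytes"], '["string","foo"]'))
print(decode_union(["string", "bytes"], '["bytes","foo"]'))
print(decode_union(["string", "bytes"], '"foo"'))
```

One consequence of treating every JSON array as labeled: a writer that wants the first branch
can still emit the bare value, but a reader never has to guess.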

> bidrectional text representation of AVRO data
> ---------------------------------------------
>                 Key: AVRO-50
>                 URL: https://issues.apache.org/jira/browse/AVRO-50
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Philip Zeyliger
> It would be very useful to add a text representation of AVRO data to the spec, and implement
> toString() and fromString() in all implementations.  Faced with binary data, it'll be a useful
> operation to decode it for debugging, ad-hoc manipulation, etc.
> I suspect the text format will:
>  * be JSON
>  * require the schema for full interpretation
>  * map easily onto the binary format (if the binary format has a signifier to take a
>    specific branch of a union, the text format will have such a signifier as well)
>  * not be unique (there's more than one way to encode a given number (e.g., {{0x0 ==
>    0}}) or string (e.g., {{"\u0061" == "a"}}), not to mention flexible whitespace)
>  * be compatible, for the binary type, with whatever is decided in AVRO-36
