avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-50) bidirectional text representation of AVRO data
Date Fri, 26 Jun 2009 22:37:47 GMT

     [ https://issues.apache.org/jira/browse/AVRO-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-50:
-----------------------------

    Attachment: AVRO-50.patch

Here's a patch that implements this.  It uses Jackson's SAX-like API for both reading and
writing, so it should perform well.

I changed the representation of unions from what was discussed above: a union value is now
written as a JSON object that pairs the branch's type name with the value, with null as the
only special case.  So, with the union ["null", "string", "long"], a string is encoded as
{"string": "foo"}, a long as {"long": 1}, and a null as just null.  In other words, union
values are always tagged unless the selected branch is null, and a null is never tagged.  Yes,
other cases are unambiguous (e.g., the above example), and we could permit more of them to go
untagged, but it seemed better to optimize this one common case and keep the algorithm simple.
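
To illustrate the union rule, here is a minimal Python sketch.  It is not the patch's Java
code: the helper names encode_union and decode_union are hypothetical, and only the "null",
"string", and "long" branches from the example are handled.

```python
import json


def encode_union(branches, value):
    """Encode a union value as JSON text: null is bare, other branches are tagged."""
    if value is None:
        if "null" not in branches:
            raise ValueError("union has no null branch")
        return json.dumps(None)
    if isinstance(value, str):
        name = "string"
    # bool is a subclass of int in Python, so exclude it explicitly.
    elif isinstance(value, int) and not isinstance(value, bool):
        name = "long"
    else:
        raise TypeError("unsupported value type in this sketch")
    if name not in branches:
        raise ValueError("union has no %s branch" % name)
    # Tag the value with the name of the selected branch.
    return json.dumps({name: value})


def decode_union(branches, text):
    """Return (branch name, value) for JSON text written by encode_union."""
    node = json.loads(text)
    if node is None:
        return ("null", None)
    if not isinstance(node, dict) or len(node) != 1:
        raise ValueError("expected null or a single-key object")
    (name, value), = node.items()
    if name not in branches:
        raise ValueError("unknown branch: %s" % name)
    return (name, value)


branches = ["null", "string", "long"]
print(encode_union(branches, "foo"))
print(encode_union(branches, 1))
print(encode_union(branches, None))
```

Because every non-null branch is tagged, the decoder never has to guess the branch from the
shape of the value.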

To implement this I had to make substantial changes to the Encoder and Decoder API.

I added new methods:
  - start/end methods for reading/writing records, to put braces around records
  - read/write record field name, to store field names, which are elided in binary
  - read/write map key, to store map keys, instead of just using read/write string

I changed the read/write array/map API so that the start methods no longer read/write the
size, but only the (logical) delimiters.
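
The shape of such an event-style API can be sketched in Python as follows.  This is only an
illustration under assumptions: the class JsonTextEncoder and all of its method names are
hypothetical and are not the names used in the patch.  It shows explicit record start/end
events, field-name and map-key events, and array/map start events that take no size.

```python
import json


class JsonTextEncoder:
    """Hypothetical event-style encoder; names are illustrative, not Avro's API."""

    def __init__(self):
        self._out = []
        # One flag per open container: True until its first item is written.
        self._first = []
        # True right after a field name or map key, when a value must follow.
        self._after_key = False

    def _before_item(self):
        # Emit a separating comma unless this item follows a key or is the
        # first item of its container.
        if self._after_key:
            self._after_key = False
        elif self._first:
            if self._first[-1]:
                self._first[-1] = False
            else:
                self._out.append(",")

    def _open(self, bracket):
        self._before_item()
        self._out.append(bracket)
        self._first.append(True)

    def _close(self, bracket):
        self._first.pop()
        self._out.append(bracket)

    def _key(self, name):
        self._before_item()
        self._out.append(json.dumps(name) + ":")
        self._after_key = True

    def start_record(self):
        self._open("{")

    def end_record(self):
        self._close("}")

    def write_field_name(self, name):
        self._key(name)

    def start_map(self):
        self._open("{")

    def end_map(self):
        self._close("}")

    def write_map_key(self, key):
        self._key(key)

    def start_array(self):
        # No size argument: only the opening delimiter is written.
        self._open("[")

    def end_array(self):
        self._close("]")

    def write_value(self, value):
        self._before_item()
        self._out.append(json.dumps(value))

    def result(self):
        return "".join(self._out)


enc = JsonTextEncoder()
enc.start_record()
enc.write_field_name("name")
enc.write_value("x")
enc.write_field_name("tags")
enc.start_array()
enc.write_value(1)
enc.write_value(2)
enc.end_array()
enc.end_record()
print(enc.result())
```

A binary encoder could implement the record and field-name events as no-ops, which is why
the same calling code can drive both formats.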

I also added a schema parameter to the union and enum read/write methods so that names can
be used in JSON instead of numbers.  (The numbers would be unambiguous and more compact, but
not user-friendly, which a text format ought to be.)
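
A small Python sketch of why the schema is needed for enums: the binary form carries only a
number, so the text encoder must look the name up.  The functions write_enum and read_enum
and the SUITS symbol list are hypothetical, and the sketch assumes the schema supplies an
ordered list of symbol names.

```python
import json

# Hypothetical enum schema: the ordered symbol names.
SUITS = ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]


def write_enum(symbols, index):
    """Write the symbol name for an ordinal, rather than the ordinal itself."""
    return json.dumps(symbols[index])


def read_enum(symbols, text):
    """Map a symbol name read from JSON text back to its ordinal."""
    name = json.loads(text)
    # list.index raises ValueError if the name is not a symbol of this enum.
    return symbols.index(name)


print(write_enum(SUITS, 1))
print(read_enum(SUITS, '"CLUBS"'))
```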

Thiru: this last change is not complete for the resolving and validating encoder/decoder implementations,
since the schema is not available in the parsing table.  Can you please have a look at adding
this?  It would be nice if one could use these to generate JSON output too.  Thanks!

If folks like this, I'll update the spec document to describe it before committing.

> bidirectional text representation of AVRO data
> ----------------------------------------------
>
>                 Key: AVRO-50
>                 URL: https://issues.apache.org/jira/browse/AVRO-50
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Philip Zeyliger
>            Assignee: Doug Cutting
>         Attachments: AVRO-50.patch
>
>
> It would be very useful to add a text representation of AVRO data to the spec, and implement
> toString() and fromString() in all implementations.  Faced with binary data, it'll be a useful
> operation to decode it for debugging, ad-hoc manipulation, etc.
> I suspect the text format will:
>  * be JSON
>  * require the schema for full interpretation
>  * map easily onto the binary format (if the binary format has a signifier to take a
>    specific branch of a union, the text format will have such a signifier as well)
>  * not be unique (there's more than one way to encode a given number (e.g., {{0x0 == 0}})
>    or string (e.g., {{"\u0061" == "a"}}), not to mention flexible whitespace)
>  * be compatible, for the binary type, with whatever is decided in AVRO-36

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

