avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1456) AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described in the Avro Specification
Date Wed, 12 Feb 2014 18:55:21 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899426#comment-13899426 ]

Doug Cutting commented on AVRO-1456:
------------------------------------

I'm not sure that it is a bug for AvroAsTextInputFormat to use the toString() JSON encoding
rather than the Avro JSON encoding.  AvroAsTextInputFormat is generally used to supply Avro
data to non-Avro-aware tools, where folks seem to prefer to represent unions as simply
different types in the JSON data.

Perhaps we could include an option to use the Avro JSON encoding here too.  Would that be
of use to you?

> AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described in the Avro Specification
> -----------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1456
>                 URL: https://issues.apache.org/jira/browse/AVRO-1456
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Jamie Olson
>
> org.apache.avro.mapred.AvroAsTextInputFormat relies on the toString() method rather than
> using org.apache.avro.generic.GenericDatumWriter.write() and org.apache.avro.io.JsonEncoder
> as in org.apache.avro.tool.DataFileReadTool.  This results in a serialization of the data
> element without the fully qualified name that is specified in the Avro Specification's
> JSON Encoding section: http://avro.apache.org/docs/1.7.6/spec.html#json_encoding
> The specification indicates that for a union type: ["null","string","Foo"], data should
> be serialized with:
> * null as null;
> * the string "a" as {"string": "a"}; and
> * a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding of a Foo
> instance.
> Instead, AvroAsTextInputFormat is serializing these values as
> * null as null;
> * the string "a" as "a"; and
> * a Foo instance as {...}, where {...} indicates the JSON encoding of a Foo instance.
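
To make the difference between the two shapes concrete, here is a minimal Java sketch. It does not use the Avro library at all; it only builds the two JSON strings described in the issue by hand. The class name UnionJsonSketch, both method names, and the single-field Foo value are hypothetical and chosen purely for illustration.

```java
// Illustrative sketch only: mimics the two JSON shapes discussed above
// without depending on the Avro library. All names here are hypothetical.
final class UnionJsonSketch {

    // Shape described in the spec's JSON Encoding section: null stays null,
    // any other union branch is wrapped as {"<branch name>": <value JSON>}.
    static String specEncoding(String branchName, String valueJson) {
        if (valueJson == null || branchName.equals("null")) {
            return "null";
        }
        return "{\"" + branchName + "\":" + valueJson + "}";
    }

    // Shape reported for AvroAsTextInputFormat: the value's JSON is emitted
    // as-is, with no branch name around it.
    static String plainEncoding(String valueJson) {
        return valueJson == null ? "null" : valueJson;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"null", null},
            {"string", "\"a\""},
            {"Foo", "{\"x\":1}"},  // assumed Foo record with one field "x"
        };
        for (String[] c : cases) {
            System.out.println(specEncoding(c[0], c[1]) + "  vs  " + plainEncoding(c[1]));
        }
    }
}
```

The two forms agree only for null. For every other branch, a consumer expecting the spec form has to unwrap one extra object level, which is why a non-Avro-aware tool may find the plain form easier to work with.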



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
