Date: Wed, 12 Feb 2014 18:55:21 +0000 (UTC)
From: "Doug Cutting (JIRA)"
To: dev@avro.apache.org
Subject: [jira] [Commented] (AVRO-1456) AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described in the Avro Specification

    [ https://issues.apache.org/jira/browse/AVRO-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899426#comment-13899426 ]

Doug Cutting commented on AVRO-1456:
------------------------------------

I'm not sure that it is a bug for AvroAsTextInputFormat to use the toString() JSON encoding rather than the Avro encoding. Generally AvroAsTextInputFormat is used to supply Avro to non-Avro-aware tools, where folks generally seem to prefer to represent unions as simply different types in the JSON data. Perhaps we could include an option to use the Avro JSON encoding here too. Would that be of use to you?

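For illustration only, the two union encodings under discussion can be sketched in plain Python. This is not Avro's Java implementation: the function names `spec_union_json` and `plain_union_json` and the `Foo` field shown are hypothetical, and the sketch assumes the caller already knows which union branch a value belongs to.

```python
import json


def spec_union_json(value, branch):
    """Avro-spec-style union encoding (sketch): null stays null,
    any other branch is wrapped as {branch_name: value}."""
    if value is None:
        return None
    return {branch: value}


def plain_union_json(value, branch):
    """toString()-style encoding (sketch): the value is emitted as-is,
    with no branch name around it."""
    return value


# Hypothetical Foo record; the real fields are not given in the issue.
foo = {"id": 1}

# Union type ["null", "string", "Foo"], one sample value per branch.
for value, branch in [(None, "null"), ("a", "string"), (foo, "Foo")]:
    print(
        json.dumps(spec_union_json(value, branch)),
        "vs",
        json.dumps(plain_union_json(value, branch)),
    )
```

The wrapped form lets a reader tell which branch of the union was written; the plain form is what tools that know nothing about Avro usually expect.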
> AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described in the Avro Specification
> -----------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1456
>                 URL: https://issues.apache.org/jira/browse/AVRO-1456
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Jamie Olson
>
> org.apache.avro.mapred.AvroAsTextInputFormat relies on the toString() method rather than using org.apache.avro.generic.GenericDatumWriter.write() and org.apache.avro.io.JsonEncoder as in org.apache.avro.tool.DataFileReadTool. This results in a serialization of the data element, without the fully qualified name as specified in the Avro Specifications JSON Encoding section: http://avro.apache.org/docs/1.7.6/spec.html#json_encoding
> The specification indicates that for a union type: ["null","string","Foo"], data should be serialized with:
> * null as null;
> * the string "a" as {"string": "a"}; and
> * a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding of a Foo instance.
> Instead, AvroAsTextInputFormat is serializing these values as
> * null as null;
> * the string "a" as "a"; and
> * a Foo instance as {...}, where {...} indicates the JSON encoding of a Foo instance.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)