Date: Wed, 15 Dec 2010 15:14:11 -0500 (EST)
From: "Scott Carey (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map for configuration

    [ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971812#action_12971812 ]

Scott Carey commented on HADOOP-6685:
-------------------------------------

I apologize for replying to something from the conversation 30 days ago, but this may be useful.

{quote}For the second point, Avro is completely unsuitable for that context. For the serializer's metadata, I need to encode a singleton object. With Avro, I would need to encode the schema and then the metadata information. To add insult to injury, the schema will be substantially larger than the data. With ProtocolBuffers, I just encode the data.{quote}

This is not true: all configurations could share the same Avro schema. An Avro schema that defines all possibilities is equivalent to tagging fields with type tags. Essentially, the schema would be a record containing an array of fields, where each field is a union of all possible field types.

The current Avro API for this use case is clunky, and perhaps Avro could make it easier, but you can do dynamic typing and tagged fields in Avro. This means you don't have to serialize the schema, which covers the use case here where you want to encode only the data and not a schema, as in some PB/Thrift use cases. The cost is the overhead of the type tags, and the objects generated via either the Java Reflect or Generic APIs would be cumbersome to use.

I would be willing to work on an API for Avro that makes reading and writing a tagged-tuple dynamic data type easier.
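A minimal sketch of the shared-schema idea, assuming a hypothetical schema `TaggedConfig` whose single field is an array of `["null", "boolean", "long", "string"]` unions (the choice of branch types is illustrative, and fields here are positional with no names). Rather than calling the Avro library, it hand-rolls Avro's binary encoding in Python so the bytes are visible: the union branch index written before each value acts as the type tag, and nothing from the schema itself appears in the output.

```python
import json

# Hypothetical schema shared by every configuration. Writer and reader both
# know it in advance, so it is never written alongside the data.
CONFIG_SCHEMA = json.dumps({
    "type": "record",
    "name": "TaggedConfig",
    "fields": [{
        "name": "fields",
        "type": {"type": "array",
                 "items": ["null", "boolean", "long", "string"]},
    }],
})


def write_long(out: bytearray, n: int) -> None:
    """Avro int/long encoding: zig-zag, then base-128 varint, low bits first."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)


def read_long(buf: bytes, pos: int) -> tuple:
    """Inverse of write_long; returns (value, next_position)."""
    shift = z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def branch_index(value) -> int:
    """Position of the value's type in the union; this is the type tag."""
    if value is None:
        return 0
    if isinstance(value, bool):  # test before int: bool subclasses int
        return 1
    if isinstance(value, int):
        return 2
    if isinstance(value, str):
        return 3
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def encode_config(values: list) -> bytes:
    """Encode a TaggedConfig record: data only, no schema."""
    out = bytearray()
    if values:
        write_long(out, len(values))  # one array block holds every item
        for v in values:
            tag = branch_index(v)
            write_long(out, tag)      # union branch index = type tag
            if tag == 1:
                out.append(1 if v else 0)
            elif tag == 2:
                write_long(out, v)
            elif tag == 3:
                data = v.encode("utf-8")
                write_long(out, len(data))
                out += data
    write_long(out, 0)                # zero-count block ends the array
    return bytes(out)


def decode_config(buf: bytes) -> list:
    """Decode bytes from encode_config using only the agreed schema."""
    values, pos = [], 0
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:
            return values
        if count < 0:                 # negative count: block byte size follows
            count = -count
            _, pos = read_long(buf, pos)
        for _ in range(count):
            tag, pos = read_long(buf, pos)
            if tag == 0:
                values.append(None)
            elif tag == 1:
                values.append(buf[pos] == 1)
                pos += 1
            elif tag == 2:
                v, pos = read_long(buf, pos)
                values.append(v)
            elif tag == 3:
                n, pos = read_long(buf, pos)
                values.append(buf[pos:pos + n].decode("utf-8"))
                pos += n
            else:
                raise ValueError(f"bad union branch {tag}")
```

For example, `encode_config([7, "ab"])` returns `b'\x04\x04\x0e\x06\x04ab\x00'`: a block count, then a one-byte tag before each value, then the terminator. That per-value tag is the overhead the comment mentions, paid in place of writing the schema.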
> Change the generic serialization framework API to use serialization-specific bytes instead of Map for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: serial.patch, serial4.patch, serial6.patch, serial7.patch, serial9.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map for the serialization-specific configuration. Since this data is really internal to the specific serialization, I think we should change it to be an opaque binary blob. This will simplify the interface for defining specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).
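The opaque-blob proposal in the issue description can be sketched as follows, written in Python for brevity rather than Hadoop's Java. All names here (`Serialization`, `serialize_metadata`, `from_metadata`, `ClassNameSerialization`, `framework_round_trip`) are hypothetical and are not the API in the attached patches. The point is only that the generic framework moves bytes around without interpreting them, while each serialization chooses its own metadata format.

```python
from abc import ABC, abstractmethod


class Serialization(ABC):
    """Hypothetical base class: each serialization owns its metadata format."""

    @abstractmethod
    def serialize_metadata(self) -> bytes:
        """Return this serialization's configuration as an opaque blob."""

    @classmethod
    @abstractmethod
    def from_metadata(cls, blob: bytes) -> "Serialization":
        """Rebuild an instance from a blob produced by serialize_metadata."""


class ClassNameSerialization(Serialization):
    """Hypothetical example whose only setting is a class name (UTF-8)."""

    def __init__(self, class_name: str) -> None:
        self.class_name = class_name

    def serialize_metadata(self) -> bytes:
        return self.class_name.encode("utf-8")

    @classmethod
    def from_metadata(cls, blob: bytes) -> "ClassNameSerialization":
        return cls(blob.decode("utf-8"))


def framework_round_trip(ser: Serialization) -> Serialization:
    """The framework stores and returns the blob without parsing it."""
    stored = ser.serialize_metadata()  # e.g. kept with the job's settings
    return type(ser).from_metadata(stored)
```

Compared with a shared Map, the framework no longer needs to know which keys a serialization uses; a Protocol Buffers or Avro based serialization could put its own encoded record in the blob instead.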