From: "Ryan Holmes (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Date: Wed, 24 Nov 2010 18:53:20 -0500 (EST)
Message-ID: <26403876.298421290642800241.JavaMail.jira@thor>
Subject: [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map for configuration

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935580#action_12935580 ]

Ryan Holmes commented on HADOOP-6685:
-------------------------------------

bq. Avro is already a dependency. Thrift is already a dependency for HDFS (see HDFS-1484). I'm only adding ProtocolBuffers, which is a commonly used serialization format that many users including me find extremely useful.

This line of reasoning is overly general and could be used to support the addition of literally any dependency (i.e. dependency x already exists, so it's OK to add y). Hadoop should focus on providing a pluggable API for serialization rather than providing specific internal implementations (optional implementations would be fine).

I also think Hadoop will benefit greatly in the long term by promoting a single, default serialization and file format for new users. I was under the impression that this was a shared goal and that the chosen format was Avro. Adding a direct dependency on Protocol Buffers and increasing the scope of dependency on Thrift seem to directly contradict that goal.

bq. In MAPREDUCE-980, you took out the custom JSON parser and replaced it with calls into Avro. Using ProtoBuf is efficient and meant that I wrote 2 lines of code. If I used JSON, I would need to write a parser and printer.

Can't you use Jackson, which is already a dependency?
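
For readers following the thread, the "pluggable API with optional implementations" argued for above could look roughly like the sketch below. Every name in it (PluggableCodec, CodecRegistry, Utf8Codec) is hypothetical and chosen for illustration; none of them is part of Hadoop's serialization API or of the attached patches. The point is only that the core depends on an interface and a lookup by name, while each concrete format would live in a separate optional module.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Hadoop's serialization API.

/** What the core would depend on: an interface, not a concrete format. */
interface PluggableCodec {
    byte[] encode(String value);
    String decode(byte[] bytes);
}

/** Core-side registry; concrete formats are only ever looked up by name. */
final class CodecRegistry {
    private final Map<String, PluggableCodec> codecs = new HashMap<>();

    void register(String name, PluggableCodec codec) {
        codecs.put(name, codec);
    }

    PluggableCodec lookup(String name) {
        PluggableCodec codec = codecs.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("No codec registered for: " + name);
        }
        return codec;
    }
}

/** Stand-in for an optional plug-in; a real one would wrap a specific library. */
final class Utf8Codec implements PluggableCodec {
    @Override
    public byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

final class PluggableCodecDemo {
    public static void main(String[] args) {
        CodecRegistry registry = new CodecRegistry();
        // The plug-in registers itself; the core never names its class.
        registry.register("utf8", new Utf8Codec());

        PluggableCodec codec = registry.lookup("utf8");
        System.out.println(codec.decode(codec.encode("hello")));
    }
}
```

Under this arrangement, adding or removing a format changes which plug-ins are registered, not which libraries the core links against.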
> Change the generic serialization framework API to use serialization-specific bytes instead of Map for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map for the serialization specific configuration. Since this data is really internal to the specific serialization, I think we should change it to be an opaque binary blob. This will simplify the interface for defining specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
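
As a rough illustration of the change the issue description proposes, the sketch below contrasts configuration exposed as a string map with configuration that the framework treats as an opaque byte blob. The interface and method names (MapConfigured, BlobConfigured, writeConfig, readConfig) and the ToySerialization class are assumptions made up for this example; they are not taken from the attached patches or from Hadoop itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical illustration only; these are not the Hadoop interfaces.

/** Map style: the framework can see (and must understand) every key. */
interface MapConfigured {
    Map<String, String> getConfig();
}

/** Blob style: the framework stores and forwards bytes it never interprets. */
interface BlobConfigured {
    byte[] writeConfig();
    void readConfig(byte[] blob);
}

/** Toy serialization whose only setting is a schema name. */
final class ToySerialization implements BlobConfigured {
    private String schemaName;

    ToySerialization() {
        this("");
    }

    ToySerialization(String schemaName) {
        this.schemaName = schemaName;
    }

    String getSchemaName() {
        return schemaName;
    }

    @Override
    public byte[] writeConfig() {
        // The encoding is private to this serialization; UTF-8 is an arbitrary choice.
        return schemaName.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void readConfig(byte[] blob) {
        schemaName = new String(blob, StandardCharsets.UTF_8);
    }
}

final class OpaqueConfigDemo {
    public static void main(String[] args) {
        ToySerialization writerSide = new ToySerialization("user-record-v2");
        byte[] blob = writerSide.writeConfig(); // the framework would carry this as-is

        ToySerialization readerSide = new ToySerialization();
        readerSide.readConfig(blob);
        System.out.println(readerSide.getSchemaName());
    }
}
```

How each serialization encodes its own blob (JSON, Protocol Buffers, Avro, or something else) is exactly the choice being debated in the comment above; the sketch takes no position on it.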