Message-ID: <849363389.73301265392288624.JavaMail.jira@brutus.apache.org>
Date: Fri, 5 Feb 2010 17:51:28 +0000 (UTC)
From: "Tom White (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1462) Enable context-specific and stateful serializers in MapReduce
In-Reply-To: <132551297.62031265355507926.JavaMail.jira@brutus.apache.org>

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830196#action_12830196 ]

Tom White commented on MAPREDUCE-1462:
--------------------------------------

Owen, thanks for posting your design. I've reproduced my comments on the design which I made on MAPREDUCE-1126 here for convenience:

* The changes to the serialization API are not backwards compatible, so a new package of serializer types would need creating. Is this really necessary to achieve Avro integration?

* I'm not sure why we need to serialize serializations. The patch in MAPREDUCE-1126 avoids the need for this by using a simple string mechanism for configuration. Having an opaque binary format also makes it difficult to retrieve and use the serialization from other languages (e.g. C++ or other Pipes languages). My latest patch on MAPREDUCE-1126 is language-neutral in this regard.

* Adding a side file for the context-serializer mapping complicates the implementation. It's not clear what container file would be used for the side file (Avro container, custom?). I understand that putting framework configuration in the job configuration may not be desirable, but it has been done in the past so I don't know why it is being ruled out here. I would rather have a separate effort (and discussion) to create a "private" job configuration (not accessible by user code) for such configuration (above and beyond the configuration needed for serialization).

* The user API is no shorter than the one proposed in MAPREDUCE-1126. Compare:

{code}
Schema keySchema = ...
AvroGenericSerialization serialization = new AvroGenericSerialization();
serialization.setSchema(keySchema);
job.set(SerializationContext.MAP_OUTPUT_KEY, serialization);
{code}

with

{code}
Schema keySchema = ...
AvroGenericData.setMapOutputKeySchema(job, keySchema);
{code}

> Enable context-specific and stateful serializers in MapReduce
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-1462
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1462
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: h-1462.patch
>
>
> Although the current serializer framework is powerful, within the context of a job it is limited to picking a single serializer for a given class. Additionally, Avro generic serialization can make use of additional configuration/state such as the schema. (Most other serialization frameworks, including Writable, Jute/Record IO, Thrift, Avro Specific, and Protocol Buffers, only need the object's class name to deserialize the object.)
> With the goal of keeping the easy things easy and maintaining backwards compatibility, we should be able to allow applications to use context-specific (e.g. map output key) serializers in addition to the current type-based ones that handle the majority of the cases. Furthermore, we should be able to support serializer-specific configuration/metadata in a type-safe manner without cluttering up the base API with a lot of new methods that will confuse new users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.