Message-ID: <90504507.22251264455335213.JavaMail.jira@brutus.apache.org>
Date: Mon, 25 Jan 2010 21:35:35 +0000 (UTC)
From: "Owen O'Malley (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator
In-Reply-To: <367401715.1256058959495.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804734#action_12804734 ]

Owen O'Malley commented on MAPREDUCE-1126:
------------------------------------------

I'm very disappointed that this jira went in with a title and description that completely misrepresented the content and scope of the patch. This patch *completely* revamps the type system and semantics of the map/reduce framework. Changing that without a large discussion is uncool.

I disagree with the fundamental approach taken here. The details are also problematic, but we need to find an acceptable model before any progress on this or any related patches can be made. My concerns are:

1. We should use the current global serializer factory for *all* contexts of a job. We have 7 serialized types already (map in key, map in value, map out key, map out value, reduce out key, reduce out value, input split). We will likely end up with more types later. Having a separate serializer and metadata for each type will be extremely confusing to the users.

2. Defining the schema should be an Avro-specific function and not part of the framework.

3. I don't see any reason to support union types at the top level of the shuffle. There are already libraries that handle this without changing the framework. Furthermore, an Avro record on top of the schema is free in serialization size.

4. Only the default comparator should come from the serializer. The user has to be able to override it in the framework (not change the serializer factory).

That said, I think that it is perfectly reasonable for the Avro serializer to accept all types. So if you have a Mapper it will use Avro serialization.
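Concern 4 can be illustrated with a minimal plain-Java sketch. It does not use the Hadoop API or the code in the attached patches: `KeySerialization`, `JobSettings`, `setSortComparator`, and `sortComparator` are hypothetical names introduced only for illustration. The idea shown is that the serialization supplies a default comparator, while a comparator set by the user on the job takes precedence without touching the serialization itself.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for a serialization that knows a default sort order for its keys.
interface KeySerialization<K> {
    Comparator<K> defaultComparator();
}

// Hypothetical stand-in for per-job settings; not a real Hadoop class.
final class JobSettings<K> {
    private final KeySerialization<K> serialization;
    private Comparator<K> userComparator; // stays null unless the user overrides the order

    JobSettings(KeySerialization<K> serialization) {
        this.serialization = serialization;
    }

    // The override lives on the job, so the serialization is left unchanged.
    void setSortComparator(Comparator<K> comparator) {
        this.userComparator = comparator;
    }

    // A user-supplied comparator wins; otherwise fall back to the serialization's default.
    Comparator<K> sortComparator() {
        return userComparator != null ? userComparator : serialization.defaultComparator();
    }
}

final class ComparatorOverrideSketch {
    // Sorts a copy of the keys with whichever comparator the job resolves to.
    static String[] sorted(JobSettings<String> job, String... keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, job.sortComparator());
        return copy;
    }

    public static void main(String[] args) {
        JobSettings<String> job = new JobSettings<>(() -> Comparator.naturalOrder());
        System.out.println(Arrays.toString(sorted(job, "b", "a", "c"))); // default order

        job.setSortComparator(Comparator.reverseOrder());
        System.out.println(Arrays.toString(sorted(job, "b", "a", "c"))); // user override
    }
}
```

In this sketch the first sort uses the serialization's default order and the second uses the user's override; the serialization object is the same in both cases.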
> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class. Instead we should use the Serialization API to create key comparators. This would permit, e.g., Avro-based comparators to be used, permitting efficient sorting of complex data types without having to write a RawComparator in Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.