Date: Thu, 5 May 2011 14:53:03 +0000 (UTC)
From: "Mariappan Asokan (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <241445116.24680.1304607183726.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029349#comment-13029349 ]

Mariappan Asokan commented on MAPREDUCE-2454:
---------------------------------------------

Hi Steve,

Thank you very much for your comments. I will try to make the sorting on both the Map and Reduce sides pluggable. The default implementation will be whatever is available in the framework today. It is easy to separate the sorting process on the Map side (currently all of the code is in the class MapOutputBuffer, which lives in MapTask.java). It is much harder to separate the merge on the Reduce side because of the way it is coded, but I am working to separate that as well.

Regarding the GNU sort plugin: I am making the external sort command name configurable, so it can be the POSIX sort command as well. Since most Hadoop installations are Linux-based, GNU sort is what serves as the POSIX sort implementation there. Other UNIX installations can use their own POSIX sort command as the external sorter. There is no GPL issue. Perhaps I can remove the word GNU and just call it UNIX.

Regarding class loader related exceptions: I will look at the framework's code, see what it does when it loads a Mapper or Reducer class, and follow the same approach, since the scenario is very similar. All of the issues you have raised with respect to class loading apply there as well.

An explanation of UnsupportedOperationException: if the external sorter uses a UNIX command like sort, it may not be able to handle a custom key type that the user has defined, since the key comparator may be written in Java. In such a case a message will be logged in syslog and the framework's sorter will be used instead. I think this is fair enough. Please let me know if you think otherwise.

When I am done with the implementation (on top of MAPREDUCE-279) and testing, I will post a patch file for review. Would you be interested in working with me as a committer?

Thank you.
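To make the fallback behaviour described above concrete, here is a minimal sketch in plain Java. It is not the patch and does not use the interfaces attached to this issue: KeySorter, FrameworkSorter, ExternalCommandSorter and sortWithFallback are hypothetical names chosen for illustration. Keys are assumed to be newline-free strings, and the message goes to stderr as a stand-in for syslog.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SorterFallbackSketch {

    /** Hypothetical plugin contract (an assumption, not the attached interface). */
    public interface KeySorter {
        /** customComparator is null when the natural key order is wanted. */
        List<String> sort(List<String> keys, Comparator<String> customComparator);
    }

    /** Stand-in for the framework's built-in sorter. */
    public static class FrameworkSorter implements KeySorter {
        @Override
        public List<String> sort(List<String> keys, Comparator<String> customComparator) {
            List<String> out = new ArrayList<>(keys);
            out.sort(customComparator != null ? customComparator : Comparator.<String>naturalOrder());
            return out;
        }
    }

    /** Pipes keys through an external command whose name is configurable. */
    public static class ExternalCommandSorter implements KeySorter {
        private final String command;

        public ExternalCommandSorter(String command) {
            this.command = command;
        }

        @Override
        public List<String> sort(List<String> keys, Comparator<String> customComparator) {
            // A Java comparator cannot be handed to a UNIX command, so refuse up front.
            if (customComparator != null) {
                throw new UnsupportedOperationException(
                        "'" + command + "' cannot apply a Java key comparator");
            }
            try {
                ProcessBuilder pb = new ProcessBuilder(command);
                pb.environment().put("LC_ALL", "C"); // byte-wise order, independent of locale
                pb.redirectError(ProcessBuilder.Redirect.INHERIT);
                Process process = pb.start();
                // The sort command reads all input before writing, so write first, then read.
                try (BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(
                        process.getOutputStream(), StandardCharsets.UTF_8))) {
                    for (String key : keys) {
                        stdin.write(key);
                        stdin.write('\n');
                    }
                }
                List<String> out = new ArrayList<>();
                try (BufferedReader stdout = new BufferedReader(new InputStreamReader(
                        process.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = stdout.readLine()) != null) {
                        out.add(line);
                    }
                }
                if (process.waitFor() != 0) {
                    throw new IOException("'" + command + "' exited with a non-zero status");
                }
                return out;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for '" + command + "'", e);
            }
        }
    }

    /** Tries the plugin first; falls back when the plugin declines the job. */
    public static List<String> sortWithFallback(KeySorter plugin, KeySorter framework,
            List<String> keys, Comparator<String> customComparator) {
        try {
            return plugin.sort(keys, customComparator);
        } catch (UnsupportedOperationException e) {
            // Stand-in for the syslog message mentioned in the comment.
            System.err.println("External sorter declined, using framework sorter: " + e.getMessage());
            return framework.sort(keys, customComparator);
        }
    }
}
```

With this shape, a job that supplies its own comparator is still sorted correctly by the framework's sorter, while a job with plain keys could be handed to whatever command name is configured (for example "sort").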

> Allow external sorter plugin for MR
> -----------------------------------
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Mariappan Asokan
> Priority: Minor
> Attachments: KeyValueIterator.java, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira