Date: Mon, 4 Aug 2008 08:18:46 -0700 (PDT)
From: "Alejandro Abdelnur (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]

    [ https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619541#action_12619541 ]

Alejandro Abdelnur commented on
HADOOP-3702:
--------------------------------------------

*On #1*, the ambiguity between the existing {{write(OutputStream)}} and the {{write(DataOutput)}} introduced by the {{Writable}} interface arises when a {{DataOutputStream}} object is passed to the {{write(...)}} method. Because {{DataOutputStream}} is both an {{OutputStream}} and a {{DataOutput}}, the compiler cannot resolve which overload to use. (I did this in one of the previous patches. My solution was to rename the existing {{write()}} to {{writeXml()}}, but that started breaking things in different places in contrib, as the method is used outside of core.)

*On #2*, it doesn't make much sense to generify the {{Chain*}} classes. They are not directly exposed to the M/R developer; they are artifacts used to enable chaining, and the M/R developer doesn't code against them. Generifying them would not add any value, as nothing about them could be checked at compile time.

*On #3*, {{ChainOutputCollector}} is a lightweight class; see my answer to Chris regarding a similar question.

> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt
>
>
> On the same input, we usually need to run multiple Maps one after the other without any Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a significant amount of disk I/O will be avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after the Reduce.
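A minimal sketch of the overload ambiguity described in #1. `XmlConfig` is a hypothetical stand-in, not the actual class from the patch: it has a pre-existing `write(OutputStream)` plus a Writable-style `write(DataOutput)`. Passing a `DataOutputStream` without a cast does not compile, because neither overload is more specific than the other; an explicit cast selects one. The byte counts differ only because `writeUTF` prepends a 2-byte length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class OverloadAmbiguity {

    // Hypothetical stand-in: a class that already has write(OutputStream)
    // and gains write(DataOutput) once it implements a Writable-style interface.
    static class XmlConfig {
        private static final String XML = "<configuration/>";

        // Pre-existing method: dumps the raw XML bytes to a stream.
        public void write(OutputStream out) throws IOException {
            out.write(XML.getBytes(StandardCharsets.UTF_8));
        }

        // Method required by the Writable-style interface.
        public void write(DataOutput out) throws IOException {
            out.writeUTF(XML); // 2-byte length prefix followed by the bytes
        }
    }

    // Returns the number of bytes written, choosing the overload explicitly.
    public static int bytesWritten(boolean asOutputStream) {
        XmlConfig conf = new XmlConfig();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(buffer);
        try {
            // conf.write(dos); // would not compile (ambiguous method reference):
            // DataOutputStream is both an OutputStream and a DataOutput.
            if (asOutputStream) {
                conf.write((OutputStream) dos);
            } else {
                conf.write((DataOutput) dos);
            }
            dos.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(bytesWritten(true));  // 16: raw XML bytes
        System.out.println(bytesWritten(false)); // 18: writeUTF adds a 2-byte length
    }
}
```

Renaming one of the methods, as tried in the earlier patch, removes the need for casts at every call site, at the cost of breaking existing callers.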
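To illustrate why a chain collector can stay lightweight, here is a sketch built on hypothetical `RecordCollector` and `MapStage` interfaces. These are not Hadoop's `OutputCollector`/`Mapper` API or the `Chain*` classes in the patch. Each collector is built once and simply hands every record to the next Map in memory, so the chained Maps behave like a single Map with no intermediate writes. Keys and values are plain `Object`s, loosely reflecting the #2 point that a chain assembled at runtime offers nothing to check at compile time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapChainSketch {

    // Hypothetical minimal collector interface (not Hadoop's OutputCollector).
    interface RecordCollector {
        void collect(Object key, Object value);
    }

    // Hypothetical minimal map stage with untyped keys and values.
    interface MapStage {
        void map(Object key, Object value, RecordCollector out);
    }

    // Lightweight collector: forwards each record straight into the next stage.
    static final class ChainCollector implements RecordCollector {
        private final MapStage nextStage;
        private final RecordCollector nextOut;

        ChainCollector(MapStage nextStage, RecordCollector nextOut) {
            this.nextStage = nextStage;
            this.nextOut = nextOut;
        }

        @Override
        public void collect(Object key, Object value) {
            nextStage.map(key, value, nextOut);
        }
    }

    // Builds the collectors once, back to front; feeding a record into the
    // returned collector runs stage 0, then stage 1, ..., then the sink.
    static RecordCollector buildChain(List<MapStage> stages, RecordCollector sink) {
        RecordCollector out = sink;
        for (int i = stages.size() - 1; i >= 0; i--) {
            out = new ChainCollector(stages.get(i), out);
        }
        return out;
    }

    public static List<String> demo() {
        // Stage 1 splits a line into words; stage 2 upper-cases each word.
        MapStage split = (k, v, out) -> {
            for (String w : ((String) v).split(" ")) {
                out.collect(w, 1);
            }
        };
        MapStage upper = (k, v, out) -> out.collect(((String) k).toUpperCase(), v);

        List<String> results = new ArrayList<>();
        RecordCollector head = buildChain(Arrays.asList(split, upper),
                (k, v) -> results.add(k + "=" + v));
        head.collect(0L, "a b a");
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [A=1, B=1, A=1]
    }
}
```

The same pattern would apply after a Reduce: the reducer's output collector would be the head of the post-Reduce chain.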