Message-ID: <664964015.1215417572127.JavaMail.jira@brutus>
Date: Mon, 7 Jul 2008 00:59:32 -0700 (PDT)
From: "Alejandro Abdelnur (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
In-Reply-To: <1608867385.1215416611630.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated HADOOP-3702:
---------------------------------------

    Description:
On the same input, we usually need to run multiple Maps one after the other without any Reduce. We also have to run multiple Maps after the Reduce.

If all pre-Reduce Maps are chained together and run as a single Map, a significant amount of disk I/O will be avoided.

Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after the Reduce.

  was:
On the same input, we usually need to run multiple Maps one after the other without any Reduce. We also have to run multiple Maps after the Reduce.

If all pre-Reduce Maps are chained together and run as a single Map, a significant amount of disk I/O will be avoided.

Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after the Reduce.

This could be done with ChainMapper and ChainReducer classes that would manage the chain of Maps and would override the OutputCollector to implement the chaining.

The Maps and Reduce that are part of the chain are unaware that they are executed in a chain; they receive records via the {{map}} and {{reduce}} methods and emit their output via the {{OutputCollector}}.

The API would look something like:

{code:java}
public class ChainMapper implements Mapper {

  public static void addMapper(JobConf job, Class klass, Properties mapperConf);

  ...
}

public class ChainReducer implements Reducer {

  public static void setReducer(JobConf job, Class klass, Properties reducerConf);

  public static void addMapper(JobConf job, Class klass, Properties mapperConf);

  ...
}
{code}

The {{Properties}} configuration passed to the {{Mapper}} and {{Reducer}} when setting them into the chain is injected into a copy of the job's configuration. This allows maps to be configured as usual without being aware that they are in a chain.

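The collector-override idea in the description can be illustrated with a minimal, self-contained sketch. It does not use Hadoop: the nested {{Collector}} and {{SimpleMapper}} interfaces and the {{ChainSketch}} class are hypothetical stand-ins invented for this illustration, and a real implementation would also have to deal with the per-Map configuration and key/value types. Each Map only sees a collector; the chain wires that collector to the next Map, so no intermediate output is written anywhere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are Hadoop classes.
final class ChainSketch {

    /** Stand-in for an output collector (assumption, not Hadoop's OutputCollector). */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Stand-in for a map function (assumption, not Hadoop's Mapper). */
    interface SimpleMapper<K, V> {
        void map(K key, V value, Collector<K, V> out);
    }

    /**
     * Returns a collector that feeds each record to mappers[index]; that
     * mapper's output collector is the head of the rest of the chain.
     */
    static <K, V> Collector<K, V> chain(List<SimpleMapper<K, V>> mappers,
                                        int index,
                                        Collector<K, V> sink) {
        if (index == mappers.size()) {
            return sink; // end of the chain: records reach the real output
        }
        Collector<K, V> next = chain(mappers, index + 1, sink);
        SimpleMapper<K, V> mapper = mappers.get(index);
        // The mapper is unaware of the chain; it just writes to a collector.
        return (key, value) -> mapper.map(key, value, next);
    }

    /** Runs three records through a two-stage chain and returns the output. */
    static List<String> demo() {
        List<SimpleMapper<String, Integer>> mappers = new ArrayList<>();
        // First Map: upper-case the key.
        mappers.add((k, v, out) -> out.collect(k.toUpperCase(), v));
        // Second Map: drop values <= 1 and scale the rest by 10.
        mappers.add((k, v, out) -> {
            if (v > 1) {
                out.collect(k, v * 10);
            }
        });

        List<String> results = new ArrayList<>();
        Collector<String, Integer> head =
                chain(mappers, 0, (k, v) -> results.add(k + "=" + v));
        head.collect("a", 1);
        head.collect("b", 2);
        head.collect("c", 3);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [B=20, C=30]
    }
}
```

Records pass from one Map to the next through in-memory method calls, which is where the disk I/O saving described above would come from.
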
> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
>
> On the same input, we usually need to run multiple Maps one after the other without any Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a significant amount of disk I/O will be avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after the Reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.