Date: Sun, 7 Oct 2007 17:54:44 +0530
From: Arun C Murthy <arunc@yahoo-inc.com>
To: hadoop-user@lucene.apache.org
Subject: Re: Q: Sending output of reduce to mapper

Hi Ken,

On Sat, Oct 06, 2007 at 08:54:54PM -0700, Ken Pu wrote:
>Hi,
>
>As a beginner of Hadoop, I wonder how to send output key-value pairs of the
>reducers back to the input of mappers for iterative processing.
>

A map-reduce job has only one set of maps and one set of reduces. The way to
do what you seek is to chain jobs together, i.e. the output of job1 becomes
the input of job2 and so on. That is fairly easy, since the output of a job
(i.e. of its reduces) usually lands on HDFS. The onus of waiting for job
completion is on the user code, though: you have to ensure job1 is complete
before launching job2, and so on. The ways to do that are:

a) http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/JobClient.html#runJob(org.apache.hadoop.mapred.JobConf)
   submits the job and returns only after it completes (success or failure).

b) http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/JobClient.html#submitJob(org.apache.hadoop.mapred.JobConf)
   just submits the job and leaves the polling to you; look at
   src/java/org/apache/hadoop/mapred/JobClient.java, in particular the
   implementation of *runJob*, to see how to do that.

c) If you don't want to poll, use the *job.end.notification.url* property to
   set up a URL that is invoked once the job completes, and do your async
   work there. (Take a look at
   src/test/org/apache/hadoop/mapred/NotificationTestCase.java for an example
   of how to use it.)
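To make the chaining concrete, here is a rough sketch against the old
org.apache.hadoop.mapred API; the identity mapper/reducer and the directory
names are only placeholders, and the exact path-setter calls differ slightly
between releases:

// Rough sketch: chain two jobs so that job1's output on HDFS becomes job2's
// input. JobClient.runJob() blocks until the job finishes and throws an
// IOException if it fails, so job2 is only launched once job1 is done.
// The identity mapper/reducer and the path names are placeholders.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ChainedJobs {

  private static JobConf makeJob(String name, String in, String out) {
    JobConf conf = new JobConf(ChainedJobs.class);
    conf.setJobName(name);
    conf.setMapperClass(IdentityMapper.class);    // placeholder mapper
    conf.setReducerClass(IdentityReducer.class);  // placeholder reducer
    conf.setOutputKeyClass(LongWritable.class);   // key type from the default TextInputFormat
    conf.setOutputValueClass(Text.class);         // value type from the default TextInputFormat
    FileInputFormat.setInputPaths(conf, new Path(in));
    FileOutputFormat.setOutputPath(conf, new Path(out));
    return conf;
  }

  public static void main(String[] args) throws IOException {
    // job1: reads the original input, writes an intermediate dir on HDFS
    JobClient.runJob(makeJob("pass-1", "input", "pass-1-out"));

    // runJob() has returned, so job1 is finished; job2 can read its output
    JobClient.runJob(makeJob("pass-2", "pass-1-out", "final-out"));
  }
}

Since each runJob() call only returns once that job has finished, the second
pass cannot start before the first pass's output is fully written.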
>What's hadoop streaming? Can I pipe the output stream of reducers back to
>the input stream of the mappers to achieve what I want?
>

Hadoop Streaming is a utility that lets you create and run map/reduce jobs
with any executable or script as the mapper and/or the reducer; e.g. one can
use standard Unix utilities as the mapper/reducer:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc

It doesn't pipe reduce output straight back into the maps, though; for
iterative processing you still chain jobs as above.

Hope that helps.

Arun

>Any pointer would be greatly appreciated.
>--
>View this message in context: http://www.nabble.com/Q%3A-Sending-output-of-reduce-to-mapper-tf4581957.html#a13079722
>Sent from the Hadoop Users mailing list archive at Nabble.com.
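For the submit-and-poll route in (b), a rough sketch along the same lines
(again against the old org.apache.hadoop.mapred API, with placeholder job
setup and a made-up five-second poll interval); this is essentially what
runJob() does internally:

// Rough sketch: submitJob() returns immediately with a RunningJob handle,
// and the caller polls it until the job reports completion.
// Job setup, paths and the poll interval are placeholders.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SubmitAndPoll {
  public static void main(String[] args) throws IOException, InterruptedException {
    JobConf conf = new JobConf(SubmitAndPoll.class);
    conf.setJobName("submit-and-poll");
    conf.setMapperClass(IdentityMapper.class);    // placeholder mapper
    conf.setReducerClass(IdentityReducer.class);  // placeholder reducer
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    JobClient client = new JobClient(conf);
    RunningJob job = client.submitJob(conf);      // returns right away

    // Poll until the job is done; the client is free to do other work here.
    while (!job.isComplete()) {
      Thread.sleep(5000);
    }

    if (job.isSuccessful()) {
      System.out.println("job succeeded; safe to launch the next one");
    } else {
      System.err.println("job failed");
    }
  }
}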