Subject: Re: job taking input file, which "is being" written by its preceding job's map phase
From: Vamshi Krishna <vamshi2105@gmail.com>
To: mapreduce-user@hadoop.apache.org
Date: Thu, 9 Feb 2012 12:15:42 +0530

Thank you, Harsh, for your reply. What ChainMapper does is a chain: only once the first mapper finishes does the second map start, using the file written by the first mapper. What I want is pipelining: the second map should start after the first map starts but before it finishes, and keep reading from the same file that the first map is still writing. It is almost a producer-consumer scenario, where the first map writes into the file and the second map keeps reading the same file, so that a pipelining effect is seen between the two maps.

Hope you got what I am trying to say. Please help.

On Wed, Feb 8, 2012 at 12:47 PM, Harsh J wrote:
> Vamsi,
>
> Is it not possible to express your M-M-R phase chain as a simple, single
> M-R?
>
> Perhaps look at the ChainMapper class @
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
>
> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2105@gmail.com>
> wrote:
> > Hi all,
> > I have an important question about MapReduce.
> > I have two Hadoop MapReduce jobs. Job1 has only a mapper, no reducer.
> > Job1 has started, and in its map() it is writing to "file1" using
> > context.write(Arg1, Arg2). I want to start job2 (immediately after
> > job1), which should take "file1" (output still being written by the
> > above job's map phase) as input and do processing in its own
> > map/reduce phases, and job2 should keep taking the newly written data
> > in "file1" until job1 finishes. What should I do?
> >
> > How can I do that? Please, can anybody help?
> >
> > --
> > Regards
> >
> > Vamshi Krishna
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about

--
Regards

Vamshi Krishna
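For what it's worth, the producer-consumer behaviour described above is easy to sketch outside Hadoop. The following is plain Java, not a Hadoop API (MapReduce's job-level input handling does not provide this out of the box), and all names (`PipelineSketch`, `run`, the record values) are made up for illustration: a "first map" thread writes records into a bounded queue while a "second map" thread starts consuming them before the producer finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    // Sentinel record marking the end of the first stage's output.
    private static final String EOF = "__EOF__";

    // Runs a "first map" (producer) and a "second map" (consumer)
    // concurrently; the consumer begins reading before the producer
    // has finished writing, which is the pipelining effect asked for.
    static List<String> run() {
        // Small capacity so the producer actually blocks and the two
        // stages interleave instead of running back to back.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        List<String> consumed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put("record-" + i); // blocks when the queue is full
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String rec = queue.take(); // blocks until data arrives
                    if (EOF.equals(rec)) {
                        break;
                    }
                    consumed.add(rec.toUpperCase()); // stand-in for the second map()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Inside a single Hadoop job this handoff is not exposed between jobs; achieving it across two MapReduce jobs would need an external coordination mechanism, since a job's InputFormat computes its splits when the job starts.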