Subject: Re: job taking input file, which "is being" written by its preceding job's map phase
From: Vamshi Krishna <vamshi2105@gmail.com>
To: mapreduce-user@hadoop.apache.org
Date: Thu, 9 Feb 2012 12:15:42 +0530

Thank you, Harsh, for your reply. What ChainMapper does is a chain: only once the first mapper finishes does the second map start, using the file written by the first mapper. What I want is pipelining: the second map should start after the first map starts but before it finishes, and keep reading from the same file that the first map is still writing. It is almost a producer-consumer scenario, where the first map writes into the file and the second map keeps reading the same file, so that a pipelining effect is seen between the two maps.

Hope you got what I am trying to say. Please help.

On Wed, Feb 8, 2012 at 12:47 PM, Harsh J wrote:
> Vamsi,
>
> Is it not possible to express your M-M-R phase chain as a simple, single
> M-R?
>
> Perhaps look at the ChainMapper class @
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
>
> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2105@gmail.com>
> wrote:
> > Hi all,
> > I have an important question about MapReduce.
> > I have two Hadoop MapReduce jobs. Job1 has only a mapper, no reducer.
> > Job1 has started, and in its map() it is writing to "file1" using
> > context.write(Arg1, Arg2). I want to start job2 (immediately after
> > job1), which should take "file1" (output still being written by the
> > above job's map phase) as input and do processing in its own
> > map/reduce phases, and job2 should keep taking the newly written data
> > in "file1" until job1 finishes. What should I do?
> >
> > How can I do that? Please, can anybody help?
> >
> > --
> > Regards
> >
> > Vamshi Krishna
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about

--
Regards

Vamshi Krishna
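For what it's worth, the producer-consumer behaviour described above is easy to sketch outside Hadoop. The following is plain Java, not a Hadoop API (MapReduce's job-level input handling does not provide this out of the box), and all names (`PipelineSketch`, `run`, the record values) are made up for illustration: a "first map" thread writes records into a bounded queue while a "second map" thread starts consuming them before the producer finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    // Sentinel record marking the end of the first stage's output.
    private static final String EOF = "__EOF__";

    // Runs a "first map" (producer) and a "second map" (consumer)
    // concurrently; the consumer begins reading before the producer
    // has finished writing, which is the pipelining effect asked for.
    static List<String> run() {
        // Small capacity so the producer actually blocks and the two
        // stages interleave instead of running back to back.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        List<String> consumed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put("record-" + i); // blocks when the queue is full
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String rec = queue.take(); // blocks until data arrives
                    if (EOF.equals(rec)) {
                        break;
                    }
                    consumed.add(rec.toUpperCase()); // stand-in for the second map()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Inside a single Hadoop job this handoff is not exposed between jobs; achieving it across two MapReduce jobs would need an external coordination mechanism, since a job's InputFormat computes its splits when the job starts.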