Subject: Re: reducer tasks start time issue
From: Lin Ma <linlma@gmail.com>
To: user@hadoop.apache.org, Harsh J
Date: Sun, 23 Dec 2012 23:09:06 +0800

Thanks for answering my question with not only the answer, but also a
detailed description. :-)

regards,
Lin

On Sun, Dec 23, 2012 at 12:15 AM, Harsh J wrote:
> A reduce can't process the complete data set until it has fetched all
> partitions, and any map may produce a partition for any reducer.
> Hence, we generally wait until all maps have terminated, and their
> partition outputs are ready and copied over to the reducers, before
> we begin to group and process the keys.
>
> However, given that you have begun thinking about this, this paper on
> "Online" Hadoop may interest you:
> http://www.neilconway.org/docs/nsdi2010_hop.pdf
>
> On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma wrote:
> > Hi guys,
> >
> > Suppose a Hadoop job has both mappers and reducers. My question is:
> > is it true that reducer tasks cannot begin until all mapper tasks
> > complete? If so, why is it designed this way?
> >
> > thanks in advance,
> > Lin
>
> --
> Harsh J
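[Editor's note] Harsh's point that "any map may produce a partition for any reducer" follows from how Hadoop's default HashPartitioner assigns keys: a key goes to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, regardless of which map emitted it. The Python sketch below mimics that routing with made-up word-count data (the data and names are illustrative, not from the thread) to show why one reducer's input spans every map's output:

```python
# Sketch of why a reducer must wait for every map: the default
# partitioner sends a key to reducer hash(key) % numReduceTasks,
# so each map's output is scattered across ALL reducers.
# (Illustrative data; real Hadoop partitions on Java's hashCode().)

NUM_REDUCERS = 3

def partition(key, num_reducers=NUM_REDUCERS):
    """Route a key to a reducer, mimicking a hash partitioner."""
    return hash(key) % num_reducers

# Two independent map tasks, each emitting (word, 1) pairs:
map_outputs = [
    [("apple", 1), ("banana", 1), ("cherry", 1)],   # map task 0
    [("apple", 1), ("date", 1), ("banana", 1)],     # map task 1
]

# Each map scatters its records into per-reducer partitions:
partitions = {r: [] for r in range(NUM_REDUCERS)}
for records in map_outputs:
    for key, value in records:
        partitions[partition(key)].append((key, value))

# Whichever reducer owns "apple" receives a record from BOTH maps.
# If it grouped its keys before map task 1 finished, its count for
# "apple" would be incomplete -- hence the wait Harsh describes.
```

All copies of a given key land in the same partition, so the grouping step is only safe once every map's contribution has been fetched.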
