Subject: Re: reducer tasks start time issue
From: Lin Ma <linlma@gmail.com>
To: user@hadoop.apache.org, Harsh J
Date: Sun, 23 Dec 2012 23:09:06 +0800

Thanks for answering my question with not only the answer, but also a
detailed description. :-)

regards,
Lin

On Sun, Dec 23, 2012 at 12:15 AM, Harsh J wrote:
> A reduce can't process the complete data set until it has fetched all
> partitions, and any map may produce a partition for any reducer.
> Hence, we generally wait until all maps have terminated, and their
> partition outputs are ready and copied over to the reducers, before
> we begin to group and process the keys.
>
> However, given that you have begun thinking about this, this paper on
> "Online" Hadoop may interest you:
> http://www.neilconway.org/docs/nsdi2010_hop.pdf
>
> On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma wrote:
> > Hi guys,
> >
> > Suppose a Hadoop job has both mappers and reducers. My question is:
> > is it true that reducer tasks cannot begin until all mapper tasks
> > complete? If so, why is it designed this way?
> >
> > thanks in advance,
> > Lin
>
> --
> Harsh J
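[Editor's note] Harsh's point that "any map may produce a partition for any reducer" follows from how Hadoop's default HashPartitioner assigns keys: a key goes to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, regardless of which map emitted it. The Python sketch below mimics that routing with made-up word-count data (the data and names are illustrative, not from the thread) to show why one reducer's input spans every map's output:

```python
# Sketch of why a reducer must wait for every map: the default
# partitioner sends a key to reducer hash(key) % numReduceTasks,
# so each map's output is scattered across ALL reducers.
# (Illustrative data; real Hadoop partitions on Java's hashCode().)

NUM_REDUCERS = 3

def partition(key, num_reducers=NUM_REDUCERS):
    """Route a key to a reducer, mimicking a hash partitioner."""
    return hash(key) % num_reducers

# Two independent map tasks, each emitting (word, 1) pairs:
map_outputs = [
    [("apple", 1), ("banana", 1), ("cherry", 1)],   # map task 0
    [("apple", 1), ("date", 1), ("banana", 1)],     # map task 1
]

# Each map scatters its records into per-reducer partitions:
partitions = {r: [] for r in range(NUM_REDUCERS)}
for records in map_outputs:
    for key, value in records:
        partitions[partition(key)].append((key, value))

# Whichever reducer owns "apple" receives a record from BOTH maps.
# If it grouped its keys before map task 1 finished, its count for
# "apple" would be incomplete -- hence the wait Harsh describes.
```

All copies of a given key land in the same partition, so the grouping step is only safe once every map's contribution has been fetched.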
