From: Kai Voigt
To: user@hadoop.apache.org, Sai Sai
Subject: Re: Reduce starts before map completes (at 23%)
Date: Thu, 11 Apr 2013 21:18:31 -0700

What you are seeing is the reduce JVMs getting started while the map phase is still active. Once the mappers have processed the first blocks, the reduce JVMs pull that output while the mappers work on the next blocks.

But the reduce() code won't be called until all the map output has been sent to the reduce nodes.
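
The ordering can be sketched with a toy, single-reducer simulation. This is not Hadoop code: `run_job` and its event strings are made up for illustration, and the only assumption is that values for one key may come from any mapper, which is why reduce() has to wait for the last map output.

```python
def run_job(map_outputs):
    """Toy model of one reducer: copy each map's output as soon as that
    map finishes, but call reduce() only after everything is copied."""
    events = []
    fetched = []
    for map_id, pairs in enumerate(map_outputs):
        events.append(f"map {map_id} done")
        # Copy phase: overlaps with the maps that are still running.
        fetched.extend(pairs)
        events.append(f"copied output of map {map_id}")

    # All map output is now on the reduce side: sort and group by key.
    grouped = {}
    for key, value in sorted(fetched):
        grouped.setdefault(key, []).append(value)

    # Only now is the user's reduce() logic applied (here: a word count).
    events.append("reduce() starts")
    counts = {key: sum(values) for key, values in grouped.items()}
    return events, counts


if __name__ == "__main__":
    events, counts = run_job([[("a", 1), ("b", 1)], [("a", 1)]])
    for line in events:
        print(line)
    print(counts)
```

In the example, the key "a" is emitted by both maps, so a reduce() call for "a" that ran after the first copy would have seen only part of its values.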

More accurately, the first 33% of the reduce percentage is this copy phase.
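
To make the percentages concrete, here is a rough sketch of how such a number could be put together. It assumes the reported reduce percentage weights three phases (copy, sort, reduce) equally at one third each; only the copy third is stated above, the other two are my assumption. The function name and parameters are hypothetical, not part of any Hadoop API.

```python
def reduce_progress(copy_fraction, sort_fraction=0.0, reduce_fraction=0.0):
    """Return a modelled reduce percentage from three phase fractions.

    Each argument is the completed fraction (0.0 to 1.0) of one phase.
    Assumption: every phase contributes one third of the total.
    """
    phases = {
        "copy": copy_fraction,
        "sort": sort_fraction,
        "reduce": reduce_fraction,
    }
    for name, value in phases.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction must be between 0 and 1")
    return sum(phases.values()) / 3.0 * 100.0


if __name__ == "__main__":
    # All map output copied, nothing sorted or reduced yet.
    print(round(reduce_progress(1.0), 1))
    # Job finished.
    print(reduce_progress(1.0, 1.0, 1.0))
```

Under this model the reduce percentage cannot pass roughly 33% while maps are still running, because the copy phase cannot finish before the last map does.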

Kai

On 11.04.2013 at 21:15, Sai Sai <saigraph@yahoo.in> wrote:

> I am running the wordcount from hadoop-examples with a bunch of test files as input. In the output below, reduce starts when map is at 23%. I thought the reducers would start only after mapping is complete, that is, when map reaches 100%. Why are the reducers starting when map is still at 23%?
>
> 13/04/11 21:10:32 INFO mapred.JobClient:  map 0% reduce 0%
> 13/04/11 21:10:56 INFO mapred.JobClient:  map 1% reduce 0%
> 13/04/11 21:10:59 INFO mapred.JobClient:  map 2% reduce 0%
> 13/04/11 21:11:02 INFO mapred.JobClient:  map 3% reduce 0%
> 13/04/11 21:11:05 INFO mapred.JobClient:  map 4% reduce 0%
> 13/04/11 21:11:08 INFO mapred.JobClient:  map 6% reduce 0%
> 13/04/11 21:11:11 INFO mapred.JobClient:  map 7% reduce 0%
> 13/04/11 21:11:17 INFO mapred.JobClient:  map 8% reduce 0%
> 13/04/11 21:11:23 INFO mapred.JobClient:  map 10% reduce 0%
> 13/04/11 21:11:26 INFO mapred.JobClient:  map 12% reduce 0%
> 13/04/11 21:11:32 INFO mapred.JobClient:  map 14% reduce 0%
> 13/04/11 21:11:44 INFO mapred.JobClient:  map 23% reduce 0%
> 13/04/11 21:11:50 INFO mapred.JobClient:  map 23% reduce 1%
> 13/04/11 21:11:53 INFO mapred.JobClient:  map 33% reduce 7%
> 13/04/11 21:12:02 INFO mapred.JobClient:  map 42% reduce 7%
>
> Please shed some light.
> Thanks
> Sai

-- 
Kai Voigt
