Subject: Re: Number of Maps running more than expected
From: Anil Gupta
Date: Thu, 16 Aug 2012 07:27:13 -0700
To: user@hadoop.apache.org
Hi Gaurav,

Did you turn off speculative execution?
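If not, on CDH3 (MR1) it is controlled by the following mapred-site.xml properties — a minimal sketch, assuming the stock MR1 property names:

```xml
<!-- mapred-site.xml: disable speculative re-execution of slow tasks,
     so duplicate map attempts don't inflate the map count -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```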

Best Regards,
Anil

On Aug 16, 2012, at 7:13 AM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:

Hi users,
 
I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all 12 nodes, and 1 node running the Job Tracker).
In order to perform a WordCount benchmark test, I did the following:
  • Executed "RandomTextWriter" first to create 100 GB of data (note that I changed only the "test.randomtextwrite.total_bytes" parameter; all others are left at their defaults).
  • Next, executed the "WordCount" program for that 100 GB dataset.
The "Block Size" in "hdfs-site.xml" is set as 128 MB. Now, according to my calculation, total number of Maps to be executed by the wordcount job should be 100 GB / 128 MB or 102400 MB / 128 MB = 800.
But when I execute the job, it runs a total of 900 Maps, i.e., 100 extra. So why the extra Maps? The job itself completes successfully without any error.
 
Again, if I don't run the "RandomTextWriter" job to create the data, but instead put my own 100 GB text file in HDFS and run "WordCount", then the number of Maps matches my calculation, i.e., 800.
 
Can anyone tell me why Hadoop behaves this way for WordCount only when the dataset is generated by RandomTextWriter? And what is the purpose of these extra Maps?
 
Regards,
Gaurav Dasgupta
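One arithmetic to check here (a sketch, assuming plain per-file FileInputFormat splitting and ignoring the small split-slop tolerance; the file counts below are hypothetical): splits are computed per input file, so every file whose length is not an exact multiple of the block size contributes one extra, short map — the same total bytes spread over many files can yield more maps than one big file.

```python
import math

def map_task_count(file_sizes_mb, block_size_mb=128):
    """Each input file is split independently; a file's final
    partial block still gets its own map task."""
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)

# One 100 GB file: 102400 / 128 = exactly 800 maps.
print(map_task_count([102400]))     # -> 800

# The same 100 GB as 128 files of 800 MB each:
# 800 / 128 = 6.25 blocks -> 7 splits per file -> 896 maps.
print(map_task_count([800] * 128))  # -> 896
```

Since RandomTextWriter writes one output file per map task, its output is many medium-sized files rather than one large one, which is consistent with seeing extra maps only for the generated dataset.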