Subject: Re: Increasing map reduce tasks will increase the CPU time, does this seem to be correct
From: Mohammad Tariq <dontariq@gmail.com>
To: user@hive.apache.org
Date: Thu, 13 Dec 2012 16:20:32 +0530

Hello Imen,

If you have a huge number of tasks, the overhead of creating and managing all those map and reduce tasks begins to dominate the total job execution time. Also, more tasks means you need more free CPU slots. If no slot is free on the node that holds a given data block, the task is scheduled on some other node where free slots are available and the block has to be pulled over the network. That costs time, and it also works against the most basic principle of Hadoop, i.e. data locality. So the number of maps and reduces should be raised keeping all these factors in mind, otherwise you may face performance issues.
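For what it's worth, here is a minimal sketch of the knobs in question, using the classic org.apache.hadoop.mapred API of that era (the class name and argument handling are made up for illustration; the map count is only a hint to the framework, while the reduce count is honored exactly):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TaskCountSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TaskCountSketch.class);
        conf.setJobName("task-count-test");
        // Mapper/reducer classes and output types elided -- set them as usual.

        // Only a hint: the framework computes the actual map count from the
        // input splits; this value merely influences the split size calculation.
        conf.setNumMapTasks(8);

        // Honored exactly: every extra reducer is an extra task (and JVM)
        // that has to be scheduled into a free slot, adding overhead.
        conf.setNumReduceTasks(4);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

In practice the effective map count ends up close to input size divided by the split size, so tuning the reduce count (and keeping it within your cluster's slot capacity) is usually where these overheads show up first.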
HTH

Regards,
Mohammad Tariq

On Thu, Dec 13, 2012 at 4:11 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> If the number of maps or reducers your job launches is greater than the
> job queue/cluster capacity, CPU time will increase.
>
> On Dec 13, 2012 4:02 PM, "imen Megdiche" <imen.megdiche@gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to increase the number of map and reduce tasks for a job,
>> and even for the same data size, I noticed that the total CPU time
>> increases, but I thought it would decrease. MapReduce is known for its
>> computational performance, but I do not see this when I do these small
>> tests.
>>
>> What do you think about this issue?
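A quick way to see the effect Nitin describes is to compare the job's aggregate CPU counter with its wall-clock runtime: the counter is summed over every task attempt, so per-task startup cost makes it grow with the task count even when the elapsed time drops. A minimal sketch, assuming a Hadoop 1.x cluster that exposes the CPU_MILLISECONDS task counter (the job setup itself is elided):

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class CpuVsWallClock {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CpuVsWallClock.class);
        // ... the usual job setup (input/output paths, mapper, reducer) ...

        long start = System.currentTimeMillis();
        RunningJob job = JobClient.runJob(conf); // blocks until the job finishes
        long wallClockMs = System.currentTimeMillis() - start;

        // CPU_MILLISECONDS is summed over every task attempt, so it keeps
        // climbing as tasks are added even when the wall-clock time shrinks.
        // Group/counter names are as reported by Hadoop 1.x; adjust for
        // your version.
        Counters counters = job.getCounters();
        long cpuMs = counters
                .findCounter("org.apache.hadoop.mapred.Task$Counter",
                             "CPU_MILLISECONDS")
                .getCounter();

        System.out.println("wall-clock ms: " + wallClockMs);
        System.out.println("total CPU ms : " + cpuMs);
    }
}

Running this with different task counts should show wall-clock time improving (up to the cluster's slot capacity) while total CPU time keeps climbing, which is exactly the behavior observed in the original question.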