Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: 
 <CAO6W-2dj1Mc+uGDg6OhiM8=_tUUMWj=kPuLVxP=wvwRCNqyu1Q@mail.gmail.com>
References: 
 <CALJmrTCzy7p5zXqt2=j9zL8zj5SE2ZO40KTbTx2bmv=g2O3b3w@mail.gmail.com>
 <CAO6W-2dj1Mc+uGDg6OhiM8=_tUUMWj=kPuLVxP=wvwRCNqyu1Q@mail.gmail.com>
From: Julien Muller <julien.muller@ezako.com>
Date: Tue, 26 Feb 2013 12:46:51 +0100
Message-ID: 
 <CACN4pVXyyRLXrzGH6hfUsFfxXzvYuDYsyxAY0AQCqr=_w=cUuA@mail.gmail.com>
Subject: Re: Running terasort with 1 map task
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=089e01419d7cf3c58e04d69f37fa

--089e01419d7cf3c58e04d69f37fa
Content-Type: text/plain; charset=ISO-8859-1

Is your goal to establish a baseline for performance measurement?
In that case, you might consider running only one TaskTracker. You would
still have multiple tasks, but they would all run on a single machine. You
could also make the mappers run serially by configuring only one map slot
on your one-node cluster.
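
As for why you see 24 maps: as far as I know, mapred.map.tasks is only a
hint, and the real count comes from the input splits. Below is a rough
sketch of the split rule of the old-API FileInputFormat, not the actual
Hadoop code. It assumes a 128 MB HDFS block size, a single input file, and
that TeraInputFormat follows this rule; none of that is stated in this
thread. The function and variable names are made up for illustration.

```python
# Sketch of the split-size rule of the old-API FileInputFormat (Hadoop 1.x).
# Assumptions, not confirmed in this thread: 128 MB block size, one input
# file, and TeraInputFormat delegating to this rule.

SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def count_splits(file_size, block_size, requested_maps=1, min_split_size=1):
    """Return the number of input splits computed for a single file."""
    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    # Cut full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left over becomes the last split.
    if remaining > 0:
        splits += 1
    return splits


ROW_BYTES = 100                       # teragen writes 100-byte rows
file_size = 32_000_000 * ROW_BYTES    # 3.2 GB of input, not 32 MB
block_size = 128 * 1024 * 1024        # assumed HDFS block size

maps_default = count_splits(file_size, block_size)
maps_forced = count_splits(file_size, block_size, min_split_size=file_size)
print(maps_default, maps_forced)
```

Under these assumptions the first call gives 24, which matches the task
summary quoted below, and the second gives 1. So raising
mapred.min.split.size to at least the input size might force a single map,
but I have not tested this with terasort.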

Nevertheless, I agree with Bertrand: this is not really a realistic use case
(or maybe you can give us more details).

Julien


2013/2/26 Bertrand Dechoux <dechouxb@gmail.com>

> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce. So I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it would run the job using 1 map and 1 reduce, but when I
>> inspected the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ============================
>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>> ============================
>>
>> So, although I asked for one map task, 24 were launched.
>>
>> How can I solve this problem? How do I tell Hadoop to launch only one map?
>>
>> Thanks,
>>
>
>

--089e01419d7cf3c58e04d69f37fa--