Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of dechouxb@gmail.com designates
 209.85.215.51 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALJmrTCzy7p5zXqt2=j9zL8zj5SE2ZO40KTbTx2bmv=g2O3b3w@mail.gmail.com>
References: 
 <CALJmrTCzy7p5zXqt2=j9zL8zj5SE2ZO40KTbTx2bmv=g2O3b3w@mail.gmail.com>
Date: Tue, 26 Feb 2013 12:25:39 +0100
Message-ID: 
 <CAO6W-2dj1Mc+uGDg6OhiM8=_tUUMWj=kPuLVxP=wvwRCNqyu1Q@mail.gmail.com>
Subject: Re: Running terasort with 1 map task
From: Bertrand Dechoux <dechouxb@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=14dae9d7121ed2a20c04d69eea4b

--14dae9d7121ed2a20c04d69eea4b
Content-Type: text/plain; charset=ISO-8859-1

http://wiki.apache.org/hadoop/HowManyMapsAndReduces

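In short, mapred.map.tasks is only a hint: the number of map tasks is the number of input splits computed by the InputFormat.
It is possible to have a single mapper if the input is not splittable, but that is rarely seen as a feature.
One could ask why you would want to use a platform for distributed computing for a job that shouldn't be distributed.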
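
To illustrate why -Dmapred.map.tasks=1 was not honoured: in the Hadoop 1.x old-API FileInputFormat (org.apache.hadoop.mapred), the requested map count is only used to derive a goal split size, and each split is then capped at the HDFS block size unless mapred.min.split.size is raised. Below is a simplified sketch of that calculation for a single file, assuming terasort's input format relies on this base-class logic. It is not Hadoop's actual class: the names SplitEstimate and splitLengths are hypothetical, and block locations, multiple input files and compressed input are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of Hadoop 1.x old-API FileInputFormat split sizing for
 * one file. Hypothetical class, for illustration only.
 */
public class SplitEstimate {
    // The last split may be up to 10% larger than splitSize rather than
    // producing a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(goalSize, blockSize)) */
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /**
     * Returns the length of each split for a single input file.
     *
     * @param fileSize     length of the input file in bytes
     * @param mapTasksHint value of mapred.map.tasks (only a hint)
     * @param blockSize    HDFS block size of the file (assumed > 0)
     * @param minSplitSize value of mapred.min.split.size (default 1)
     */
    public static List<Long> splitLengths(long fileSize, int mapTasksHint,
                                          long blockSize, long minSplitSize) {
        // The hint only sets the goal size; it never raises the cap below.
        long goalSize = fileSize / Math.max(mapTasksHint, 1);
        long splitSize = computeSplitSize(goalSize, Math.max(minSplitSize, 1), blockSize);

        List<Long> splits = new ArrayList<>();
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // tail split, at most SPLIT_SLOP * splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed 128 MB blocks
        long fileSize = 1024L * 1024 * 1024; // hypothetical 1 GB input file

        // mapred.map.tasks=1: goal size is 1 GB, but splits are capped at the block size.
        System.out.println(splitLengths(fileSize, 1, blockSize, 1).size());        // prints 8
        // Raising mapred.min.split.size to the file size gives a single split.
        System.out.println(splitLengths(fileSize, 1, blockSize, fileSize).size()); // prints 1
    }
}
```

Under this model, passing a mapred.min.split.size at least as large as the input should yield one split per input file, at the cost of reading most blocks remotely; whether terasort picks the setting up unchanged is an assumption worth verifying on the cluster.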

Regards

Bertrand


On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <arindamchoudhury0@gmail.com> wrote:

> Hi all,
>
> I am trying to run terasort using one map and one reduce, so I generated
> the input data using:
>
> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>
> Then I launched the hadoop terasort job using:
>
> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>
> I thought it would run the job using 1 map and 1 reduce, but when I
> inspected the job statistics I found:
>
> hadoop job -history /user/hadoop/output1
>
> Task Summary
> ============================
> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
> ============================
>
> So, although I asked for one map task, there are 24 of them.
>
> How can I solve this problem? How can I tell Hadoop to launch only one map?
>
> Thanks,
>
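
A quick size check may also explain the count of 24. Assuming teragen writes 100-byte rows, its argument 32000000 is a row count, so the input is about 3.2 GB rather than the 32 MB the path name suggests. The HDFS block size on this cluster is not stated; the figures below assume either the Hadoop 1.x default of 64 MB or 128 MB, with one split per block (rounded up):

```latex
\begin{aligned}
S &= 32\,000\,000 \ \text{rows} \times 100 \ \text{bytes/row} = 3.2 \times 10^{9} \ \text{bytes} \approx 3.2 \ \text{GB} \\
\left\lceil \frac{S}{64 \ \text{MiB}} \right\rceil &= \left\lceil \frac{3.2 \times 10^{9}}{67\,108\,864} \right\rceil = \lceil 47.68\ldots \rceil = 48 \ \text{splits} \\
\left\lceil \frac{S}{128 \ \text{MiB}} \right\rceil &= \left\lceil \frac{3.2 \times 10^{9}}{134\,217\,728} \right\rceil = \lceil 23.84\ldots \rceil = 24 \ \text{splits}
\end{aligned}
```

Only the 128 MB case matches the 24 map tasks in the summary, which suggests (but does not prove) that the input was written with 128 MB blocks.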

--14dae9d7121ed2a20c04d69eea4b--