From: Bertrand Dechoux
To: user@hadoop.apache.org
Date: Tue, 26 Feb 2013 12:25:39 +0100
Subject: Re: Running terasort with 1 map task

http://wiki.apache.org/hadoop/HowManyMapsAndReduces

As that page explains, mapred.map.tasks is only a hint: the number of maps is driven by the number of input splits, which usually follows the number of DFS blocks in the input.

It is possible to have a single mapper if the input is not splittable, but that is rarely seen as a feature.
One could ask why you want to use a platform for distributed computing for a job that shouldn't be distributed.

Regards

Bertrand

On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <arindamchoudhury0@gmail.com> wrote:

> Hi all,
>
> I am trying to run terasort using one map and one reduce, so I generated
> the input data using:
>
> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>
> Then I launched the terasort job using:
>
> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>
> I thought it would run the job with 1 map and 1 reduce, but when I
> inspected the job statistics I found:
>
> hadoop job -history /user/hadoop/output1
>
> Task Summary
> ============================
> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>
> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
> ============================
>
> So although I asked for one map task, there are 24 of them.
>
> How can I solve this problem? How do I tell Hadoop to launch only one map?
>
> Thanks,
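To make the "24 maps" figure less mysterious, here is a rough Python sketch of how the old-API FileInputFormat in Hadoop 1.x sizes its splits, as far as I understand it. TeraGen rows are 100 bytes each, so 32000000 rows is about 3.2 GB rather than 32 MB. The 128 MiB block size below is an assumption (the thread does not state it); it is simply the value that reproduces the 24 maps reported above. The function and variable names are made up for illustration and are not Hadoop APIs.

```python
# Sketch of the split-size rule used by the old-API (org.apache.hadoop.mapred)
# FileInputFormat in Hadoop 1.x, applied to a single non-empty input file.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def count_splits(file_size, block_size, requested_maps=1, min_split_size=1):
    """Return how many input splits (hence map tasks) one file yields.

    Assumes file_size > 0. requested_maps plays the role of mapred.map.tasks
    and min_split_size the role of mapred.min.split.size.
    """
    # mapred.map.tasks only sets a goal size; the block size caps it.
    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # the tail becomes one final split
    return splits


ROW_BYTES = 100                    # TeraGen record size
total = 32_000_000 * ROW_BYTES     # 3,200,000,000 bytes
MIB = 1024 * 1024

# Assumed 128 MiB blocks: matches the 24 maps in the job history.
print(count_splits(total, 128 * MIB))
# With 64 MiB blocks the same input would give twice as many maps.
print(count_splits(total, 64 * MIB))
# Raising the minimum split size to the file size leaves a single split.
print(count_splits(total, 128 * MIB, min_split_size=total))
```

Under this model, asking for one map changes nothing because the goal size is capped by the block size. Passing something like -Dmapred.min.split.size=3200000000 to terasort should, if the sketch matches your Hadoop version, bring the job down to one map, though I have not verified that on a 1.0.4 cluster.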