From: Julien Muller
Date: Tue, 26 Feb 2013 12:46:51 +0100
Subject: Re: Running terasort with 1 map task
To: user@hadoop.apache.org

Maybe your goal is to have a baseline for performance measurement?
In that case, you might want to consider running only one TaskTracker.
You would still have multiple tasks, but they would all run on a single
machine. You could also make the mappers run serially by configuring only
one map slot on your one-node cluster.

Nevertheless, I agree with Bertrand: this is not really a realistic use
case (or maybe you can give us more clues).

Julien

2013/2/26 Bertrand Dechoux <dechouxb@gmail.com>

> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable, BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce. So I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it would run the job using 1 map and 1 reduce, but when I
>> inspected the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ============================
>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>>
>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>> ============================
>>
>> So although I asked for one map task, there are 24 of them.
>>
>> How can I solve this problem? How do I tell Hadoop to launch only one map?
>>
>> Thanks,
>>
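A rough sketch of where the 24 maps in the Task Summary may come from. It assumes several things the thread does not state: that teragen writes 100-byte rows (so 32000000 rows is about 3.2 GB, not 32 MB as the path name suggests), that the HDFS block size is 128 MB, and that terasort in Hadoop 1.x takes its splits from the old-API FileInputFormat rule, where mapred.map.tasks is only a hint. count_maps is a hypothetical helper written for this illustration, not a Hadoop command.

```shell
#!/bin/sh
# Assumed split rule (old-API FileInputFormat, as far as I know):
#   split = max(min_split, min(total / requested_maps, block_size))
# count_maps is a hypothetical helper for illustration only.
count_maps() {
    total=$1; block=$2; min_split=$3; requested=$4
    split=$((total / requested))
    if [ "$split" -gt "$block" ]; then split=$block; fi
    if [ "$split" -lt "$min_split" ]; then split=$min_split; fi
    # Ceiling division; ignores the small slop Hadoop allows on the last split.
    echo $(( (total + split - 1) / split ))
}

total_bytes=$((32000000 * 100))        # assumed: 100 bytes per teragen row
block_bytes=$((128 * 1024 * 1024))     # assumed: 128 MB HDFS block size

# One requested map, default minimum split size of 1 byte.
count_maps "$total_bytes" "$block_bytes" 1 1                # prints 24

# Same request, with the minimum split size raised to the input size.
count_maps "$total_bytes" "$block_bytes" "$total_bytes" 1   # prints 1
```

Under these assumptions, the split size is capped by the block size, which matches the 24 maps reported. In this model, raising the minimum split size to at least the input size yields a single split; in Hadoop 1.x the corresponding property should be mapred.min.split.size (for example -Dmapred.min.split.size=3200000000 on the terasort command line), though that is worth checking against the mapred-default.xml of the installed version.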