Date: Fri, 1 Aug 2014 16:13:54 +0530
Subject: Re: Ideal number of mappers and reducers to increase performance
From: Nitin Pawar
To: user@hadoop.apache.org

The mapred.tasktracker.* settings control the maximum number of map or reduce tasks that a single tasktracker can run at once. They can differ from machine to machine: if you have multiple nodes, choose the values according to each machine's configuration. If you set both to 4, the tasktracker running on that machine will run at most 4 map tasks and at most 4 reduce tasks at any given point.

The mapred.map.* settings are cluster-wide. They set how many tasks (maps or reducers) each job is configured with by default. The job can override them when it is submitted to the jobtracker, and so can the client itself.

You do not have to set mapred.map.tasks or mapred.reduce.tasks, since the default value in the config is 2.

On Fri, Aug 1, 2014 at 4:06 PM, sindhu hosamane wrote:

> Thanks a ton for ur help Harsh. I am a newbie in hadoop.
> If i have set
> mapred.tasktracker.map.tasks.maximum = 4
> mapred.tasktracker.reduce.tasks.maximum = 4
> Should i also bother or set below values
> mapred.map.Tasks and mapred.reduce.Tasks.
> If yes then what is the ideal value?
>
> On Fri, Aug 1, 2014 at 12:00 AM, Harsh J wrote:
>
>> You can perhaps start with a generic 4+4 configuration (which matches
>> your cores), and tune your way upwards or downwards from there based
>> on your results.
>>
>> On Thu, Jul 31, 2014 at 8:35 PM, Sindhu Hosamane wrote:
>> > Hello friends,
>> >
>> > If i am running my experiment on a server with 2 processors (4 cores each).
>> > To say it has 2 processors and 8 cores.
>> > What would be the ideal values for mapred.tasktracker.map.tasks.maximum
>> > and mapred.tasktracker.reduce.tasks.maximum to get maximum performance.
>> > I am running cascalog queries on data of size 280 MB.
>> > I have multiple datanodes running on same machine.
>> >
>> > Your help is very much appreciated.
>> >
>> > Regards,
>> > sindhu
>>
>> --
>> Harsh J

--
Nitin Pawar
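As a rough illustration of the slot limits discussed in this thread, here is a minimal sketch in Python. It is not Hadoop code. It assumes the MRv1 model in which a tasktracker has one limit for map tasks and a separate limit for reduce tasks. The function names `cluster_slots` and `waves` and the example job of 10 maps and 2 reduces are hypothetical, chosen only to show the arithmetic.

```python
import math


def cluster_slots(per_tracker_max: int, trackers: int = 1) -> int:
    """Total slots of one kind (map or reduce) that can run at once."""
    return per_tracker_max * trackers


def waves(num_tasks: int, slots: int) -> int:
    """Rounds needed to run num_tasks when only `slots` tasks run at once."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return math.ceil(num_tasks / slots)


# Values from the thread: one machine, both maximums set to 4.
map_slots = cluster_slots(per_tracker_max=4)     # mapred.tasktracker.map.tasks.maximum
reduce_slots = cluster_slots(per_tracker_max=4)  # mapred.tasktracker.reduce.tasks.maximum

# Hypothetical job with 10 map tasks and 2 reduce tasks.
print(waves(10, map_slots))    # 10 maps on 4 slots need 3 rounds
print(waves(2, reduce_slots))  # 2 reduces on 4 slots need 1 round
```

Raising a per-tasktracker maximum reduces the number of rounds only while the machine has spare cores and memory, which is why the thread suggests starting at 4+4 and tuning from measured results.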