From: Himanshu Vijay
Date: Tue, 1 Oct 2013 00:06:44 -0700
Subject: Re: Cluster config: Mapper:Reducer Task Capacity
To: user@hadoop.apache.org

What is the downside of increasing both mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to the same value?

I read on this link that:

  mapred.tasktracker.map.tasks.maximum     1/2 * (cores/node) to 2 * (cores/node)   Number of map tasks to deploy on each machine.
  mapred.tasktracker.reduce.tasks.maximum  1/2 * (cores/node) to 2 * (cores/node)   Number of reduce tasks to deploy on each machine.

Each node has 8 cores, so according to the above guidance I could set both properties anywhere from 4 to 16. The ratio of mappers to reducers doesn't really matter as far as these two properties are concerned.

On Mon, Sep 30, 2013 at 12:52 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Himanshu,
>
> Changing the ratio is definitely a reasonable thing to do. The capacities
> come from the mapred.tasktracker.map.tasks.maximum
> and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
> You can tweak these on your nodes to get your desired ratio.
>
> -Sandy
>
>
> On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay wrote:
>
>> Hi,
>>
>> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
>> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, a ratio of
>> about 2.7. We run a wide variety of jobs and want to increase throughput.
>>
>> From manual observation, we hit the map capacity, so many jobs have to
>> wait even though a lot of room is left in the reduce capacity. I mined the
>> JobTracker logs for completed jobs and saw that, on both an hourly and a
>> daily basis, the mapper:reducer ratio was 4-5.
>>
>> To increase throughput, I was thinking of experimenting with the Map and
>> Reduce Task Capacity so that the ratio increases from 2.7 to ~4.
>>
>> Does this sound like a correct approach? Is this something I can
>> control, or is it determined automatically by Hadoop?
>>
>> Have any of you done this kind of exercise? If so, can you please point me
>> to how to go about changing this ratio? I am not finding much literature
>> on it.
>>
>> Note: Map and Reduce Task Capacity is the maximum total number of
>> mappers/reducers you can run on the cluster at any point.
>>
>> Regards,
>> -Himanshu Vijay
>>
>
>

--
-Himanshu Vijay
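A minimal Python sketch of the slot arithmetic discussed in this thread. It assumes that the cluster-wide Map/Reduce Task Capacity is the sum of each TaskTracker's mapred.tasktracker.{map,reduce}.tasks.maximum, as Sandy describes, and that rebalancing keeps a fixed per-node slot budget. The helper names (slot_range, cluster_capacity, capacity_ratio, split_slots) and the 20-slot budget are hypothetical illustrations, not Hadoop APIs.

```python
# Sketch only: hypothetical helpers for the MRv1 slot arithmetic in this
# thread. None of these functions are Hadoop APIs.


def slot_range(cores_per_node: int) -> tuple[int, int]:
    """Rule-of-thumb range for mapred.tasktracker.{map,reduce}.tasks.maximum:
    1/2 * (cores/node) to 2 * (cores/node)."""
    return cores_per_node // 2, 2 * cores_per_node


def cluster_capacity(per_node_slots: list[int]) -> int:
    """Cluster-wide task capacity, assumed to be the sum of the per-node
    slot maximums across all TaskTrackers."""
    return sum(per_node_slots)


def capacity_ratio(map_capacity: int, reduce_capacity: int) -> float:
    """Map Task Capacity divided by Reduce Task Capacity."""
    return map_capacity / reduce_capacity


def split_slots(total_slots: int, target_ratio: float) -> tuple[int, int]:
    """Split a fixed per-node slot budget (an assumption) into
    (map_slots, reduce_slots) whose ratio is closest to target_ratio,
    keeping at least one slot of each kind."""
    if total_slots < 2:
        raise ValueError("need at least one map and one reduce slot")
    best = None
    for reduce_slots in range(1, total_slots):
        map_slots = total_slots - reduce_slots
        error = abs(map_slots / reduce_slots - target_ratio)
        # Strict '<' keeps the first (fewest reduce slots) split on ties.
        if best is None or error < best[0]:
            best = (error, map_slots, reduce_slots)
    return best[1], best[2]


if __name__ == "__main__":
    print(slot_range(8))                         # (4, 16)
    print(round(capacity_ratio(8900, 3300), 1))  # 2.7
    print(split_slots(20, 4.0))                  # (16, 4)
```

With 8 cores per node, the only split that keeps both properties inside slot_range(8) and gives exactly 4:1 is 16 map and 4 reduce slots, which is what split_slots(20, 4.0) returns for the hypothetical 20-slot budget.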