From: Sandy Ryza
To: user@hadoop.apache.org
Date: Mon, 30 Sep 2013 12:52:23 -0700
Subject: Re: Cluster config: Mapper:Reducer Task Capacity

Hi Himanshu,

Changing the ratio is definitely a reasonable thing to do. The capacities
come from the mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum tasktracker configurations. You can
tweak these on your nodes to get your desired ratio.

-Sandy

On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay wrote:

> Hi,
>
> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting
> in a ratio of 2.7. We have a wide variety of jobs running and we want to
> increase the throughput.
>
> My manual observation was that we hit the Mapper capacity, and hence many
> jobs have to wait even though there is a lot of room left in Reduce
> capacity. I mined the jobtracker logs for the jobs that completed and saw
> that on an hourly basis as well as a daily basis the mapper:reducer ratio
> was 4-5.
>
> To increase the throughput, I was thinking of experimenting with the Map
> and Reduce Task Capacity so that the ratio is increased from 2.7 to ~4.
>
> Does this sound like a correct approach? Is this something that I can
> control, or is it determined automatically by Hadoop?
>
> Have any of you done this kind of exercise? If yes, can you please direct
> me on how to go about changing this ratio? I am not finding much
> literature on it.
>
> Note: Mapper and Reducer Task Capacity is the max total no. of
> mappers/reducers you can run on the cluster at any point.
>
> Regards,
> -Himanshu Vijay
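
The arithmetic behind the two tasktracker settings named in the reply can be sketched as follows. This is a minimal illustration, not something from the thread: the helper names split_slots and render_properties are hypothetical, the 10 slots per node in the usage example is an invented figure, and the sketch assumes the total number of slots stays constant while only the split changes. The two property names are the ones given in the reply; the output is the kind of fragment that would typically go into mapred-site.xml on each tasktracker node.

```python
from typing import Tuple

# Property names as given in the reply above.
MAP_KEY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_KEY = "mapred.tasktracker.reduce.tasks.maximum"


def split_slots(total_slots: int, target_ratio: float) -> Tuple[int, int]:
    """Split a fixed slot total into (map, reduce) slots.

    Hypothetical helper: keeps the total constant and picks the reduce
    count so that map/reduce lands as close as rounding allows to
    target_ratio.
    """
    if total_slots < 2 or target_ratio <= 0:
        raise ValueError("need at least 2 slots and a positive ratio")
    # With map = ratio * reduce and map + reduce = total,
    # reduce = total / (ratio + 1). Keep at least one reduce slot.
    reduce_slots = max(1, round(total_slots / (target_ratio + 1)))
    map_slots = total_slots - reduce_slots
    return map_slots, reduce_slots


def render_properties(map_slots: int, reduce_slots: int) -> str:
    """Render the two settings as Hadoop-style <property> elements."""
    template = (
        "<property>\n"
        "  <name>{name}</name>\n"
        "  <value>{value}</value>\n"
        "</property>"
    )
    return "\n".join(
        [
            template.format(name=MAP_KEY, value=map_slots),
            template.format(name=REDUCE_KEY, value=reduce_slots),
        ]
    )


if __name__ == "__main__":
    # Assumption for illustration only: a node with 10 task slots in total.
    maps, reduces = split_slots(total_slots=10, target_ratio=4)
    print(render_properties(maps, reduces))
```

Applied to the cluster-wide figures from the original message (roughly 8900 + 3300 = 12200 slots), the same calculation gives about 9760 map and 2440 reduce slots for a ratio of 4. As far as I know, these values are read when the tasktracker starts, so a change would need a tasktracker restart to take effect; the per-node total should also stay within what the node's memory and cores can support.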