Subject: Re: Fair Scheduler is not Fair why?
From: Jeff Bean <jwfbean@cloudera.com>
To: user@hadoop.apache.org
Date: Wed, 16 Jan 2013 09:02:20 -0800

Validate your scheduler capacity and behavior by using sleep jobs. Submit
sleep jobs to pools that mirror your production jobs and check that the
scheduler's pool allocation behaves as you expect. The nice thing about
sleep jobs is that you can mimic your real jobs: the number of tasks and
how long they run.

You should be able to determine whether the hypothesis posed in this
thread is correct: that all the slots are taken by other tasks. Indeed,
your UI says that research has 90 running tasks after having completed
over 4000, but your email says no tasks are scheduled. I'm a little
confused.

Jeff

On Wed, Jan 16, 2013 at 8:50 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:

> BTW, what I mentioned is fair-share preemption, not minimum share.
>
> An alternative way to achieve that is to set the minimum shares of the
> two queues to be equal (or to any other allocation scheme you like), with
> their sum equal to the capacity of the cluster, and enable minimum-share
> preemption.
>
> Good luck!
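Jeff's sleep-job check above could look like the sketch below. The examples-jar path and the `mapred.fairscheduler.pool` property are assumptions for an MR1/CDH4 install (if your pools are keyed on `user.name`, submit as that user instead), and the script only prints the hadoop commands so they can be reviewed before running:

```shell
# Hypothetical jar location -- adjust for your install.
EXAMPLES_JAR=/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar

# Print (not run) a SleepJob submission that targets a given pool and
# mimics a real job's shape: task counts and per-task runtimes.
sleep_job() {
  pool=$1; maps=$2; reduces=$3; map_ms=$4; reduce_ms=$5
  echo hadoop jar "$EXAMPLES_JAR" sleep \
    -D mapred.fairscheduler.pool="$pool" \
    -m "$maps" -r "$reduces" -mt "$map_ms" -rt "$reduce_ms"
}

# Mirror the two workloads from the thread: one job per pool, with more
# maps than the cluster can run at once, so contention is visible.
sleep_job tech     2000 24 60000 60000
sleep_job research 2000 24 60000 60000
```

If the scheduler is healthy, the two sleep jobs should converge toward equal running map counts; if research again sits idle, the problem is in the scheduler configuration rather than in your production jobs.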
>
> Best,
>
> --
> Nan Zhu
> School of Computer Science,
> McGill University
>
>
> On Wednesday, 16 January, 2013 at 11:43 AM, Nan Zhu wrote:
>
> I think you should do that, so that when the allocation is inconsistent
> with the fair shares, tasks in the queue that occupies more than its fair
> share will be killed and the freed slots will be assigned to the other
> queue (assuming the two weights are the same).
>
> Best,
>
> --
> Nan Zhu
> School of Computer Science,
> McGill University
>
>
> On Wednesday, 16 January, 2013 at 11:32 AM, Dhanasekaran Anbalagan wrote:
>
> Hi Nan,
>
> We have not enabled Fair Scheduler preemption.
>
> -Dhanasekaran.
>
> Did I learn something today? If not, I wasted it.
>
>
> On Wed, Jan 16, 2013 at 11:21 AM, Nan Zhu <zhunansjtu@gmail.com> wrote:
>
> Have you enabled task preemption?
>
> Best,
>
> --
> Nan Zhu
> School of Computer Science,
> McGill University
>
>
> On Wednesday, 16 January, 2013 at 10:45 AM, Justin Workman wrote:
>
> Looks like the weight for both pools is equal and all map slots are used,
> so I don't believe either pool has priority for the next slots. Try
> setting the research weight to 2. This should allow research to take
> slots as tech releases them.
>
> Sent from my iPhone
>
> On Jan 16, 2013, at 8:26 AM, Dhanasekaran Anbalagan <bugcy013@gmail.com> wrote:
>
> Hi Guys,
>
> We configured the Fair Scheduler with CDH4, but it is not working
> properly.
> Map Task Capacity = 1380
> Reduce Task Capacity = 720
>
> We created two users, tech and research, and configured an equal weight
> of 1. But when I start a job as the research user, no mappers are
> allocated. Why? Please guide me, guys.
>
> <?xml version="1.0"?>
> <allocations>
>   <pool name="tech">
>     <minMaps>5</minMaps>
>     <minReduces>5</minReduces>
>     <maxRunningJobs>30</maxRunningJobs>
>     <weight>1.0</weight>
>   </pool>
>   <pool name="research">
>     <minMaps>5</minMaps>
>     <minReduces>5</minReduces>
>     <maxRunningJobs>30</maxRunningJobs>
>     <weight>1.0</weight>
>   </pool>
> </allocations>
>
> Note: we have tested with a Hadoop Streaming job.
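Taken together, Justin's weight suggestion and Nan's preemption suggestion would change the allocation file quoted above roughly as follows. This is a sketch: the timeout values are illustrative, the element names come from the MR1 Fair Scheduler allocation-file format, and preemption must also be switched on separately via `mapred.fairscheduler.preemption=true` in mapred-site.xml.

```
<?xml version="1.0"?>
<allocations>
  <pool name="tech">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>30</maxRunningJobs>
    <weight>1.0</weight>
  </pool>
  <pool name="research">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>30</maxRunningJobs>
    <!-- Justin's suggestion: give research twice tech's share of freed slots. -->
    <weight>2.0</weight>
    <!-- Illustrative: preempt if this pool waits 10 min below its min share. -->
    <minSharePreemptionTimeout>600</minSharePreemptionTimeout>
  </pool>
  <!-- Illustrative: preempt if any pool sits below half its fair share for 10 min. -->
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
```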
>
> Fair Scheduler Administration
>
> Pools
>
> Pool      Running Jobs | Maps:  Min  Max  Running  Fair Share | Reduces:  Min  Max  Running  Fair Share | Mode
> research             1 |          5    -       90       690.0 |             5    -        0         0.0 | FAIR
> tech                 3 |          5    -     1266       690.0 |             5    -       24        24.0 | FAIR
> default              0 |          0    -        0         0.0 |             0    -        0         0.0 | FAIR
>
> Running Jobs
>
> Jan 16, 08:51  job_201301071639_2118  tech      streamjob5335328828469969152.jar
>     Maps:    30466 / 53724   running 583   fair share 313.5   weight 1.0
>     Reduces:     0 / 24      running   0   fair share   0.0   weight 1.0
>
> Jan 16, 09:56  job_201301071639_2147  research  streamjob8832181817213433660.jar
>     Maps:     4175 / 9581    running  90   fair share 690.0   weight 1.0
>     Reduces:     0 / 24      running   0   fair share   0.0   weight 1.0
>
> Jan 16, 10:01  job_201301071639_2148  tech      streamjob8773848575543653055.jar
>     Maps:     1842 / 15484   running 620   fair share 313.5   weight 1.0
>     Reduces:     0 / 24      running   0   fair share   0.0   weight 1.0
>
> Jan 16, 10:08  job_201301071639_2155  tech      counterfactualsim-prod.eagle-EagleDepthSignalDisabled-prod.eagle
>     Maps:      387 / 450     running  63   fair share  63.0   weight 1.0
>     Reduces:     0 / 24      running  24   fair share  24.0   weight 1.0
>
> --
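As a sanity check on the numbers in the pool table: both pools demand more map slots than the cluster has and carry equal weight, so each pool's map fair share is half of the 1380-slot map capacity. A minimal sketch of that arithmetic:

```shell
# Map capacity and pool weights from the thread.
MAP_CAPACITY=1380
WEIGHT_TECH=10       # weights are 1.0 each; scaled by 10 for integer math
WEIGHT_RESEARCH=10
TOTAL_WEIGHT=$((WEIGHT_TECH + WEIGHT_RESEARCH))

# Each pool's fair share is its weighted slice of the map capacity; demand
# does not cap it here, since both pools have far more runnable maps than this.
TECH_SHARE=$((MAP_CAPACITY * WEIGHT_TECH / TOTAL_WEIGHT))
RESEARCH_SHARE=$((MAP_CAPACITY * WEIGHT_RESEARCH / TOTAL_WEIGHT))
echo "tech: $TECH_SHARE  research: $RESEARCH_SHARE"   # tech: 690  research: 690
```

Research's fair share of 690 maps is far above its 90 running maps. Without preemption the scheduler never kills tech's tasks to close that gap; it only hands research the slots that tech frees, which is exactly the behavior shown in the table.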