From: Sandy Ryza
To: user@hadoop.apache.org
Date: Wed, 24 Apr 2013 23:41:44 -0700
Subject: Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

Hi Sagar,

This capability currently does not exist in the fair scheduler (or other schedulers, as far as I know), but a JIRA has been filed recently that addresses a similar need. Would https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're trying to do? If not, would you mind filing a new JIRA for the functionality you'd want?

-Sandy


On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:
Hi Guys,

We have a general purpose Hive cluster [about 200 nodes] which is used for various jobs like
  • Production
  • Experimental/Research
  • Adhoc queries
We are using the fair-share scheduler to schedule them, and for this we have 3 corresponding pools in the scheduler.

Here is what we want.

A Hive query submitted by a user with user-name A should go to one of the pools above based on a pre-defined mapping. We are wondering where/how to specify this mapping?

We can do this manually by adding -Dmapred.job.queue.name="X" on a particular job run.

This puts the job on the map-reduce queue named "X" and the following configuration in the fair-share scheduler

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>

maps this to a pool named "X" in the fair-share scheduler.
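
The indirection described above can be pictured with a small toy model. This is only a sketch in Python, not Hadoop code: the function name `resolve_pool` is hypothetical, and the fallbacks used when a property is missing (`user.name` for the scheduler setting, `default` for the job value) are assumptions based on my reading of the Hadoop 1.x documentation rather than something stated in this thread.

```python
# Toy model only: this is not Hadoop code, just the lookup described above.

def resolve_pool(job_conf, scheduler_conf):
    """Return the pool a job would land in under this simplified model."""
    # The scheduler setting names WHICH job property carries the pool name.
    pool_property = scheduler_conf.get(
        "mapred.fairscheduler.poolnameproperty",
        "user.name",  # assumed default when the setting is absent
    )
    # The job's value for that property is then used as the pool name.
    return job_conf.get(pool_property, "default")  # fallback is an assumption


scheduler_conf = {"mapred.fairscheduler.poolnameproperty": "mapred.job.queue.name"}
job_conf = {"user.name": "analyst1", "mapred.job.queue.name": "X"}

print(resolve_pool(job_conf, scheduler_conf))  # X
```

With the setting shown in the message, the pool follows whatever queue name the job carries, which is why the queue name has to be set somewhere on each job.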

However, we [while wearing our Hadoop developer/admin hat] don't want the user/analyst to specify that, so that we can enforce some cluster-use policy.

Based on his/her username we want to automatically select which Hadoop queue, and subsequently which fair-share scheduler pool, his/her job should go to. I'm pretty sure this is a common use case and am wondering how to do this in Hadoop.

Any help/insights/pointers would be greatly appreciated.

Sagar
PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
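
Since the thread indicates there is no scheduler-side mapping, one possible stopgap is a client-side wrapper that looks up the submitting user and sets the queue name on the Hive command line. The Python sketch below makes several assumptions: the user names, the pool names (`production`, `research`, `adhoc`), the fallback pool, and the helper names `queue_for` and `hive_command` are all hypothetical, and the table `logs` in the usage line is invented. It relies only on the Hive CLI options `-hiveconf` and `-e`.

```python
import getpass

# Hypothetical mapping from user name to queue/pool name. The three pools
# mirror the job categories in the message; the user names are invented.
USER_TO_QUEUE = {
    "etl": "production",
    "alice": "research",
}
DEFAULT_QUEUE = "adhoc"  # assumed fallback for users not listed above


def queue_for(user):
    """Look up the queue for a user, falling back to the ad hoc pool."""
    return USER_TO_QUEUE.get(user, DEFAULT_QUEUE)


def hive_command(query, user=None):
    """Build the argument list for a Hive CLI call pinned to the user's queue."""
    user = user or getpass.getuser()
    return [
        "hive",
        "-hiveconf", "mapred.job.queue.name=" + queue_for(user),
        "-e", query,
    ]


# The resulting list could be handed to subprocess.run() on a cluster gateway.
print(hive_command("SELECT COUNT(*) FROM logs", user="alice"))
```

A wrapper like this only supplies a default. It does not stop a user from calling Hive directly or overriding the property, so on its own it is a convention rather than an enforced policy.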



