Subject: Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
From: Sagar Mehta
To: user@hadoop.apache.org
Date: Fri, 26 Apr 2013 10:42:47 -0700

Hi Vinod,

Yes, this is exactly what we are doing right now. It works, but it is manual and exposes the policy. I think the JIRA that Sandy pointed out, https://issues.apache.org/jira/browse/MAPREDUCE-5132, is a good first step in that direction.

Cheers,
Sagar

On Thu, Apr 25, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:

> The 'standard' way to do this is to use queue ACLs to restrict a particular
> user to submitting jobs to a subset of queues, and then let the user decide
> which queue in that subset to submit a job to.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Apr 24, 2013, at 6:22 PM, Sagar Mehta wrote:
>
> Hi Guys,
>
> We have a general-purpose Hive cluster [about 200 nodes] which is used for
> various jobs like
>
>   - Production
>   - Experimental/Research
>   - Ad-hoc queries
>
> We are using the fair-share scheduler to schedule them, and for this we
> have 3 corresponding pools in the scheduler.
>
> *Here is what we want.*
>
> *A Hive query submitted by a user with user-name A should go to one of
> the pools above based on a pre-defined mapping. We are wondering where/how
> to specify this mapping?*
>
> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
> particular job run.*
>
> This puts the job on the map-reduce queue named "X", and the following
> configuration in the fair-share scheduler
>
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
>
> maps it to a pool named "X" in the fair-share scheduler.
>
> However, [wearing our Hadoop developer/admin hat] we don't want the
> user/analyst to specify that, so that we can enforce a cluster-use policy.
>
> Based on the user's username, we want to automatically select which Hadoop
> queue, and subsequently which fair-share scheduler pool, his/her job should
> go to. I'm pretty sure this is a common use case and am wondering how to do
> this in Hadoop.
>
> Any help/insights/pointers would be greatly appreciated.
>
> Sagar
> PS - Btw, we are using Cloudera CDH3u2 and the user jobs are Hive queries.
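Vinod's queue-ACL approach can be illustrated with a small, self-contained Java model. The model filters the queues a user is allowed to submit to. It assumes the Hadoop-style ACL string format "user1,user2 group1,group2": comma-separated users, a single space, then comma-separated groups, with "*" meaning everyone. This is a simplified sketch, not Hadoop's own AccessControlList class. The class name, queue names, users and groups are all hypothetical. As far as we know, in Hadoop 1.x-era releases such strings are set per queue in properties like `mapred.queue.<queue-name>.acl-submit-job`; check your distribution's documentation.

```java
import java.util.*;

/** Hypothetical, simplified model of Hadoop-style submit ACLs (not Hadoop's AccessControlList). */
public class QueueAclSketch {

    /** Parses "user1,user2 group1,group2"; "*" permits everyone (assumed format). */
    static final class Acl {
        final boolean allowAll;
        final Set<String> users = new HashSet<>();
        final Set<String> groups = new HashSet<>();

        Acl(String spec) {
            String s = spec == null ? "" : spec;
            allowAll = s.trim().equals("*");
            if (allowAll) return;
            // Users come before the first space, groups after it (a leading space means groups only).
            int space = s.indexOf(' ');
            String userPart = space < 0 ? s : s.substring(0, space);
            String groupPart = space < 0 ? "" : s.substring(space + 1);
            addCsv(users, userPart);
            addCsv(groups, groupPart);
        }

        private static void addCsv(Set<String> target, String csv) {
            for (String item : csv.split(",")) {
                String t = item.trim();
                if (!t.isEmpty()) target.add(t);
            }
        }

        /** True if the user is listed directly or belongs to a listed group. */
        boolean permits(String user, Collection<String> userGroups) {
            if (allowAll || users.contains(user)) return true;
            for (String g : userGroups) {
                if (groups.contains(g)) return true;
            }
            return false;
        }
    }

    /** Returns, in map iteration order, the queues whose submit ACL permits the user. */
    public static List<String> submittableQueues(Map<String, String> submitAcls,
                                                 String user, Collection<String> userGroups) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : submitAcls.entrySet()) {
            if (new Acl(e.getValue()).permits(user, userGroups)) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical submit ACLs for the three pools/queues described in the thread.
        Map<String, String> acls = new LinkedHashMap<>();
        acls.put("production", "etl_bot ops");
        acls.put("research", " analysts,scientists");
        acls.put("adhoc", "*");

        System.out.println(submittableQueues(acls, "alice", List.of("analysts"))); // [research, adhoc]
        System.out.println(submittableQueues(acls, "etl_bot", List.of()));        // [production, adhoc]
    }
}
```

As Sagar notes, this approach restricts the choice but still leaves the final queue to the user. Something still has to pick one queue out of the permitted list.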
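The automatic, username-based selection asked for in this thread can be sketched as an admin-maintained lookup. The lookup produces the same `-Dmapred.job.queue.name=...` option that is currently added by hand. This is a minimal sketch under stated assumptions. The `UserQueuePolicy` class, its precedence rule (explicit user entry, then the first matching group, then a default queue) and all user, group and queue names are hypothetical. Where such a lookup would run (for example, in a wrapper that launches Hive on the user's behalf) is also an assumption, not a Hadoop or Hive feature.

```java
import java.util.*;

/** Hypothetical admin-maintained policy mapping a user (or the user's groups) to a queue. */
public class UserQueuePolicy {
    private final Map<String, String> userToQueue;
    private final Map<String, String> groupToQueue;
    private final String defaultQueue;

    public UserQueuePolicy(Map<String, String> userToQueue,
                           Map<String, String> groupToQueue,
                           String defaultQueue) {
        this.userToQueue = new HashMap<>(userToQueue);
        this.groupToQueue = new HashMap<>(groupToQueue);
        this.defaultQueue = defaultQueue;
    }

    /** Explicit user entry wins, then the first of the user's groups that has a mapping, then the default. */
    public String queueFor(String user, List<String> groups) {
        String queue = userToQueue.get(user);
        if (queue != null) return queue;
        for (String g : groups) {
            queue = groupToQueue.get(g);
            if (queue != null) return queue;
        }
        return defaultQueue;
    }

    /** Builds the generic option the thread describes adding manually on each job run. */
    public String queueOption(String user, List<String> groups) {
        return "-Dmapred.job.queue.name=" + queueFor(user, groups);
    }

    public static void main(String[] args) {
        UserQueuePolicy policy = new UserQueuePolicy(
                Map.of("etl_bot", "production"),   // hypothetical user mapping
                Map.of("analysts", "research"),    // hypothetical group mapping
                "adhoc");                          // hypothetical default queue
        System.out.println(policy.queueOption("etl_bot", List.of()));          // -Dmapred.job.queue.name=production
        System.out.println(policy.queueOption("alice", List.of("analysts")));  // -Dmapred.job.queue.name=research
        System.out.println(policy.queueOption("bob", List.of("marketing")));   // -Dmapred.job.queue.name=adhoc
    }
}
```

The thread's fair-scheduler setting makes `mapred.fairscheduler.poolnameproperty` point at `mapred.job.queue.name`. With that setting, the queue chosen here would also select the pool of the same name. A lookup like this only supplies a default, though. Pairing it with queue ACLs is what would stop a user from overriding the choice with another queue.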