Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of sagarmehta@gmail.com designates
 209.85.212.176 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <B3735C6D-7BA0-4EE8-9C28-3B1FC6AACE9A@apache.org>
References: 
 <CAMq4vAG1U9pYqzn2384cx8Ke=hPnJN3BwQ72tUM74B43B0Jg9A@mail.gmail.com>
	<B3735C6D-7BA0-4EE8-9C28-3B1FC6AACE9A@apache.org>
Date: Fri, 26 Apr 2013 10:42:47 -0700
Message-ID: 
 <CAMq4vAH7O8kO06SionTrf28+cvvF0Q6poAxzGct2HoFV0u9Rwg@mail.gmail.com>
Subject: Re: Automatically mapping a job submitted by a particular user to a
 specific hadoop map-reduce queue
From: Sagar Mehta <sagarmehta@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=e89a8f3ba6e32b4c9304db4710a0

--e89a8f3ba6e32b4c9304db4710a0
Content-Type: text/plain; charset=ISO-8859-1

Hi Vinod,

Yes, this is exactly what we are doing right now. It works, but it is manual
and exposes the policy.
I think the JIRA that Sandy pointed out,
https://issues.apache.org/jira/browse/MAPREDUCE-5132, is a good first step
in that direction.
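
For reference, the queue-ACL setup Vinod describes looks roughly like this.
It is a sketch that assumes Hadoop 0.20-style property names (as used in
CDH3) and hypothetical queue, user and group names (production, research,
adhoc, etl_user, etl, researchers, analysts). It is worth checking against
the CDH3 documentation before use.

```xml
<!-- mapred-site.xml on the JobTracker (inside <configuration>):
     declare the queues and turn on ACL checks. -->
<property>
  <name>mapred.queue.names</name>
  <value>production,research,adhoc</value>
</property>
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>
</property>

<!-- mapred-queue-acls.xml (inside <configuration>): who may submit to each queue.
     Value format: comma-separated users, one space, comma-separated groups.
     A leading space means "no individual users, groups only". -->
<property>
  <name>mapred.queue.production.acl-submit-job</name>
  <value>etl_user etl</value>
</property>
<property>
  <name>mapred.queue.research.acl-submit-job</name>
  <value> researchers</value>
</property>
<property>
  <name>mapred.queue.adhoc.acl-submit-job</name>
  <value> analysts</value>
</property>
```

Users still pick the queue themselves with -Dmapred.job.queue.name, but a
submission to a queue whose ACL does not include them is rejected. That
enforces the policy, but it does not make the queue choice automatic.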

Cheers,
Sagar

On Thu, Apr 25, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> The 'standard' way to do this is to use queue ACLs to restrict a particular
> user to submitting jobs to a subset of queues, and then let the user decide
> which queue in that subset to submit a job to.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Apr 24, 2013, at 6:22 PM, Sagar Mehta wrote:
>
> Hi Guys,
>
> We have a general-purpose Hive cluster [about 200 nodes] that is used for
> various kinds of jobs:
>
>    - Production
>    - Experimental/Research
>    - Adhoc queries
>
> We are using the fair-share scheduler to schedule them, and for this we
> have three corresponding pools in the scheduler.
>
> *Here is what we want.*
>
> *A Hive query submitted by a user with user name A should go to one of
> the pools above based on a predefined mapping. We are wondering where and how
> to specify this mapping.*
>
> *We can do this manually by adding -Dmapred.job.queue.name="X" to a
> particular job run.*
>
> This puts the job on the map-reduce queue named "X" and the following
> configuration in the fair-share scheduler
>
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
>
> maps this to a pool named "X" in the fair-share scheduler.
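
For reference, the pools that these queue names map to can be tuned in the
fair scheduler allocation file (the file named by
mapred.fairscheduler.allocation.file). A minimal sketch, assuming pool names
that match the queue names and hypothetical limits; the element names follow
the Hadoop 0.20 fair scheduler documentation:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Guarantee a minimum number of slots to production jobs -->
  <pool name="production">
    <minMaps>100</minMaps>
    <minReduces>50</minReduces>
  </pool>
  <!-- Cap the number of concurrently running jobs in the other two pools -->
  <pool name="research">
    <maxRunningJobs>10</maxRunningJobs>
  </pool>
  <pool name="adhoc">
    <maxRunningJobs>20</maxRunningJobs>
  </pool>
</allocations>
```

Because poolnameproperty is set to mapred.job.queue.name, a job submitted to
the queue "adhoc" lands in the pool of the same name.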
>
> However, we [while wearing our Hadoop developer/admin hat] don't want the
> user/analyst to specify the queue, because we want to enforce a cluster-use
> policy.
>
> Based on his/her username, we want to automatically select which Hadoop
> queue, and subsequently which fair-share scheduler pool, his/her job should
> go to. I'm pretty sure this is a common use case, and I am wondering how to
> do this in Hadoop.
>
> Any help/insights/pointers would be greatly appreciated.
>
> Sagar
> PS - By the way, we are using Cloudera CDH3u2, and the user jobs are Hive queries.

--e89a8f3ba6e32b4c9304db4710a0--